History of Data Science

By Leo Smigel

Updated on October 13, 2023

Peter Naur first used the term “data science” in 1974, but the journey began long before that. This chronicle traces the history of data science from its earliest collection methods to the most advanced forms of data processing.

Data Science has taken the world by storm. It is not a single subject, but an all-encompassing term including programming, data mining, statistics, data visualization, analytics, and business intelligence.

Data Science is the complete process of collating huge data sets, managing them and deriving insights for several productive purposes. The field of Data Science is constantly evolving to keep pace with the changing technology and business practices.

Statistical data is the driving force behind the development of science, accounting, logistics, and other businesses. Data science as we know it today has a brief history, but large-scale data collection and analysis have existed since ancient times. Librarians, scientists, statisticians, and demographers have discussed and worked with huge data sets for years.

Today, analyzing data and extracting insights from it has emerged as one of the most coveted and intriguing tasks. It has even led to a new professional role, the data scientist. Kenneth Cukier, an American journalist known for his work on Big Data, said that data scientists “combine the skills of software programmer, statistician, and storyteller/artist to extract the nuggets of gold hidden under mountains of data.” [32]

In this article, we revisit the history of data science through its many landmark events.

1663: John Graunt’s Extensive Demographic Data Collection

In 1663, John Graunt, a British demographer, recorded and analyzed every piece of information he could find about mortality rates in London. [1] Graunt’s objective was to build an effective warning system for the bubonic plague epidemic. He used the rule of three and ratios, comparing years in the Bills of Mortality, to estimate the population size of London and England, the birth and mortality rates of males and females, and the rise and spread of particular diseases. Graunt is also known as the ‘Father of Demographics.’ [2]
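The rule of three mentioned above is plain proportional reasoning: if a is to b as c is to x, then x = b × c / a. A minimal sketch follows. The function name is hypothetical, and the parish and burial figures are invented for illustration; they are not taken from Graunt’s tables.

```python
def rule_of_three(a, b, c):
    """Solve the proportion a : b = c : x for x."""
    if a == 0:
        raise ValueError("a must be non-zero")
    return b * c / a

# Illustrative figures only (assumptions, not Graunt's data):
# a sample parish of 1,000 residents records 30 burials in a year,
# and the whole city records 12,000 burials in the same year.
sample_burials = 30
sample_population = 1_000
city_burials = 12_000

# Scale the sample population by the ratio of burials.
estimated_city_population = rule_of_three(
    sample_burials, sample_population, city_burials
)
```

The estimate is only as good as the assumption that the sample ratio holds across the whole city, which is the kind of assumption a ratio-based population estimate has to make.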


During his first attempt at statistical data analysis, Graunt noted all his observations and findings in the book Natural and Political Observations Made upon the Bills of Mortality. This book was compiled based on data collected by John Graunt and offers a detailed account of the causes of death in the 17th century.

1763: Bayes’ Theorem

Published posthumously in 1763, Thomas Bayes’ theorem of conditional probability is one of the cornerstones of data science. [3] A conditional probability is the probability of an event given that some other event has already happened. Bayes’ theorem starts from the probability assigned to a hypothesis on the basis of previous evidence or knowledge and revises it (updates the probability) as additional evidence arrives.
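The update described above can be written as P(H | E) = P(E | H) · P(H) / P(E). The sketch below is one way to compute it for a yes/no hypothesis. The function and parameter names are hypothetical, and the 1%, 90%, and 5% figures are made-up inputs chosen only to show the arithmetic.

```python
from fractions import Fraction

def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) for a binary hypothesis H and observed evidence E.

    prior               -- P(H), belief before seeing the evidence
    likelihood          -- P(E | H)
    false_positive_rate -- P(E | not H)
    """
    # Total probability of the evidence, P(E), over both cases.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical inputs: 1% prior, 90% likelihood, 5% false-positive rate.
updated = posterior(Fraction(1, 100), Fraction(9, 10), Fraction(5, 100))
```

With these inputs the updated probability is 2/13, roughly 15%, which shows how a low prior keeps the posterior modest even when the evidence is fairly reliable.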

1840: Ada Lovelace, the First Computer Programmer

Programming is critical to data science, and the person who pioneered it in the 19th century was Ada Lovelace, an English noblewoman. Lovelace was an associate of Charles Babbage, the “father of computers,” and worked with him on the “Difference Engine,” a mechanical calculator. [4]

In the early 1840s, Ada Lovelace translated a paper written in French by an Italian engineer, Luigi Menabrea, titled “Sketch of the Analytical Engine Invented by Charles Babbage, Esq.” However, she went far beyond translating it. She added extensive notes to the paper, including her own ideas and analysis.

Ada Lovelace illustration

In August 1843, the translated work was published in Taylor’s Scientific Memoirs, and its final appendix, Note G, became extremely famous. In it, Lovelace proposed an algorithm for the engine to compute Bernoulli numbers, a sequence of rational numbers frequently used in arithmetic and computation.
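To give a sense of what computing Bernoulli numbers involves, here is a short modern sketch. It is not a transcription of Lovelace’s Note G program; it uses the standard recurrence B_0 = 1 and B_m = −1/(m+1) · Σ_{k<m} C(m+1, k) · B_k, which gives B_1 = −1/2. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb  # available in Python 3.8+

def bernoulli_numbers(n):
    """Return [B_0, B_1, ..., B_n] as exact fractions (convention B_1 = -1/2)."""
    b = [Fraction(1)]
    for m in range(1, n + 1):
        # Sum over the already-known values B_0 .. B_(m-1).
        s = sum(comb(m + 1, k) * b[k] for k in range(m))
        b.append(-s / (m + 1))
    return b

first_seven = bernoulli_numbers(6)
```

Exact fractions are used instead of floats because the values are rational and rounding errors would otherwise accumulate through the recurrence.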

This is regarded as the first instance of computer programming, and it happened long before the modern computer was invented. Ursula Martin, an Ada Lovelace biographer and professor of computer science at the University of Oxford, said, “She’s written a program to calculate some rather complicated numbers, Bernoulli numbers… This shows off what complicated things the computer could have done.” [5]

Though Ada Lovelace’s algorithm is not directly related to Data Science, she was the first to lay the foundation of programming. Without this significant leap, Data Science would have been impossible to imagine.

1855: Florence Nightingale, the Victorian Medical Reformer, Used Data Visualization

Florence Nightingale was a Victorian icon and a founder of modern nursing. She pioneered the use of statistics and data visualization to analyze the spread of infectious diseases. [6]

Today, we can put up a fight against a pandemic thanks to the practical information systems set up by countries across the world. But in the 19th century, such a system was unheard of. According to statistics historian Eileen Magnello of University College London, Nightingale’s rose diagram is a variation of the pie chart known as a polar area chart. Through the diagram, she showed that poor sanitation, not battle wounds, was responsible for the deaths of English soldiers during the Crimean War in the 1850s. She also stated that such deaths were avoidable. Nightingale used data that she and her staff collected while on duty in the camps and hospital.

Nightingale’s famous data visualization shows English soldiers dying of cholera and preventable diseases vs. battle wounds during the Crimean War.

Nightingale also made a series of other charts to convince the authorities of the importance of sanitation. Visualizations were one of her preferred ways of communicating. She said, “Whenever I am infuriated, I revenge myself with a new diagram.”

Eventually, Nightingale’s ideas gained acceptance, and the sanitation needs of patients at military and civilian hospitals were addressed.

1865: The Term Business Intelligence Is Coined

In 1865, Richard Miller Devens, an American historian and author, first used the phrase “Business Intelligence” (BI) in his work Cyclopædia of Commercial and Business Anecdotes. Today, we understand business intelligence as analyzing data and creating actionable information to solve business problems. [7][8]

Devens used it to describe how Sir Henry Furnese, an English banker, earned massive profits by gathering information from various sources and acting on it to outdo his competitors.

He wrote, “Throughout Holland, Flanders, France, and Germany, he maintained a complete and perfect train of business intelligence. The news of the many battles fought was thus received first by him, and the fall of Namur added to his profits, owing to his early receipt of the news.”

1884: Hollerith Marks the Beginning of Data Processing

In 1884, Herman Hollerith, an American inventor and statistician, invented the punch card tabulating machine, which marked the beginning of data processing. Hollerith is also known as the Father of Modern Automatic Computing. [9]

The tabulating device that Hollerith developed was later used to process the 1890 US Census data. In 1911, his company merged with others to form the Computing-Tabulating-Recording Company, which later became International Business Machines (IBM).

U.S. Bureau of the Census computer operator at a punch card sorter

1936: Alan Turing Introduced ‘Computable Numbers’

In 1936, Alan Turing’s paper On Computable Numbers introduced the Universal Machine, a device that could perform complex computations like our modern-day computers. [10] The paper presented the mathematical description of a hypothetical computing device that could mimic the ability of the human mind to manipulate symbols. It would not be wrong to say that Turing pioneered modern-day computing through his path-breaking concepts.

An enigma machine on display outside the Alan Turing Institute inside the British Library, London

According to Turing, “computable numbers” are those that can be defined by a definite rule and calculated on the universal machine. [11] He also stated that these computable numbers “would include every number that could be arrived at through arithmetical operations, finding roots of equations, and using mathematical functions like sines and logarithms, every number that could arise in computational mathematics.”

1937: IBM Gets Social Security Contract

Franklin D. Roosevelt’s administration in the USA commissioned the first significant data project in 1937, after the Social Security Act became law in 1935. [12] The government had undertaken a massive bookkeeping project to track payroll contributions from 26 million Americans and over 3 million employers. Ultimately, IBM received the contract to develop a punch card machine for this project, the IBM Type 77 collator. [13]

News Article, Sunday News, January 10, 1937

These collators could work with two sets of punch cards, compare them, and then merge them into a single pile. The machine was efficient enough to handle nearly 480 cards per minute.
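In software terms, the compare-and-merge step described above resembles a two-way merge of sorted sequences. The sketch below is an analogy, not a model of the Type 77’s actual mechanism. It assumes each deck is already sorted by a numeric key and has no repeated keys within itself; the function name and card numbers are hypothetical.

```python
def collate(deck_a, deck_b):
    """Merge two sorted decks into one pile and report keys found in both."""
    merged, duplicates = [], []
    i = j = 0
    while i < len(deck_a) and j < len(deck_b):
        if deck_a[i] < deck_b[j]:
            merged.append(deck_a[i])
            i += 1
        elif deck_a[i] > deck_b[j]:
            merged.append(deck_b[j])
            j += 1
        else:
            # Same key in both decks: keep one copy and flag the duplicate.
            merged.append(deck_a[i])
            duplicates.append(deck_a[i])
            i += 1
            j += 1
    # One deck is exhausted; the rest of the other is already in order.
    merged.extend(deck_a[i:])
    merged.extend(deck_b[j:])
    return merged, duplicates

pile, repeated = collate([101, 104, 107], [102, 104, 109])
```

Each card is looked at once, so the work grows linearly with the number of cards, which is what makes a single pass through two feeds practical.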

Collators emerged as the fastest way of combining data sets or identifying duplicate cards. So strong was the impact of the device that the 80-column punch cards that those IBM Type 77 collators used became an industry standard for the next 45 years.

1943: The First Data Processing Machines

In 1943, Tommy Flowers, a Post Office electronics engineer from the UK, designed Colossus, an early electronic computer. [14] It was one of the first data processing machines, used to decipher Nazi codes during WWII. Colossus could perform Boolean operations as well as computations to analyze very large data sets. [15]

This revolutionary device looked for patterns in intercepted messages at a rate of 5,000 characters per second, reducing the execution time from weeks to just a few hours.

Tommy Flowers made a significant breakthrough by proposing that the wheel patterns be generated electronically in ring circuits. This removed the need for one of the paper tapes and eliminated the synchronization problem entirely.

1962: John Tukey Projected the Impact of Electronic Computing on Data Analysis

In 1962, John W. Tukey projected the impact of electronic computing on data analysis. [16] Tukey was a chemist turned statistician who made major contributions to statistics during the 20th century. He also pioneered a significant research project to study various graphical methods for data analysis. The box-and-whisker plot, the stem-and-leaf diagram, and Tukey’s paired comparisons are three of his most prized contributions to statistics.

Tukey also authored “The Future of Data Analysis” in 1962, an early and widely recognized statement of what would become data science. He is also credited, in the “Annals of the History of Computers,” with coining the word “bit,” a contraction of “binary digit” that describes the 1s and 0s underlying computer programs.

1974: Peter Naur Analyzes Contemporary Data Processing

In 1974, Peter Naur defined the term “data science” as “The science of dealing with data, once they have been established, while the relation of the data to what they represent is delegated to other fields and sciences.” [17] He published the book Concise Survey of Computer Methods in Sweden and the United States, analyzing contemporary data processing methods across many applications. The mention of data science in the book revolves around data as defined in Datalogy, a course plan presented at the IFIP Congress in 1968. There, data is defined as “a representation of facts or ideas in a formalized manner capable of being communicated or manipulated by some process.”

1977: The International Association for Statistical Computing Was Established

In 1977, the International Association for Statistical Computing (IASC) was established as a Section of the International Statistical Institute (ISI) during its 41st session. [18] The premier statistical body stated: “It is the mission of the IASC to link traditional statistical methodology, modern computer technology, and the knowledge of domain experts to convert data into information and knowledge.” The objectives of the Association are to promote a global interest in practical statistical computing and to exchange technical knowledge through international networking events between statisticians, computing professionals, corporations, government, and the general public.

1977: Exploratory Data Analysis by Tukey

Exploratory data analysis is an approach that analyzes data sets to summarize their primary characteristics, using methods like data visualization and statistical graphics. [19] In 1977, John W. Tukey wrote the book Exploratory Data Analysis, in which he argued that statistics placed undue importance on statistical hypothesis testing (confirmatory data analysis). The objective of the exploratory approach is to examine the data before applying a specific probability model. Tukey also cautioned that intermingling the two types of analysis on the same data set can introduce systematic bias, because the hypotheses being tested were themselves suggested by that data set.

1989: The Emergence of Data Mining

In 1989, Gregory Piatetsky-Shapiro, who coined the term “Knowledge Discovery in Databases” (KDD), organized and chaired the first KDD workshop. [20] In the 1990s, the term “data mining” first appeared in the same database community.

Today, almost every industry leverages data mining to analyze data and identify trends in pursuit of business objectives such as expanding the customer base, predicting prices, anticipating stock price fluctuations, and forecasting customer demand.

1996: The Term ‘Data Science’ Used for the First Time

In 1996, the term “data science” appeared for the first time in the title of a conference: the fifth conference of the International Federation of Classification Societies (IFCS) in Kobe, Japan, was called “Data science, classification, and related methods.” [21]

The papers presented at the conference covered theoretical and methodological advances in data gathering, classification, and clustering. The knowledge-sharing sessions also revolved around exploratory and multivariate data analysis.

1997: Jeff Wu Insists Statistics Be Renamed Data Science

In 1997, Jeff Wu, in his inaugural lecture as the H. C. Carver Chair in Statistics at the University of Michigan, titled “Statistics = Data Science?”, suggested that statistics be renamed “data science” and statisticians be called “data scientists.” [22] He characterized statistics as a combination of three elements: data collection, data modeling and analysis, and decision making.

Wu, a Taiwanese mathematician and statistician, explained that a new name would help statistics have a distinct identity and avoid confusion with other fields like accounting or data collection.

1997: The Term ‘Big Data’ Was Coined

In 1997, NASA researchers Michael Cox and David Ellsworth first used the term “Big Data” in their paper “Application-controlled demand paging for out-of-core visualization.” [30]

Big Data refers to enormous data sets that conventional software tools and computing systems cannot handle. In April 1998, John R. Mashey, an American computer scientist and entrepreneur, used the term in his paper Big Data … and the Next Wave of InfraStress. [31]

2001-2005: Data Science Gains Prominence

Credit goes to William S. Cleveland for establishing data science as an independent discipline. In a 2001 paper, he called for an expansion of statistics beyond theory into technical areas. [23] In the early 2000s, the term “data science” became more widely used. In 2002, the Committee on Data for Science and Technology launched the Data Science Journal. In 2003, Columbia University launched The Journal of Data Science. [24]

In 2005, the National Science Board called for a distinct career path for data science to ensure that experts handle digital data collections. [25] In support of this goal, it published “Long-lived Digital Data Collections: Enabling Research and Education in the 21st Century.”

2006: Hadoop 0.1.0 Was Released

2006 saw the launch of Hadoop 0.1.0, an open-source framework for distributed storage and processing of large data sets. Hadoop grew out of another open-source project, the Apache Nutch web crawler. [26] Yahoo deployed Hadoop, using the MapReduce programming model, to process and store massive volumes of application data.

The launch of Hadoop also marked the beginning of Big Data. Doug Cutting and Mike Cafarella began the work that led to Hadoop in 2002, when both were part of the Apache Nutch project. The core objective of the Nutch project was to handle billions of searches and index millions of web pages. In July 2008, Apache successfully tested a 4,000-node cluster with Hadoop.

Finally, Apache Hadoop was publicly released in November 2012 by the Apache Software Foundation. Hadoop works by splitting files into large blocks and distributing them across the nodes in a cluster. It then ships the packaged code to those nodes so that they can process the data in parallel. This enables faster and more efficient processing of the data set.
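The split, distribute, and process-in-parallel idea above can be illustrated with a toy word count in the MapReduce style. This is a single-process sketch in plain Python, not Hadoop’s API. All function names are hypothetical, and the “blocks” are just slices of a list rather than files on a cluster.

```python
from collections import Counter
from functools import reduce

def split_into_blocks(lines, block_size):
    """Cut the input into fixed-size blocks, as a stand-in for file blocks."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]

def map_block(block):
    """Map step: count the words in one block, independently of the others."""
    return Counter(word for line in block for word in line.lower().split())

def reduce_counts(left, right):
    """Reduce step: combine two partial counts."""
    return left + right

def word_count(lines, block_size=2):
    blocks = split_into_blocks(lines, block_size)
    # Each map_block call touches only its own block, so on a real
    # cluster these calls could run on different nodes at once.
    partial_counts = map(map_block, blocks)
    return reduce(reduce_counts, partial_counts, Counter())

totals = word_count(["big data", "data science", "Big big"])
```

Because the map step never needs data from another block, the same code shape scales from one machine to many; only the reduce step has to bring results together.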

2007: The Research Center for Dataology and Data Science Was Established

In 2007, the Research Center for Dataology and Data Science was set up at Fudan University, Shanghai, China. [27] In 2009, Yangyong Zhu and Yun Xiong, two researchers at the university, published “Introduction to Dataology and Data Science,” in which they stated that Dataology and Data Science is a new science and an independent research field that differs from natural science and takes data in cyberspace as its research object. [28]

On June 22-23, 2010, the Research Center for Dataology and Data Science at Fudan University hosted “The First International Workshop on Dataology and Data Science.” More than 30 scholars from international and domestic campuses took part, exchanging ideas on dataology and data science.

2014: ASA Section Changes Name to Section on Statistical Learning and Data Science

In 2014, the American Statistical Association’s Section on Statistical Learning and Data Mining changed its name to the Section on Statistical Learning and Data Science, clearly reflecting the popularity of data science. [29] The name change might seem like a small step, but it signifies the ASA’s effort to strengthen the connection between statistics and data science.

Walking Forward

Data Science has evolved immensely over the past decade and has conquered every industry that depends on data. There is also a massive demand for data scientists from varied academic and professional backgrounds.

Data stockpiles have grown exponentially, thanks to cost-effective and efficient advances in storage and processing. According to IDC, by 2025 there will be over 175 zettabytes of data globally.

In earlier days, data wasn’t as accessible as it is today, and people were skeptical about sharing their information. Even today, privacy and ethics are the foundation of data collection. Therefore, every data scientist needs to operate within an ethical framework as the volume of data expands.

Experts believe that automation, blockchain, analytics, and democratization will shape the future of data science as a core function of business management.

Sources:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32


via: History of Data Science - Analyzing Alpha

https://analyzingalpha/data-science-history



The History Of Data Science and Pioneers You Should Know

August 25, 2022

Data science is a relatively new discipline. The term “Data Science” entered the lexicon in the early 21st century to categorize a new profession: the field of applied mathematics and statistics that provides insights based on large amounts of complex data or big data. Although the term Data Science is relatively contemporary, the history of Data Science is extensive.

Graduates with a Master of Science in Data Science are instrumental in furthering the discipline and helping organizations make discoveries from the world’s incredible reserves of Big Data. If you are interested in developing a solid data science strategy, you could join those making history in one of today’s most cutting-edge fields. Learn more about the disciplinary pioneers who have played a part in the conception and future of Data Science.

Data Science Pioneers

Millions of professionals work daily to advance Data Science to the next level, from Big Tech data engineers in Silicon Valley to government officials leveraging AI applications to solve community challenges. Throughout its history, several key figures have been instrumental in the field’s development, including the following historical icons.

  • Ada Lovelace: This Countess programmed one of the world’s first computers more than 30 years before the invention of the electric light bulb. She is seen as an icon in the field of computer science. “The Analytical Engine has no pretensions whatsoever to originate anything,” she wrote. “It can do whatever we know how to order it to perform. It can follow analysis, but it has no power of anticipating any analytical relations or truths.”
    Ada Lovelace:这位伯爵夫人在电灯泡发明前 30 多年就为世界上第一台计算机之一编写了编程。她被视为计算机科学领域的偶像。“分析引擎没有任何自命不凡的起源,”她写道。“它可以做我们知道如何命令它执行的任何操作。它可以跟随分析,但它没有能力预测任何分析关系或真理。
  • Timnit Gebru: Timnit is a computer scientist who advocates for diversity in technology, and is leading the way in the emerging field of ethical AI. Her work has included the study of algorithmic bias and resulting ethical implications, co-founding “Black in AI” - a community supporting inclusion of Black people in the field of AI, and co-leading the “Gender Shades” project which exposed bias in commercial AI systems.
    Timnit Gebru:Timnit 是一位计算机科学家,倡导技术多样性,并在新兴的道德 AI 领域处于领先地位。 她的工作包括研究算法偏见和由此产生的道德影响,共同创立了“Black in AI”——一个支持将黑人纳入 AI 领域的社区,以及共同领导了揭露商业 AI 系统中偏见的“Gender Shades”项目。
  • Alan Turing: He is considered to be the father of theoretical computer science and artificial intelligence. In 1942, Turing worked for the United States as part of an intelligence exchange and inspected the speech encryption system that enabled conversations between Churchill and Roosevelt.
    Alan Turing:他被认为是理论计算机科学和人工智能之父。1942 年,图灵作为情报交换的一部分为美国工作,并检查了使丘吉尔和罗斯福之间能够对话的语音加密系统。
  • Ronald Fisher: He is a historical icon in the world of statistics and is often described as the most important figure in the development of modern statistical research.
    罗纳德·费舍尔:他是统计学界的历史偶像,经常被描述为现代统计研究发展中最重要的人物。
  • Claude Shannon: Dr. Claude Shannon created the information theory, making today’s digital world possible. He was a mathematician, computer scientist, and creator of the “bit” (the basic unit of information), digital compression, and strategies for encoding and transmitting information between two points.
    Claude Shannon:Claude Shannon 博士创建了信息理论,使今天的数字世界成为可能。他是一位数学家、计算机科学家,也是“比特”(信息的基本单位)、数字压缩以及在两点之间编码和传输信息的策略的创造者。
  • [John Tukey](https://www.statistics/historical-spotlight-john-tukey/#:~:text=Tukey brought to the discipline,title of his 1977 book).): Tukey coined the term “data analysis” and encouraged data scientists to find stories and meaning in data sets.
    John Tukey:Tukey 创造了“数据分析”一词,并鼓励数据科学家在数据集中寻找故事和意义。
  • Edward Tufte: He is an American statistician and professor of political science, statistics, and computer science at Yale University, known for his research on information design and as a pioneer in the field of data visualization.
    Edward Tufte:他是美国统计学家,也是耶鲁大学政治学、统计学和计算机科学教授,以其在信息设计方面的研究而闻名,是数据可视化领域的先驱。
  • Yoshua Bengio: Bengio is recognized worldwide as a leading expert in artificial intelligence. He is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of Computing,” with Geoffrey Hinton and Yann LeCun.
    Yoshua Bengio:Bengio 是全球公认的人工智能领域的领先专家。他以在深度学习领域的开创性工作而闻名,并因此与 Geoffrey Hinton 和 Yann LeCun 一起获得了 2018 年 A.M. 图灵奖,即“计算界的诺贝尔奖”。
  • Karen Spärck Jones: She was an iconic British computer scientist behind the concept of inverse document frequency and index-term weighting, principles that are the foundation for modern search engines like Google. In 2019, The New York Times called her “a pioneer of computer science for work combining statistics and linguistics and an advocate for women in the field.”
    Karen Spärck Jones:她是一位标志性的英国计算机科学家,提出了逆向文档频率和索引词加权的概念,这些原则是 Google 等现代搜索引擎的基础。2019 年,《纽约时报》称她为“计算机科学的先驱,将统计学和语言学相结合,是该领域女性的倡导者”。

A Brief History of Data Science 数据科学简史

The journey to structure, organize and understand data has a long history. The evolution of data science has involved discussions by scientists, statisticians, researchers, computer scientists, and notable industry pioneers for generations. The following timeline traces the evolution of Data Science and its inception, use, and popularity over the years.
构建、组织和理解数据的旅程由来已久。数据科学的发展涉及科学家、统计学家、研究人员、计算机科学家和几代著名行业先驱的讨论。以下时间线追溯了 Data Science 多年来的演变及其诞生、使用和普及情况。

1957

  • Arthur Samuel coined the term “machine learning” and created the Samuel Checkers-Playing program, one of the world’s first successful self-learning programs.
    Arthur Samuel 创造了“机器学习”一词,并创建了 Samuel Checkers-Playing 程序,这是世界上第一个成功的自学程序之一。
  • IBM developed Fortran, a programming language that remains in use today.
    IBM 开发了 Fortran,这是一种至今仍在使用的编程语言。

1962

  • John Tukey wrote a paper titled The Future of Data Analysis. He described a shift in the world of statistics: the merging of statistics and computers, at a time when computers were first being used to solve mathematical problems and work with statistics.
    John Tukey 写了一篇题为《数据分析的未来》的论文。他描述了统计学界的一次转变,即统计学与计算机的融合,当时计算机刚刚开始被用于解决数学问题和处理统计工作。

1964

  • Karen Spärck Jones published Synonymy and Semantic Classification, now considered a foundational paper in natural language processing.
    Karen Spärck Jones 发表了 Synonymy and Semantic Classification,现在被认为是自然语言处理的基础论文。

1974

  • Peter Naur used the term “Data Science” throughout his 1974 publication, “Concise Survey of Computer Methods”. He wrote that “the usefulness of data and data processes derives from their application in building and handling models of reality.”
    Peter Naur 在他 1974 年的出版物《计算机方法简明综述》(Concise Survey of Computer Methods) 中通篇使用了“数据科学”一词。他写道:“数据和数据处理的有用性源于它们在构建和处理现实模型方面的应用。”

1977

  • The International Association for Statistical Computing (IASC) was formed with a mission to “foster worldwide interest in effective statistical computing and to exchange technical knowledge through international contacts and meetings between statisticians, computing professionals, organizations, institutions, governments, and the general public.”
    国际统计计算协会 (IASC) 成立的使命是“通过统计学家、计算专业人员、组织、机构、政府和公众之间的国际联系和会议,促进全球对有效统计计算的兴趣并交流技术知识”。
  • Tukey published the book Exploratory Data Analysis, about the importance of data in selecting and testing hypotheses.
    Tukey 出版了著作《探索性数据分析》,探讨了数据在选择和检验假设中的重要性。

1986

  • Geoffrey Hinton, then a professor at Carnegie Mellon University, co-authored a paper with David E. Rumelhart and Ronald J. Williams on applying the backpropagation algorithm to multi-layer neural networks. This application was a milestone in AI because it allowed the networks to learn internal representations of data.
    时任卡内基梅隆大学教授的 Geoffrey Hinton 与 David E. Rumelhart 和 Ronald J. Williams 合著了一篇关于将反向传播算法应用于多层神经网络的论文。这一应用是 AI 的一个里程碑,因为它使网络能够学习数据的内部表示。

1989

  • The Knowledge Discovery in Databases organization scheduled its first Data Science workshop. This organization and conference would later rebrand into the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, a conference that continues to run in 2022.
    数据库中的知识发现组织安排了其第一个数据科学研讨会。这个组织和会议后来更名为 ACM SIGKDD 知识发现和数据挖掘会议,该会议将于 2022 年继续举行。

1990

  • Researchers published a journal article titled CoverStory: Automated News Finding in Marketing about how companies leverage customer data in supermarkets to inform marketing strategies. This paper discusses customer data collection, automation, and personalization.
    研究人员发表了一篇题为《CoverStory: Automated News Finding in Marketing》的期刊文章,介绍了公司如何利用超市中的客户数据来制定营销策略。本白皮书讨论了客户数据收集、自动化和个性化。

1993

  • Yoshua Bengio, a professor at the Université de Montréal, founded Mila, the Montreal Institute for Learning Algorithms, a research institute on AI.
    蒙特利尔大学教授 Yoshua Bengio 创立了 Mila,即蒙特利尔学习算法研究所,一家人工智能研究机构。

1997

  • IBM’s supercomputer program, Deep Blue, shocked the world when it beat the world chess champion, Garry Kasparov, in a six-game match.
    IBM 的超级计算机程序 Deep Blue 在六局比赛中击败了国际象棋世界冠军加里·卡斯帕罗夫 (Garry Kasparov),震惊了世界。

1998

  • The acronym NoSQL was first used by Carlo Strozzi and referred to a lightweight, open-source “relational” database that did not use SQL.
    首字母缩略词 NoSQL 最初由 Carlo Strozzi 使用,指的是不使用 SQL 的轻量级开源“关系”数据库。
  • Yoshua Bengio co-authored a groundbreaking paper with Yann LeCun, Léon Bottou, and Patrick Haffner, “Gradient-Based Learning Applied to Document Recognition,” showing that specific algorithms can recognize images more accurately than standard technology.
    Yoshua Bengio 与 Yann LeCun、Léon Bottou 和 Patrick Haffner 合著了一篇开创性的论文《应用于文档识别的基于梯度的学习》,表明特定算法可以比标准技术更准确地识别图像。

1999

  • Jacob Zahavi and Robert Stine published Mining Data for Nuggets of Knowledge, a paper that explores how companies must use data to inform customer behaviors and market trends.
    Jacob Zahavi 和 Robert Stine 发表了 Mining Data for Nuggets of Knowledge,这是一篇探讨公司必须使用数据来告知客户行为和市场趋势的论文。

2001

  • Software-as-a-Service (SaaS) was created, and Salesforce became a pioneer in the SaaS space. This was the precursor to using cloud-based applications.
    软件即服务 (SaaS) 应运而生,Salesforce 成为 SaaS 领域的先驱。这是使用基于云的应用程序的先驱。
  • William S. Cleveland published Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics, a plan to expand the technical areas of statistics with a focus on the data analyst. The plan sets out six technical work areas for a university department, government research lab, or corporate research organization and advocates for the appropriate allocation of resources devoted to research in each area.
    William S. Cleveland 发表了《数据科学:扩展统计领域技术领域的行动计划》,这是一项以数据分析师为重点、旨在扩展统计学技术领域的计划。该计划为大学院系、政府研究实验室或企业研究组织规定了六个技术工作领域,并倡导适当分配用于每个领域研究的资源。

2002

  • The International Council for Science: Committee on Data for Science and Technology (CODATA) started publishing the Data Science Journal, which focused on Data Science topics like the description of data systems, publication on the internet, applications, and risk and compliance issues.
    国际科学理事会:科学技术数据委员会 (CODATA) 开始出版数据科学期刊,该期刊侧重于数据科学主题,例如数据系统的描述、在 Internet 上的发布、应用程序以及风险和合规性问题。

2006

  • Hadoop 0.1.0, an open-source framework for distributed storage and processing of large data sets, was released. Apache Hadoop is still used today as an open-source software library for Big Data research.
    Hadoop 0.1.0 发布,这是一个用于大型数据集分布式存储和处理的开源框架。Apache Hadoop 至今仍被用作支持大数据研究的开源软件库。

2008

  • DJ Patil and Jeff Hammerbacher, of LinkedIn and Facebook respectively, made “Data Scientist” an official buzzword.
    分别来自 LinkedIn 和 Facebook 的 DJ Patil 和 Jeff Hammerbacher 使“数据科学家”成为正式的流行语。

2009

  • NoSQL was reintroduced when Eric Evans and Johan Oskarsson used it to describe non-relational databases.
    当 Eric Evans 和 Johan Oskarsson 使用 NoSQL 来描述非关系数据库时,NoSQL 被重新引入。

2011

  • Job listings for data scientists increased by 15,000%.
    数据科学家的职位列表增加了 15,000%。

2012

  • Harvard Business Review called the role of Data Scientist “the sexiest job of the 21st century.”
    《哈佛商业评论》称数据科学家是“21 世纪最性感的工作”。

2013

  • Statistics about Big Data, widely attributed to IBM, went viral: 90% of the data in the world was created within the last two years.
    有关大数据的统计数据被广泛认为是 IBM 的,它在网上疯传:世界上 90% 的数据是在过去两年内创建的。

2015

  • Google applied Deep Learning to speech recognition in Google Voice and saw a 49 percent increase in performance.
    Google 将深度学习应用于 Google Voice 的语音识别,性能提高了 49%。
  • Google open-sourced TensorFlow, an artificial intelligence engine for Deep Learning using Big Data and the Cloud.
    Google 开源了 TensorFlow,这是一个人工智能引擎,用于借助大数据和云实施深度学习。

2017

  • The DeepMind team released AlphaZero. In 24 hours, AlphaZero achieved a superhuman level of play in Chess, Shogi, and Go by defeating world-champion programs Stockfish, Elmo, and the 3-day version of AlphaGo Zero.
    DeepMind 团队发布了 AlphaZero。在 24 小时内,AlphaZero 击败了世界冠军程序 Stockfish、Elmo 和 AlphaGo Zero 的 3 天版本,在国际象棋、将棋和围棋中达到了超人的水平。
  • PricewaterhouseCoopers (PwC) forecast that job listings for data science and analytics would surge to 2.7 million by 2020.
    普华永道 (PwC) 预测,到 2020 年,数据科学和分析领域的职位列表将激增至 270 万个。

2018

  • Timnit Gebru and Joy Buolamwini co-authored the paper “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification,” detailing the tendency toward gender and racial bias found in commercial AI facial recognition software.
    Timnit Gebru 和 Joy Buolamwini 合著了论文“性别阴影:商业性别分类中的交叉准确性差异”,详细介绍了商业 AI 面部识别软件中发现的性别和种族偏见趋势。

2020

  • The WHO and its partners launched the Solidarity Trial, an international clinical trial at the intersection of biology and technology that aims to generate data sources and insights for creating the most effective treatments for COVID-19.
    WHO 及其合作伙伴启动了 Solidarity 试验,这是一项处于生物学与技术交叉点的国际临床试验,旨在生成数据源和见解,以创造最有效的 COVID-19 治疗方法。

Today

  • The market for Big Data analytics in banking could rise to $62.10 billion by 2025.
    到 2025 年,银行业大数据分析市场可能上升到 621 亿美元。
  • Data creation will grow to more than 180 zettabytes by 2025.
    到 2025 年,数据创建量将增长到 180 ZB 以上。
  • Data science jobs will increase by around 28% by 2026.
    到 2026 年,数据科学工作岗位将增加约 28%。
  • The global machine learning market was valued at $8 billion in 2021 and is anticipated to grow at a 39 percent compound annual growth rate (CAGR) through 2027.
    2021 年全球机器学习市场价值 80 亿美元,预计到 2027 年将以 39% 的复合年增长率 (CAGR) 增长。

via:The History Of Data Science and Pioneers You Should Know | Worcester Polytechnic Institute

https://onlinestemprograms.wpi.edu/blog/history-data-science-and-pioneers-you-should-know


数据科学的演变:增长与创新

Evolution of Data Science: Growth & Innovation

October 21, 2021

The term “data science” — and the practice itself — has evolved over the years. In recent years, its popularity has grown considerably due to innovations in data collection, technology, and mass production of data worldwide. Gone are the days when those who worked with data had to rely on expensive programs and mainframes. The proliferation of programming languages like Python and procedures to collect, analyze, and interpret data paved the way for data science to become the popular field it is today.
“数据科学”一词及其实践本身多年来不断发展。近年来,由于全球数据收集、技术和数据大规模生产的创新,它的受欢迎程度大大提高。处理数据的人不得不依赖昂贵的程序和大型机的日子已经一去不复返了。Python 等编程语言以及收集、分析和解释数据的过程的激增为数据科学成为当今的热门领域铺平了道路。

Data science began in statistics. Part of the evolution of data science was the inclusion of concepts such as machine learning, AI Large Language Models (LLMs), and the internet of things. With the flood of new information coming in and businesses seeking new ways to increase profit and make better decisions, data science started to expand to other fields, including medicine, engineering, and more.
数据科学始于统计学。数据科学发展的一部分是纳入了机器学习、AI 大语言模型 (LLM) 和物联网等概念。随着新信息的涌入和企业寻求增加利润和做出更好决策的新方法,数据科学开始扩展到其他领域,包括医学、工程等。

In this article, we’ll share a concise summary of the evolution of data science — from its humble beginnings as a statistician’s dream to its current state as a unique science in its own right recognized by every imaginable industry.
在本文中,我们将简要总结数据科学的演变——从统计学家的梦想开始,到现在作为一门独特的科学,它本身就被每个可以想象的行业所认可。

Origins, Predictions, Beginnings 起源、预测、开端

We could say that data science was born from the idea of merging applied statistics with computer science. The resulting field of study would use the extraordinary power of modern computing. Scientists realized they could not only collect data and solve statistical problems but also use that data to solve real-world problems and make reliable fact-driven predictions.
我们可以说,数据科学诞生于应用统计学与计算机科学合并的想法。由此产生的研究领域将利用现代计算的非凡力量。科学家们意识到,他们不仅可以收集数据和解决统计问题,还可以使用这些数据来解决现实世界的问题,并做出可靠的事实驱动预测。

1962: American mathematician John W. Tukey first articulated the data science dream. In his now-famous article “The Future of Data Analysis,” he foresaw the inevitable emergence of a new field nearly two decades before the first personal computers. While Tukey was ahead of his time, he was not alone in his early appreciation of what would come to be known as “data science.” Another early figure was Peter Naur, a Danish computer engineer whose book Concise Survey of Computer Methods offers one of the very first definitions of data science:
1962 年:美国数学家 John W. Tukey 首次提出了数据科学梦想。在他现在著名的文章《数据分析的未来》中,他预见到在第一台个人计算机出现前近二十年,一个新领域的出现是不可避免的。虽然 Tukey 走在时代的前面,但他并不是唯一一个早期欣赏后来被称为“数据科学”的东西的人。另一个早期人物是丹麦计算机工程师 Peter Naur,他的书 计算机方法简明调查 提供了数据科学的最早定义之一:

“The science of dealing with data, once they have been established, while the relation of the data to what they represent is delegated to other fields and sciences.”
“处理已确立数据的科学,而数据与其所代表事物之间的关系则交由其他领域和学科处理。”

1977: The theories and predictions of “pre” data scientists like Tukey and Naur became more concrete with the establishment of The International Association for Statistical Computing (IASC), whose mission was “to link traditional statistical methodology, modern computer technology, and the knowledge of domain experts in order to convert data into information and knowledge.”
1977 年:随着国际统计计算协会 (IASC) 的成立,Tukey 和 Naur 等“前”数据科学家的理论和预测变得更加具体,该协会的使命是“将传统统计方法、现代计算机技术和领域专家的知识联系起来,以便将数据转化为信息和知识”。

1980s and 1990s: Data science began taking more significant strides with the emergence of the first Knowledge Discovery in Databases (KDD) workshop and the founding of the International Federation of Classification Societies (IFCS). These two societies were among the first to focus on educating and training professionals in the theory and methodology of data science (though that term had not yet been formally adopted).
1980 年代和 1990 年代:随着第一个数据库中的知识发现 (KDD) 研讨会的出现和国际分类协会联合会 (IFCS) 的成立,数据科学开始取得更大的进步。这两个协会是最早专注于教育和培训数据科学理论和方法专业人员的协会之一(尽管该术语尚未正式采用)。

It was at this point that data science started to garner more attention from leading professionals hoping to monetize big data and applied statistics.
正是在这一点上,数据科学开始受到希望通过大数据和应用统计获利的领先专业人士的更多关注。

1994: BusinessWeek published a story on the new phenomenon of “Database Marketing.” It described the process by which businesses were collecting and leveraging enormous amounts of data to learn more about their customers, competition, or advertising techniques. The only problem at the time was that these companies were flooded with more information than they could possibly manage. Massive amounts of data were sparking the first wave of interest in establishing specific roles for data management. It began to seem like businesses would need a new kind of worker to make the data work in their favor.
1994 年:《商业周刊》发表了一篇关于“数据库营销”新现象的报道。它描述了企业收集和利用大量数据来更多地了解其客户、竞争或广告技术的过程。当时唯一的问题是,这些公司被大量的信息淹没,超出了他们的管理能力。海量数据激发了对建立数据管理特定角色的第一波兴趣。企业开始似乎需要一种新型的员工来使数据对他们有利。

1990s and early 2000s: Data science clearly emerged as a recognized and specialized field. Several data science academic journals began to circulate, and data science proponents like Jeff Wu and William S. Cleveland continued to help develop and expound upon the necessity and potential of data science.
1990 年代和 2000 年代初:我们可以清楚地看到,数据科学已成为一个公认的专业领域。一些数据科学学术期刊开始流通,Jeff Wu 和 William S. Cleveland 等数据科学支持者继续帮助开发和阐述数据科学的必要性和潜力。

2000s: Technology made enormous leaps by providing nearly universal access to internet connectivity, communication, and (of course) data collection.
2000 年代:技术通过提供几乎普遍的互联网连接、通信和(当然)数据收集,取得了巨大的飞跃。

2005: Big data enters the scene. With tech giants such as Google and Facebook uncovering large amounts of data, new technologies capable of processing them became necessary. Hadoop rose to the challenge, and later on Spark and Cassandra made their debuts.
2005 年:大数据进入现场。随着 Google 和 Facebook 等科技巨头发现了大量数据,能够处理这些数据的新技术变得必要。Hadoop 迎接了挑战,后来 Spark 和 Cassandra 首次亮相。

2014: Due to the increasing importance of data, and organizations’ interest in finding patterns and making better business decisions, demand for data scientists began to see dramatic growth in different parts of the world.
2014 年:由于数据的重要性日益增加,以及组织对寻找模式和做出更好的业务决策的兴趣,世界不同地区对数据科学家的需求开始急剧增长。

source: https://www.zarantech/blog/why-data-science-jobs-are-in-huge-demand/

2015: Machine learning, deep learning, and Artificial Intelligence (AI) officially enter the realm of data science. These technologies have driven innovations over the past decade, from personalized shopping and entertainment to self-driving vehicles, along with the insights needed to bring these real-life applications of AI into our daily lives efficiently.
2015 年:机器学习、深度学习和人工智能 (AI) 正式进入数据科学领域。这些技术在过去十年中推动了创新——从个性化购物和娱乐到自动驾驶汽车,以及有效地将这些 AI 的现实生活应用引入我们日常生活的所有见解。

2018: New regulations in the field are perhaps one of the most significant developments in the evolution of data science.
2018 年:该领域的新法规可能是数据科学发展的最大方面之一。

2020s: We are seeing additional breakthroughs in AI and machine learning, and an ever-increasing demand for qualified professionals in Big Data.
2020 年代:我们看到了 AI 和机器学习方面的更多突破,以及对大数据领域合格专业人员不断增长的需求。

The Future of Data Science 数据科学的未来

Seeing how much of our world is currently powered by data and data science, we can reasonably ask, Where do we go from here? What does the future of data science hold? While it’s difficult to know exactly what the hallmark breakthroughs of the future will be, all signs seem to indicate the critical importance of machine learning. Data scientists are searching for ways to use machine learning to produce more intelligent and autonomous AI.
看到我们世界目前有多少地区由数据和数据科学提供支持,我们可以合理地问,我们从这里何去何从?数据科学的未来会怎样?虽然很难确切知道未来的标志性突破会是什么,但所有迹象似乎都表明机器学习的极端重要性。数据科学家正在寻找使用机器学习来生成更智能、更自主的 AI 的方法。

In other words, data scientists are working tirelessly toward developments in deep learning to make computers smarter. These developments can bring about advanced robotics paired with a powerful AI. Experts predict the AI will be capable of understanding and interacting seamlessly with humans, self-driving vehicles, and automated public transportation in a world interconnected like never before. This new world will be made possible by data science.
换句话说,数据科学家正在孜孜不倦地致力于深度学习的发展,以使计算机更智能。这些发展可以带来先进的机器人技术与强大的人工智能相结合。专家预测,人工智能将能够在一个前所未有的互联世界中理解人类、自动驾驶汽车和自动公共交通并与之无缝交互。数据科学将使这个新世界成为可能。

Perhaps, on the more exciting side, we may see an age of extensive automation of labor in the near future. This is expected to revolutionize the healthcare, finance, transportation, and defense industries.
也许,从更令人兴奋的方面来看,我们可能会在不久的将来看到一个广泛的劳动力自动化时代。预计这将彻底改变医疗保健、金融、运输和国防工业。


via: The Evolution of Data Science – Dataquest

https://www.dataquest.io/blog/evolution-of-data-science-growth-innovation/

