Data science is not just about mathematics, statistics, and coding. It is about using these tools to generate new insights, and about telling a great story. When data analysis and great storytelling come together, they can help us understand how the world works, challenge our (mis)conceptions of reality, and inspire us to do better. Here is a quick overview of five accessible books that do precisely that.

Factfulness: Ten Reasons We’re Wrong About the World – and Why Things Are Better Than You Think (2018)

By Hans Rosling, Anna Rosling Rönnlund, and Ola Rosling | 352 pages.

Known for a series of influential TED Talks, Hans Rosling (1948–2017) and his co-authors are on a mission to challenge our misconceptions about the state of the world. The authors begin their book with the startling observation that our ideas about progress in the world are wrong across the board, whether on poverty, education, or global health. We consistently overestimate the levels of misery in the world, and underestimate the progress that we’ve made over the past decades.

I say “we” because Rosling et al. show that these misconceptions persist among everyone, regardless of education, geography, or occupation. And even when presented with the hard facts (e.g. World Bank and United Nations data), we hardly seem to update our world views. The authors argue that this failure to update our views in light of new data is a product of evolution: our lives would become impossibly difficult if we were to approach every problem rationally and base every single decision on data instead of on our instincts. Still, we need to overcome and correct for our biases, and the authors offer an effective recipe for doing so.

Factfulness at times leaves the reader wanting more. Specifically, I would have appreciated a more in-depth and properly sourced discussion of the evolutionary origins of our brain’s instincts when it comes to interpreting and understanding data, and updating our beliefs. That being said, this book is intended as much to entertain as to teach, and it has sacrificed some detail in the interest of accessibility, something the authors have done effectively.

With Factfulness, Rosling et al. offer a call to action, to use data to inform our world view, and more importantly, to overcome the inherent evolutionary biases of our brains.

Algorithms to Live By: The Computer Science of Human Decisions (2016)

By Brian Christian and Tom Griffiths | 368 pages.

Recommended to me by a colleague, Algorithms to Live By has become one of my favourite books. Computers and humans face different but related challenges: how to achieve good outcomes under certain constraints. The authors draw a wonderful series of parallels between human decision-making and computer science, and show how algorithms can inform and improve our daily lives and decision-making processes.

How many apartments should you visit before making a bid? How many people should you date before making a commitment? And, how many candidates should you interview before making a hire? Believe it or not, most of these questions have an “optimal” mathematical answer, and computer science can help us make the best choice.
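
To give a flavour of what such an “optimal” answer can look like, here is a minimal simulation sketch of the textbook version of these questions, usually called the secretary problem. It rests on simplifying assumptions that are mine, not taken from the book's text: options arrive in random order, they can be ranked without ties, you cannot go back to an option you passed on, and only ending up with the single best option counts as success. Under those assumptions, the well-known strategy is to look at roughly the first n/e (about 37%) of the options without committing, then take the first one that beats everything seen so far. The function names and the numbers (50 options, 20,000 trials) are purely illustrative.

```python
import math
import random


def pick_with_cutoff(scores, cutoff):
    """Skip the first `cutoff` options, then take the first one that beats them all."""
    best_seen = max(scores[:cutoff], default=float("-inf"))
    for score in scores[cutoff:]:
        if score > best_seen:
            return score
    # Nothing beat the benchmark, so we are stuck with the last option.
    return scores[-1]


def success_rate(n_options, cutoff, trials=20_000, seed=0):
    """Estimate how often the strategy ends up with the single best option."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        scores = list(range(n_options))  # distinct scores; n_options - 1 is the best
        rng.shuffle(scores)
        if pick_with_cutoff(scores, cutoff) == n_options - 1:
            wins += 1
    return wins / trials


n = 50
cutoff = round(n / math.e)  # roughly 37% of the options
print(f"look at {cutoff} of {n}, then leap: {success_rate(n, cutoff):.2f}")
print(f"take the very first option:  {success_rate(n, 0):.2f}")
```

Real apartment hunts and hiring rounds break these assumptions in all sorts of ways, so treat the output as an illustration of the idea rather than as advice.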

Algorithms to Live By is a witty and accessible introduction for the non-expert to a range of algorithms used in computer science, and a fun re-encounter for the seasoned data scientist.

Naked Statistics: Stripping the Dread from the Data (2014)

By Charles Wheelan | 304 pages.

Naked Statistics is STATS-101 in book form. It is a remarkably accessible read on statistics that uses tons of examples from real-world applications, ranging from Netflix’s recommendation engine to football and education. If anything, you will finish this book convinced that statistics is far from abstract, and that it is one of the most effective tools for dealing with all sorts of societal questions and problems.

Don’t expect to be a stats expert after reading Naked Statistics. You will, however, come away with a firm grasp of the basic concepts behind statistics (think: mean, median, standard deviation, correlation, regression, the central limit theorem, etc.). If your introductory stats course is not doing it for you, Naked Statistics is a nice and effective complement to your course materials.
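
To make a few of these concepts concrete, here is a small sketch using only Python's standard library. The data are made up for illustration and do not come from the book: I assume a skewed “population” of exponentially distributed waiting times, and the seed, population size, and sample size are arbitrary choices. It shows the mean, median, and standard deviation of the population, and then the central limit theorem at work: the means of repeated samples should cluster around the population mean, with a spread close to the population standard deviation divided by the square root of the sample size.

```python
import random
import statistics

rng = random.Random(42)

# A skewed, made-up "population": exponential waiting times with a true mean of 1.0.
population = [rng.expovariate(1.0) for _ in range(20_000)]

pop_mean = statistics.mean(population)
pop_median = statistics.median(population)
pop_stdev = statistics.pstdev(population)

print("mean  :", round(pop_mean, 2))
print("median:", round(pop_median, 2))  # below the mean, because the data are right-skewed
print("stdev :", round(pop_stdev, 2))

# Central limit theorem: draw many samples and look at the distribution of their means.
sample_size = 50
sample_means = [
    statistics.mean(rng.sample(population, sample_size)) for _ in range(2_000)
]

print("mean of sample means :", round(statistics.mean(sample_means), 2))
print("stdev of sample means:", round(statistics.stdev(sample_means), 2))
print("stdev / sqrt(n)      :", round(pop_stdev / sample_size ** 0.5, 2))
```

Even though the underlying data are far from bell-shaped, a histogram of `sample_means` would look roughly normal, which is the intuition the book spends a chapter building.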

Weapons of Math Destruction (2016)

By Cathy O’Neil | 272 pages.

With Weapons of Math Destruction, Cathy O’Neil coins a new term to great effect. O’Neil illuminates how algorithms increasingly dominate our lives, and do so in the absence of regulation or transparency. Teachers are sacked on the basis of iffy (and manipulable) modelling of their students’ attainment based on test scores (the “value-added” model). Innocent people are branded potential criminals by predictive models, without any means of redress. And insurance premiums are set by opaque algorithms on the basis of zip code.

While the author makes a compelling argument for better governance, I do not share her bleak view of the “dark side” of big data. A lack of regulation and transparency produces conditions for models to turn into “weapons of math destruction”, but this is not inherent to the models or the data themselves. Data scientists are generally well aware of the “garbage in, garbage out” problem and of the consequences of their modelling choices. Lately, awareness of this topic has grown, and we’re seeing calls for participatory machine learning that aims to involve affected populations in the design phase.

Moreover, governments are able to create a healthy context for these tools to be used, although they should do better. For example, it’d be interesting to see a similar study that focuses on the EU, where personal data enjoys greater legal protections.

In spite of its shortcomings, Weapons of Math Destruction holds one important lesson for data scientists: always be aware that your work has real-life consequences, and make decisions accordingly.

Outliers: The Story of Success (2008)

By Malcolm Gladwell | 336 pages.

This book is the odd one out in this list and is not about data science. In fact, it does not take a quantitative approach at all, and some of you might wonder why I added Outliers to this overview. For data scientists, understanding the context in which the phenomenon that you are modelling operates is crucial. It helps you build hypotheses and informs your choice of models.

Malcolm Gladwell’s Outliers challenges mainstream narratives of what made successful people successful. First, he argues, success involves practice. Gladwell borrows from the theory of the 10,000-Hour Rule here, which states that perfecting a skill requires 10,000 hours of dedicated practice (this notion has since come under heavy scrutiny). Second, circumstance comes into play: access to the right resources, the cultural context, and in some cases, plain old good luck.

I recommend reading this book with a data science mindset and trying to unpick what a qualitative approach offers that a quantitative study might not, and, vice versa, how we would answer the questions that Gladwell engages with had we taken a quantitative approach.


Yes, the evidence is anecdotal. Yes, the analysis is simplistic at times. And yes, there is sampling bias in the examples that Gladwell presents. Yet, in spite of these shortcomings, Outliers is a powerful reminder of the impact of chance and circumstance and of how precarious outcomes are.

Gladwell is a master storyteller. Good data scientists are storytellers in their own right: you may be amazing at analysing data, but you need to be able to distill its essence and communicate your story effectively to deliver value for your customer, your colleagues, or your company. Outliers’ appreciation of the importance of context and its demonstration of good storytelling are important lessons for any data scientist.

Thanks for reading! What data science books did you enjoy reading? Do leave your recommendations in the comments!

Translated from: https://towardsdatascience/five-books-that-aspiring-data-scientists-should-read-dd39a56bd3be
