Preface

Data science is fashionable. Data science startups are sprouting across the globe and established companies are scrambling to assemble data science teams. The ability to analyze large datasets is also becoming increasingly important in the academic and research world.

Why this explosion in demand for data scientists? In our view, the emergence of data science is the serendipitous collision of several interlinked factors. The first is data availability. Over the last fifteen years, the amount of data collected by companies has exploded. In the world of research, cheap gene sequencing techniques have drastically increased the amount of genomic data available. Social and professional networking sites have built huge graphs interlinking a significant fraction of the people living on the planet. At the same time, the development of the World Wide Web has made this wealth of data accessible from almost anywhere in the world.

The increased availability of data has resulted in an increase in data awareness. It is no longer acceptable for decision makers to trust their experience and "gut feeling" alone. Increasingly, one expects business decisions to be driven by data.

Finally, the tools for efficiently making sense of huge datasets and extracting insights from them are starting to mature: one no longer needs to be an expert in distributed computing to analyze a large dataset. Apache Spark, for instance, greatly eases writing distributed data analysis applications. The explosion of cloud infrastructure makes it easier to scale computing resources to cope with varying volumes of data.

Scala is a popular language for data science. By emphasizing immutability and functional constructs, Scala lends itself well to the construction of robust libraries for concurrency and big data analysis. A rich ecosystem of tools for data science has therefore developed around Scala, including libraries for accessing SQL and NoSQL databases, frameworks for building distributed applications, such as Apache Spark, and libraries for linear algebra and numerical algorithms. We will explore this rich and growing ecosystem in the fourteen chapters of this book.
