
Preface

Apache Spark has captured the imagination of analytics and big data developers, and rightfully so. In a nutshell, Spark enables distributed computing at scale, in the lab or in production. Until now, the collect-store-transform pipeline was distinct from the data science Reason-Model pipeline, which was in turn distinct from the deployment of analytics and machine learning models. Now, with Spark and technologies such as Kafka, we can seamlessly span the data management and data science pipelines. Moreover, we can now build data science models on larger datasets rather than on sampled data alone. And whatever models we build can be deployed into production (with added work from engineering on the “ilities”, of course). We hope this book will help data engineers become familiar with the fundamentals of the Spark platform and gain hands-on experience with some of its advanced capabilities.
