- Frank Kane's Taming Big Data with Apache Spark and Python
- Frank Kane
- 275 words
- 2021-07-02 21:12:15
Getting Started with Spark
Spark is one of the hottest technologies in big data analysis right now, and with good reason. If you work for, or hope to work for, a company that has massive amounts of data to analyze, Spark offers a fast and easy way to spread that analysis across an entire cluster of computers. This is a very valuable skill to have right now.
My approach in this book is to start with some simple examples and work our way up to more complex ones. We will use movie ratings data to find similar movies and make movie recommendations. I also found a social network of superheroes, if you can believe it; we can use this data to figure out things such as who the most popular superhero in the fictional superhero universe is. Have you heard of the Kevin Bacon number, the idea that everyone in Hollywood is supposedly connected to Kevin Bacon by some number of steps? We can do the same thing with our superhero data and figure out the degrees of separation between any two superheroes in their fictional universe. So, we'll have some fun along the way, taking real examples and turning them into Spark problems. Using Apache Spark is easier than you might think and, with all the exercises and activities in this book, you'll get plenty of practice as we go along. I'll guide you through every line of code and every concept you need. So let's get started and learn Apache Spark.