
Getting Started with Spark

Spark is one of the hottest technologies in big data analysis right now, and with good reason. If you work for, or hope to work for, a company that has massive amounts of data to analyze, Spark offers a very fast and very easy way to analyze that data by spreading the processing out across an entire cluster of computers. This is a very valuable skill to have right now.

My approach in this book is to start with some simple examples and work our way up to more complex ones, and we'll have some fun along the way. We will use movie ratings data to find similar movies and generate movie recommendations. I also found a social network of superheroes, if you can believe it; we can use this data to figure out who's the most popular superhero in the fictional superhero universe. Have you heard of the Kevin Bacon number, the idea that everyone in Hollywood is supposedly connected to Kevin Bacon through a short chain of co-stars? We can do the same thing with our superhero data and figure out the degrees of separation between any two superheroes in their fictional universe. These are real examples, and we'll turn them into Spark problems. Using Apache Spark is easier than you might think, and with all the exercises and activities in this book, you'll get plenty of practice as we go along. I'll guide you through every line of code and every concept you need. So let's get started and learn Apache Spark.
