Title: Apache Spark Graph Processing
Author: Rindra Ramamonjison
Last updated: 2021-07-16 20:03:53

Chapter 1. Getting Started with Spark and GraphX

Apache Spark is a cluster-computing platform for processing large distributed datasets. Data processing in Spark is both fast and easy, thanks to its optimized parallel computation engine and its flexible, unified API. The core abstraction in Spark is the Resilient Distributed Dataset (RDD). By extending the MapReduce model, Spark's Core API makes analytics jobs easier to write. On top of the Core API, Spark offers an integrated set of high-level libraries for specialized tasks such as graph processing and machine learning. In particular, GraphX is Spark's library for graph-parallel processing.

This chapter will introduce you to Spark and GraphX by building a social network and exploring the links between people in the network. In addition, you will learn to use the Scala Build Tool (SBT) to build and run a Spark program. By the end of this chapter, you will know how to:

Install Spark successfully on your computer
Experiment with the Spark shell and review Spark's data abstractions
Create a graph and explore the links using base RDD and graph operations
Build and submit a standalone Spark application with SBT
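
As a preview of the third goal, here is a minimal sketch of what building and exploring a small social network with GraphX might look like. It assumes you are inside the Spark shell, where the SparkContext is already available as sc. The people, the relationships, and the names people, links, and network are made up for illustration and are not the chapter's dataset.

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.rdd.RDD

// Assumption: this runs inside spark-shell, where `sc` is predefined.
// The vertices and edges below are hypothetical illustration data.

// Each vertex is an (id, name) pair; GraphX vertex ids are Longs.
val people: RDD[(Long, String)] = sc.parallelize(Seq(
  (1L, "Alice"),
  (2L, "Bob"),
  (3L, "Carol")
))

// Each edge carries a source id, a destination id, and a label.
val links: RDD[Edge[String]] = sc.parallelize(Seq(
  Edge(1L, 2L, "friend"),
  Edge(2L, 3L, "coworker")
))

// Build the property graph from the two RDDs.
val network: Graph[String, String] = Graph(people, links)

// A triplet joins an edge with the attributes of both of its endpoints.
network.triplets.collect().foreach { t =>
  println(s"${t.srcAttr} -[${t.attr}]-> ${t.dstAttr}")
}
```

The graph is assembled from two ordinary RDDs, one for vertices and one for edges, which is why the chapter reviews Spark's data abstractions before turning to graph operations.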
