Chapter 1. Getting Started with Spark and GraphX

Apache Spark is a cluster-computing platform for processing large distributed datasets. Data processing in Spark is both fast and easy, thanks to its optimized parallel computation engine and its flexible, unified API. Spark's core abstraction is the Resilient Distributed Dataset (RDD). By extending the MapReduce framework, Spark's Core API makes analytics jobs easier to write. On top of the Core API, Spark offers an integrated set of high-level libraries for specialized tasks such as graph processing and machine learning. In particular, GraphX is Spark's library for graph-parallel processing.

This chapter will introduce you to Spark and GraphX by building a social network and exploring the links between people in the network. In addition, you will learn to use the Scala Build Tool (SBT) to build and run a Spark program. By the end of this chapter, you will know how to:

  • Install Spark successfully on your computer
  • Experiment with the Spark shell and review Spark's data abstractions
  • Create a graph and explore the links using base RDD and graph operations
  • Build and submit a standalone Spark application with SBT