The Apache Spark Ecosystem

Apache Spark (http://spark.apache.org/) is an open source, fast cluster-computing platform. It was originally created by AMPLab at the University of California, Berkeley, and its source code was later donated to the Apache Software Foundation (https://www.apache.org/). Spark owes much of its speed to loading data into distributed memory (RAM) across a cluster of machines: data can be transformed quickly and also cached on demand for a variety of use cases. Compared to Hadoop MapReduce, it runs programs up to 100 times faster when the data fits in memory, or up to 10 times faster on disk. Spark supports four programming languages: Java, Scala, Python, and R. This book covers only the Spark APIs (and deep learning frameworks) for Scala (https://www.scala-lang.org/) and Python (https://www.python.org/).

This chapter will cover the following topics:

  • Apache Spark fundamentals
  • Getting Spark
  • Resilient Distributed Dataset (RDD) programming
  • Spark SQL, Datasets, and DataFrames
  • Spark Streaming
  • Cluster mode using a different manager