
From Hadoop MapReduce to Spark

As data volumes grew, single-machine tools could no longer satisfy industry needs, which created space for new data processing methods and tools. The most prominent was Hadoop MapReduce, which is based on an idea originally described in the Google paper, MapReduce: Simplified Data Processing on Large Clusters (https://research.google.com/archive/mapreduce.html). However, it is a generic framework without any explicit support or libraries for building machine learning workflows. Another limitation of classical MapReduce is that it performs many disk I/O operations during the computation instead of keeping intermediate data in machine memory.
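The map and reduce phases behind this idea can be sketched in a few lines of plain Python. This is only a single-process illustration of the programming model, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real framework would run each phase across many machines and write intermediate results to disk.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence inside one process."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["spark and hadoop", "spark in memory"]))
```

Note that nothing in this model is specific to machine learning, which is the gap described above: an iterative algorithm would have to be expressed as a chain of such jobs.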

As you have seen, there are several existing machine learning tools and distributed platforms, but none of them is an exact match for performing machine learning tasks on large data in a distributed environment. These gaps open the door for Apache Spark.

Enter the room, Apache Spark!

Started in 2009 at the UC Berkeley AMPLab (Algorithms, Machines, People) and open sourced in 2010, the Apache Spark project was built with an eye for speed, ease of use, and advanced analytics. One key difference between Spark and other distributed frameworks such as Hadoop MapReduce is that datasets can be cached in memory. This lends itself nicely to machine learning, given its iterative nature (more on this later!) and the way data scientists access the same data many times over.
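To see why in-memory caching matters for iterative algorithms, consider the following plain-Python analogy. It does not use the Spark API; `CountingSource` and `fit_mean` are hypothetical names, and the "disk" is simulated by a counter that records how often the full dataset is read. In Spark itself, the corresponding step is calling `cache()` or `persist()` on an RDD or Dataset.

```python
class CountingSource:
    """Hypothetical stand-in for a dataset on disk; counts full reads."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def load(self):
        # Each call simulates one complete pass over the stored data.
        self.reads += 1
        return list(self._values)


def fit_mean(source, iterations, cache):
    """Estimate the mean by gradient descent on the squared error.

    With cache=True the data is loaded once and reused in memory;
    with cache=False it is reloaded in every iteration.
    """
    cached = source.load() if cache else None
    estimate = 0.0
    for _ in range(iterations):
        data = cached if cache else source.load()
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= 0.5 * gradient
    return estimate


if __name__ == "__main__":
    for cache in (False, True):
        source = CountingSource([1.0, 2.0, 3.0, 4.0])
        estimate = fit_mean(source, iterations=20, cache=cache)
        print(f"cache={cache}: estimate={estimate:.4f}, reads={source.reads}")
```

Both variants reach the same estimate, but the uncached one reads the data once per iteration, which is the access pattern that makes disk-bound frameworks slow for machine learning.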

Spark can be run in a variety of ways, such as the following:

  • Local mode: This entails a single Java Virtual Machine (JVM) executed on a single host
  • Standalone Spark cluster: This entails multiple JVMs on multiple hosts
  • Via a resource manager such as YARN or Mesos: Here the application deployment is driven by the resource manager, which controls the allocation of nodes and the distribution and deployment of the application
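In practice, the deployment mode is selected through the master URL passed to Spark. The helper below is a small sketch that maps the modes listed above to the master URL formats Spark accepts; the function name `master_url` and the host names in the usage example are hypothetical, while 7077 and 5050 are the usual default ports of a standalone Spark master and a Mesos master.

```python
def master_url(mode, host=None, port=None, cores="*"):
    """Build a Spark master URL for one of the deployment modes.

    mode  -- "local", "standalone", "yarn", or "mesos"
    host  -- master host name (required for standalone and mesos)
    port  -- master port; falls back to the usual default when omitted
    cores -- number of local worker threads, or "*" for all cores
    """
    if mode == "local":
        return f"local[{cores}]"
    if mode == "yarn":
        # The cluster location comes from the Hadoop configuration.
        return "yarn"
    if mode in ("standalone", "mesos"):
        if host is None:
            raise ValueError(f"mode '{mode}' requires a host")
        if mode == "standalone":
            return f"spark://{host}:{port or 7077}"
        return f"mesos://{host}:{port or 5050}"
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    print(master_url("local"))
    print(master_url("standalone", host="master.example.com"))
    print(master_url("yarn"))
    print(master_url("mesos", host="mesos.example.com"))
```

The resulting string is what you would pass to `spark-submit --master` or to `SparkConf.setMaster`.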