- Machine Learning with Spark (Second Edition)
- Rajdeep Dua, Manpreet Singh Ghotra, Nick Pentreath
- 2021-07-09 21:07:56
Performance improvements in Spark ML over Spark MLlib
Spark 2.0 uses the Tungsten engine, which builds on ideas from modern compilers and MPP databases. At runtime it emits optimized bytecode that collapses the entire query into a single function, which eliminates virtual function calls between operators. It also keeps intermediate data in CPU registers. This technique is called whole-stage code generation.

Reference: https://databricks.com/blog/2016/05/11/apache-spark-2-0-technical-preview-easier-faster-and-smarter.html
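
To make the idea of collapsing a query into a single function concrete, here is a minimal plain-Python sketch. It is not Spark code and not what Tungsten actually generates: the `Operator`, `Scan`, and `Filter` classes and the two `count_*` functions are hypothetical names invented for illustration. The sketch assumes a simple filter-and-count query and contrasts an iterator-style plan, where each row passes through one dynamically dispatched `next()` call per operator, with a hand-fused loop of the kind that whole-stage code generation aims to produce.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class Operator(ABC):
    """One node of a query plan in the iterator style (hypothetical)."""

    @abstractmethod
    def next(self) -> Optional[int]:
        """Return the next row, or None when the input is exhausted."""


class Scan(Operator):
    """Leaf operator that yields rows from an in-memory list."""

    def __init__(self, rows: List[int]) -> None:
        self._it = iter(rows)

    def next(self) -> Optional[int]:
        return next(self._it, None)


class Filter(Operator):
    """Passes through only the rows equal to `wanted`."""

    def __init__(self, child: Operator, wanted: int) -> None:
        self._child = child
        self._wanted = wanted

    def next(self) -> Optional[int]:
        # One dynamically dispatched call into the child per input row.
        while (row := self._child.next()) is not None:
            if row == self._wanted:
                return row
        return None


def count_iterator_style(rows: List[int], wanted: int) -> int:
    """Count matching rows by pulling them through a chain of operators."""
    plan = Filter(Scan(rows), wanted)
    n = 0
    while plan.next() is not None:
        n += 1
    return n


def count_fused(rows: List[int], wanted: int) -> int:
    """The same query collapsed into one loop.

    There are no per-row operator calls, and the running count lives in a
    single local variable instead of being passed between operators.
    """
    n = 0
    for row in rows:
        if row == wanted:
            n += 1
    return n
```

Both functions compute the same result; the fused version simply avoids the per-row calls between operators. In Spark itself, the physical plan printed by `explain()` marks operators that were compiled with whole-stage code generation with an asterisk.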
The following chart and table show single-function performance improvements between Spark 1.6 and Spark 2.0:

Chart comparing performance improvements in single-line functions between Spark 1.6 and Spark 2.0

Table comparing performance improvements in single-line functions between Spark 1.6 and Spark 2.0