- Mastering Apache Spark 2.x (Second Edition)
- Romeo Kienzler
- 2021-07-02 18:55:23
Spark SQL
Since version 1.3, Apache Spark has provided DataFrames, which let Spark data be processed in tabular form with tabular functions (such as select, filter, and groupBy). The Spark SQL module integrates with the Parquet and JSON formats, so data can be stored in formats that better represent its structure. This also offers more options for integrating with external systems.
Apache Spark can also be integrated with Hive, the big data database of the Hadoop ecosystem. Hive context-based Spark applications can manipulate data held in Hive tables. This brings Spark's fast in-memory distributed processing to Hive's big data storage capabilities. It effectively lets Hive use Spark as a processing engine.
In addition, a wide range of connectors gives Apache Spark direct access to NoSQL databases outside the Hadoop ecosystem. In Chapter 2, Apache Spark SQL, we will see how the Cloudant connector can be used to access a remote Apache CouchDB NoSQL database and issue SQL statements against JSON-based NoSQL document collections.