Spark SQL

Spark version 1.3 introduced DataFrames, so that Spark data can be processed in tabular form and tabular functions (such as select, filter, and groupBy) can be used to process it. The Spark SQL module integrates with the Parquet and JSON formats, allowing data to be stored in formats that better represent it. This also offers more options for integrating with external systems.
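As a minimal sketch of this tabular style, the following Scala snippet loads a JSON file into a DataFrame, applies select, filter, and groupBy, and writes the result back as Parquet. The file name `sales.json` and its columns (`region`, `amount`) are hypothetical, and the `sqlContext.read` API shown here is the one available from Spark 1.4 onward:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object DataFrameSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("df-sketch"))
    val sqlContext = new SQLContext(sc)

    // Load a JSON file; the schema is inferred from the documents.
    // "sales.json" and its columns are assumed for illustration.
    val df = sqlContext.read.json("sales.json")

    // Tabular operations: select, filter, and groupBy
    df.select("region", "amount")
      .filter(df("amount") > 100)
      .groupBy("region")
      .count()
      .show()

    // The same data can be written out in the columnar Parquet format
    df.write.parquet("sales.parquet")
  }
}
```

The round trip between JSON and Parquet illustrates the point made above: the source-format integration lets you choose a storage representation (here, columnar Parquet) that better suits the data.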

Apache Spark can also be integrated with Hive, the Hadoop big data warehouse. Hive context-based Spark applications can then manipulate Hive-based table data, bringing Spark's fast in-memory distributed processing to Hive's big data storage capabilities. This effectively lets Hive use Spark as a processing engine.
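A Hive context-based application can be sketched as follows. The `HiveContext` runs HiveQL statements against tables registered in Hive's metastore; the table name `sales` and its columns are assumptions for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hive-sketch"))

    // HiveContext gives Spark SQL access to Hive's metastore tables
    val hiveContext = new HiveContext(sc)

    // A HiveQL query executed by Spark's engine, with Hive
    // providing the table storage ("sales" is hypothetical)
    hiveContext
      .sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region")
      .show()
  }
}
```

Note that the query text is plain HiveQL; only the execution moves to Spark's in-memory engine, while the data stays in Hive-managed storage.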

Additionally, there are many connectors for accessing NoSQL databases outside the Hadoop ecosystem directly from Apache Spark. In Chapter 2, Apache Spark SQL, we will see how the Cloudant connector can be used to access a remote Apache CouchDB NoSQL database and issue SQL statements against JSON-based NoSQL document collections.
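Ahead of the fuller treatment in Chapter 2, a hedged sketch of the idea follows. The data-source name `com.cloudant.spark`, the option keys, the host, and the database name `customer_db` are all assumptions based on the Cloudant Spark connector; check the connector's documentation for the exact spelling in your Spark version:

```scala
// Assumed connector coordinates and options -- verify against the
// Cloudant Spark connector documentation before use.
val docs = sqlContext.read
  .format("com.cloudant.spark")              // assumed data-source name
  .option("cloudant.host", "account.cloudant.com")
  .option("cloudant.username", "user")
  .option("cloudant.password", "pass")
  .load("customer_db")                       // hypothetical database

// Register the JSON document collection and query it with SQL
docs.registerTempTable("customers")
sqlContext.sql("SELECT name FROM customers WHERE age > 30").show()
```

The key point is that once the connector materializes the remote document collection as a DataFrame, ordinary Spark SQL statements work against it just as they do for local Parquet or JSON data.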
