- Apache Spark Quick Start Guide
- Shrey Mehrotra Akash Grade
- 2021-07-02 13:39:55
Spark SQL
Spark SQL is the Spark component that lets developers work with structured and semi-structured data such as Hive tables, MySQL tables, Parquet files, Avro files, JSON files, CSV files, and more. When data comes with a schema, SQL becomes the first choice for processing it, and Spark SQL enables developers to process such data with Structured Query Language (SQL). An alternative for processing structured data is Hive. Hive processes structured data stored on HDFS using the Hive Query Language (HiveQL, abbreviated here as HQL). It has traditionally used MapReduce internally for its processing, and we shall see how Spark can deliver better performance than MapReduce. In early versions of Spark, structured data was represented as a SchemaRDD (an RDD with an attached schema), an abstraction that was later renamed DataFrame.
Using Spark SQL, business logic can be written directly in SQL and HQL. This enables data warehouse engineers with a good knowledge of SQL to use Spark for their extract, transform, load (ETL) processing. Hive projects can be migrated to Spark using Spark SQL, often without changing the Hive scripts.
Spark SQL is also a first choice for data analysis and data warehousing. It lets data analysts write ad hoc queries for exploratory analysis. Spark provides the Spark SQL shell (spark-sql), where you can run SQL queries that are executed on Spark. Spark internally converts a query into a chain of RDD computations, while Hive converts an HQL job into a series of MapReduce jobs. Using Spark SQL, developers can also make use of caching (a Spark feature that keeps data in memory), which can significantly improve the performance of queries that read the same data repeatedly.