- Mastering Apache Spark 2.x (Second Edition)
- Romeo Kienzler
- 2021-07-02 18:55:32
Summary
This chapter started by explaining the SparkSession object and file I/O methods. It then showed that Spark- and HDFS-based data can be manipulated in three ways: as DataFrames with SQL-like methods, as Datasets (the strongly typed version of DataFrames), and with Spark SQL by registering temporary tables. It also showed that a schema can be inferred using the DataSource API or defined explicitly, using StructType on DataFrames or case classes on Datasets.
Next, user-defined functions were introduced to show that the functionality of Spark SQL can be extended by creating new functions to suit your needs, registering them as UDFs, and then calling them in SQL to process data. This lays the foundation for most of the subsequent chapters, because the DataFrame and Dataset APIs are now the preferred way to work with Apache Spark, and RDDs are used only as a fallback.
In the coming chapters, we'll look at some internals of Apache Spark SQL to understand why these newer APIs are much faster than the RDD API. This knowledge is important for writing efficient SQL queries and data transformations on top of the DataFrame and Dataset relational API. We will therefore look at Catalyst, the Apache Spark optimizer, which takes your high-level program and transforms it into efficient calls on top of the RDD API, and, in later chapters, at Tungsten, which is integral to the study of Apache Spark.