- Mastering Apache Spark 2.x(Second Edition)
- Romeo Kienzler
- 2021-07-02 18:55:32
Understanding the workings of the Catalyst Optimizer
So how does the optimizer work? The following figure shows the core components and how they are involved in a sequential optimization process:

First, it doesn't matter whether the DataFrame API, the Dataset API, or SQL is used: all three result in the same Unresolved Logical Execution Plan (ULEP). A QueryPlan is unresolved as long as its column names haven't been verified and its column types haven't been looked up in the catalog. Once that lookup has been done, the result is a Resolved Logical Execution Plan (RLEP), which is then transformed multiple times until it becomes an Optimized Logical Execution Plan. LEPs describe only what has to be computed, not how it is computed. The optimized LEP is then transformed into multiple Physical Execution Plans (PEPs) using so-called strategies. Finally, a cost model that takes statistics about the Dataset to be queried into account selects the best PEP for execution. Note that the final execution takes place on RDD objects.
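The sequence described above (resolve against the catalog, rewrite the logical plan, generate physical alternatives, pick one by cost) can be sketched as a toy model in plain Python. This is not Catalyst's code and does not use Spark at all: the plan classes, the `CATALOG` and `ROW_COUNTS` contents, the 10% filter selectivity, the `WORKERS` count, and both cost formulas are assumptions made up for illustration. Only the operator names `BroadcastHashJoin` and `SortMergeJoin` are borrowed from Spark.

```python
import math
from dataclasses import dataclass

# Hypothetical catalog and statistics; all names and numbers are made up.
CATALOG = {"orders": {"order_id": "int", "amount": "double"},
           "countries": {"country_id": "int", "name": "string"}}
ROW_COUNTS = {"orders": 10_000_000, "countries": 200}
WORKERS = 8  # assumed cluster size for the toy cost model

@dataclass(frozen=True)
class Scan:      # logical node: read a table
    table: str

@dataclass(frozen=True)
class Filter:    # logical node: keep rows where column > value
    column: str
    value: float
    child: object

@dataclass(frozen=True)
class Join:      # logical node: join two inputs (join key omitted for brevity)
    left: object
    right: object

def columns(plan):
    """Columns a plan produces, looked up in the catalog."""
    if isinstance(plan, Scan):
        if plan.table not in CATALOG:
            raise ValueError(f"unknown table {plan.table!r}")
        return set(CATALOG[plan.table])
    if isinstance(plan, Filter):
        return columns(plan.child)
    return columns(plan.left) | columns(plan.right)

def analyze(plan):
    """Unresolved -> resolved: verify every table and column name."""
    if isinstance(plan, Filter):
        analyze(plan.child)
        if plan.column not in columns(plan.child):
            raise ValueError(f"cannot resolve column {plan.column!r}")
    elif isinstance(plan, Join):
        analyze(plan.left)
        analyze(plan.right)
    else:
        columns(plan)
    return plan

def push_down_filter(plan):
    """Rewrite rule: move a root Filter below a Join, to the side owning the column."""
    if isinstance(plan, Filter) and isinstance(plan.child, Join):
        join = plan.child
        if plan.column in columns(join.left):
            return Join(Filter(plan.column, plan.value, join.left), join.right)
        return Join(join.left, Filter(plan.column, plan.value, join.right))
    return plan

def optimize(plan, rules=(push_down_filter,)):
    """Apply all rules repeatedly until the plan stops changing."""
    while True:
        new_plan = plan
        for rule in rules:
            new_plan = rule(new_plan)
        if new_plan == plan:
            return plan
        plan = new_plan

def estimated_rows(plan):
    """Row estimate from statistics; filters are assumed to keep 10% of rows."""
    if isinstance(plan, Scan):
        return ROW_COUNTS[plan.table]
    if isinstance(plan, Filter):
        return estimated_rows(plan.child) // 10
    return max(estimated_rows(plan.left), estimated_rows(plan.right))

def physical_candidates(join):
    """Two 'strategies' for a Join, each with a made-up cost estimate."""
    left, right = estimated_rows(join.left), estimated_rows(join.right)
    return {
        # copy the smaller side to every worker, stream the larger side once
        "BroadcastHashJoin": max(left, right) + min(left, right) * WORKERS,
        # sort both sides, then merge them
        "SortMergeJoin": left * math.log2(left) + right * math.log2(right),
    }

def choose_physical(join):
    """Cost model: pick the cheapest candidate."""
    costs = physical_candidates(join)
    return min(costs, key=costs.get)

query = Filter("amount", 100.0, Join(Scan("orders"), Scan("countries")))
optimized = optimize(analyze(query))
print(optimized)
print(choose_physical(optimized))
```

In the optimized plan, the filter on `amount` sits below the join, directly on top of the `orders` scan. The rule can only decide this because analysis has already established which table owns the column, which is why resolution comes before optimization. To inspect the real plans in Spark, calling `explain(true)` on a Dataset prints the parsed, analyzed, and optimized logical plans together with the physical plan.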