
Understanding the workings of the Catalyst Optimizer

So how does the optimizer work? The following figure shows the core components and how they are involved in a sequential optimization process:

First of all, it doesn't matter whether the DataFrame API, the Dataset API, or SQL is used. They all result in the same Unresolved Logical Execution Plan (ULEP). A QueryPlan is unresolved if the column names haven't been verified and the column types haven't been looked up in the catalog. Once both have been done, the result is a Resolved Logical Execution Plan (RLEP), which is then transformed multiple times until it results in an Optimized Logical Execution Plan. LEPs don't describe how something is computed, only what has to be computed. The optimized LEP is transformed into multiple Physical Execution Plans (PEPs) using so-called strategies. Finally, a cost model that takes statistics about the Dataset to be queried into account selects the optimal PEP for execution. Note that the final execution takes place on RDD objects.
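The sequence of stages described above can be made concrete with a small toy model. The following is only a sketch in plain Python and not Catalyst's actual implementation: the `Scan`, `Filter`, and `Project` classes, the `CATALOG` and `STATS` dictionaries, the single rewrite rule, and the cost numbers are all hypothetical names and values chosen for illustration. It resolves a plan against a catalog, rewrites it until nothing changes, generates candidate physical plans, and picks the cheapest one.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical catalog and statistics, invented for this sketch.
CATALOG = {"people": {"name": "string", "age": "int"}}
STATS = {"people": {"rows": 1_000_000, "matching_rows": {"age": 10_000}}}


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    column: str
    value: object
    child: object
    column_type: Optional[str] = None  # None while the plan is unresolved


@dataclass(frozen=True)
class Project:
    columns: tuple
    child: object


def table_of(plan):
    """Walk down to the leaf Scan and return its table name."""
    while not isinstance(plan, Scan):
        plan = plan.child
    return plan.table


def resolve(plan):
    """Unresolved -> resolved: verify column names and look up their types."""
    if isinstance(plan, Scan):
        if plan.table not in CATALOG:
            raise ValueError(f"unknown table: {plan.table}")
        return plan
    child = resolve(plan.child)
    schema = CATALOG[table_of(child)]
    if isinstance(plan, Filter):
        if plan.column not in schema:
            raise ValueError(f"unknown column: {plan.column}")
        return replace(plan, child=child, column_type=schema[plan.column])
    missing = [c for c in plan.columns if c not in schema]
    if missing:
        raise ValueError(f"unknown columns: {missing}")
    return replace(plan, child=child)


def push_filter_below_project(plan):
    """One rewrite rule: Filter(Project(x)) becomes Project(Filter(x))."""
    if isinstance(plan, Scan):
        return plan
    if isinstance(plan, Filter) and isinstance(plan.child, Project):
        project = plan.child
        return replace(project, child=replace(plan, child=project.child))
    return replace(plan, child=push_filter_below_project(plan.child))


def optimize(plan, rules=(push_filter_below_project,)):
    """Apply the rules repeatedly until the plan stops changing."""
    while True:
        new_plan = plan
        for rule in rules:
            new_plan = rule(new_plan)
        if new_plan == plan:
            return plan
        plan = new_plan


def physical_candidates(plan):
    """Toy strategies: list (description, estimated cost) pairs."""
    stats = STATS[table_of(plan)]
    candidates = [("scan then filter", stats["rows"])]
    node = plan
    while not isinstance(node, Scan):
        if isinstance(node, Filter) and node.column in stats["matching_rows"]:
            cost = stats["matching_rows"][node.column]
            candidates.append((f"filter at source on {node.column}", cost))
        node = node.child
    return candidates


def choose_physical_plan(plan):
    """Toy cost model: pick the candidate with the lowest estimated cost."""
    return min(physical_candidates(plan), key=lambda candidate: candidate[1])


# Usage: the same unresolved plan, taken through every stage.
unresolved = Filter("age", 21, Project(("name", "age"), Scan("people")))
optimized = optimize(resolve(unresolved))
print(optimized)
print(choose_physical_plan(optimized))
```

Note that the logical classes only say what to compute; the "how" appears only in the candidate descriptions produced at the physical stage. In Spark itself, calling `explain(True)` on a DataFrame prints the parsed, analyzed, and optimized logical plans together with the physical plan, which is a convenient way to observe the real stages.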
