  • Learning Spark SQL
  • Aurobindo Sarkar
  • 272 words
  • 2021-07-02 18:23:43

Understanding Structured Streaming internals

To enable Structured Streaming, the planner polls the sources for new data and incrementally executes the computation on it before writing the result to the sink. In addition, any running aggregates required by your application are maintained as in-memory state backed by a Write-Ahead Log (WAL). This in-memory state is generated and used across incremental executions. The fault tolerance requirements for such applications include the ability to recover and replay all data and metadata in the system. To meet them, the planner writes offsets to a fault-tolerant WAL on persistent storage, such as HDFS, before each execution, as illustrated in the figure.
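In application code, the persistent storage used for the offset log and the state is normally selected with the checkpointLocation option on the streaming writer. The following is a minimal sketch, not the example used elsewhere in this book: the built-in rate source, the console sink, the object name, and the checkpoint path are all assumptions chosen for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, window}

// Hypothetical application name, used only for this sketch.
object CheckpointedCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CheckpointedCount")
      .master("local[2]")
      .getOrCreate()

    // Built-in "rate" test source: emits rows with `timestamp` and `value` columns.
    val events = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()

    // A running aggregate; Spark keeps its state between incremental executions.
    val counts = events
      .groupBy(window(col("timestamp"), "10 seconds"))
      .count()

    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      // Assumed path: in production this would point at fault-tolerant
      // storage such as an HDFS directory.
      .option("checkpointLocation", "/tmp/checkpoints/checkpointed-count")
      .start()

    query.awaitTermination()
  }
}
```

Restarting the same query with the same checkpoint location lets it resume from the recorded offsets instead of starting over.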

If the planner fails during the current incremental execution, the restarted planner reads the WAL and re-executes the exact range of offsets required. Typically, sources such as Kafka are also fault-tolerant and can regenerate the original transaction data, given the appropriate offsets recovered by the planner. The state data is usually maintained in a versioned key-value map in the Spark workers and is backed by a WAL on HDFS. The planner ensures that the correct version of the state is used to re-execute the transactions after a failure. Additionally, the sinks are idempotent by design and can handle re-executions without double-committing the output. Hence, the combination of offset tracking in the WAL, state management, and fault-tolerant sources and sinks provides the end-to-end exactly-once guarantees.
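The recovery sequence described above can be imitated in a few lines of plain Scala. This is a toy model under stated assumptions, not Spark's implementation: in-memory maps stand in for the WAL, the versioned state store, and the sink, and every name here (ExactlyOnceSketch, runBatch, recover) is hypothetical.

```scala
import scala.collection.mutable

object ExactlyOnceSketch {
  // Replayable source: the same offsets always yield the same records.
  val source: Vector[Int] = Vector(3, 1, 4, 1, 5, 9, 2, 6)
  def read(range: (Int, Int)): Seq[Int] = source.slice(range._1, range._2)

  // In-memory stand-ins for durable storage (assumptions of this sketch).
  val offsetLog = mutable.Map.empty[Long, (Int, Int)] // batchId -> offset range
  val stateStore = mutable.Map(0L -> 0L)               // version -> running sum
  val sink = mutable.Map.empty[Long, Long]             // batchId -> output

  def runBatch(batchId: Long, requested: (Int, Int), crash: Boolean): Unit = {
    // 1. Log the offsets before executing; a retry reuses the logged range.
    val range = offsetLog.getOrElseUpdate(batchId, requested)
    // 2. Start from the state version committed by the previous batch.
    val result = stateStore(batchId - 1) + read(range).sum
    // 3. Writing by batch id makes the sink idempotent: a retry overwrites.
    sink(batchId) = result
    if (crash) throw new RuntimeException("simulated crash")
    // 4. Commit the new state version.
    stateStore(batchId) = result
  }

  // On restart, re-execute the last logged batch if its state was never committed.
  def recover(): Unit =
    if (offsetLog.nonEmpty) {
      val last = offsetLog.keys.max
      if (!stateStore.contains(last)) runBatch(last, offsetLog(last), crash = false)
    }

  def main(args: Array[String]): Unit = {
    runBatch(1L, (0, 3), crash = false)
    try runBatch(2L, (3, 6), crash = true)
    catch { case _: RuntimeException => println("batch 2 failed, restarting") }
    recover()
    println(s"batch 2 output = ${sink(2L)}") // prints "batch 2 output = 23"
  }
}
```

Although batch 2 reaches the sink twice, the sink ends up with a single entry for it, because the retry reads the same offsets, starts from the same state version, and overwrites the same key.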

We can list the Physical Plan for our example of Structured Streaming using the explain method, as shown:

scala> spark.streams.active(0).explain 

We will explain the preceding output in more detail in Chapter 11, Tuning Spark SQL Components for Performance.