
Spark Streaming

Spark Streaming is a package for processing streams of data in real time. Real-time data streams come in many forms: for example, an e-commerce website recording page visits, credit card transactions, or a taxi provider app sending trip details and the locations of drivers and passengers. In a nutshell, all of these applications are hosted on multiple web servers that generate event logs in real time.

Spark Streaming builds on RDDs and adds APIs for processing data streams in real time. Because it reuses the RDD abstraction and its API, developers already familiar with Spark can implement streaming use cases without learning a whole new technology stack.
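To make the link between streams and RDDs more concrete: Spark Streaming treats a stream as a sequence of small batches, one per batch interval, and applies ordinary batch-style operations to each of them. The following is a minimal sketch of that idea in plain Python, without Spark, so it is an illustration rather than the library's implementation. The names micro_batches and count_visits, the sample events, and the one-second interval are all assumptions made up for this example.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event shape: (timestamp in seconds, page visited).
Event = Tuple[float, str]


def micro_batches(events: Iterable[Event], interval: float) -> Iterator[List[Event]]:
    """Group time-ordered events into consecutive batches of `interval` seconds."""
    batch: List[Event] = []
    batch_end = None
    for ts, page in events:
        if batch_end is None:
            # The first batch starts at the first event's timestamp.
            batch_end = ts + interval
        # Close (possibly empty) batches until this event fits the current one.
        while ts >= batch_end:
            yield batch
            batch = []
            batch_end += interval
        batch.append((ts, page))
    if batch:
        yield batch


def count_visits(batch: List[Event]) -> Counter:
    """The per-batch computation: an ordinary batch-style aggregation."""
    return Counter(page for _, page in batch)


# Made-up page-visit events, standing in for a web server's event log.
events = [(0.0, "/home"), (0.4, "/cart"), (0.9, "/home"), (1.2, "/home"), (2.5, "/checkout")]
results = [dict(count_visits(b)) for b in micro_batches(events, interval=1.0)]
print(results)
```

The same count_visits function runs unchanged on every batch, which is the property that lets batch-oriented code carry over to streaming.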

Spark 2.x introduced Structured Streaming, which uses DataFrames rather than RDDs to process data streams. Using DataFrames as the computation abstraction brings all the benefits of the DataFrame API to stream processing. We will discuss the benefits of DataFrames over RDDs in the coming chapters.
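As a rough illustration of what a DataFrame-based streaming job can look like, here is a hedged PySpark sketch. It assumes the pyspark package is installed and uses Spark's built-in rate test source in place of real events. The page column, the helper names page_visit_counts and run, and the ten-second run time are hypothetical choices made for this example.

```python
def page_visit_counts(events_df):
    """Running number of visits per page; `events_df` needs a `page` column."""
    # The same DataFrame call works on a static or a streaming DataFrame.
    return events_df.groupBy("page").count()


def run(seconds: int = 10) -> None:
    # Imported here so the module can be loaded without starting Spark.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .master("local[2]")
             .appName("page-visits-sketch")
             .getOrCreate())

    # The "rate" source emits rows with `timestamp` and `value` columns.
    raw = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

    # Hypothetical page name derived from the counter, standing in for real events.
    events = raw.withColumn(
        "page", F.concat(F.lit("/page/"), (F.col("value") % 3).cast("string"))
    )

    # Print the full, continuously updated result table to the console.
    query = (page_visit_counts(events).writeStream
             .outputMode("complete")
             .format("console")
             .start())
    query.awaitTermination(seconds)  # let the query run for a few seconds
    query.stop()
    spark.stop()
```

Calling run() starts a local Spark session and prints updated counts for a few seconds. The aggregation in page_visit_counts is written exactly as it would be for a static DataFrame, which is the benefit this paragraph refers to.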

Spark Streaming integrates well with some of the most popular data ingestion and messaging systems, such as Apache Flume and Apache Kafka. It can be plugged into these systems easily to handle massive volumes of streaming data.
