
Real-time processing

While batch processing frameworks are good for most data warehousing use cases, there is a critical need to process data and generate actionable insight as soon as the data becomes available. For example, in a credit card fraud detection system, the alert should be generated as soon as the first instance of malicious activity is logged. The actionable insight (denying the transaction) has no value if it only becomes available as the result of an end-of-month batch process.

The idea of a real-time processing framework is to reduce the latency between event time (when the event occurs) and processing time (when the system handles it). In an ideal system, the difference between the two would be zero. In practice, the difference is a function of the data source input, execution engine, network bandwidth, and hardware. Real-time processing frameworks achieve low latency with minimal I/O by relying on distributed in-memory computing. Some of the most popular real-time processing frameworks are:

  • Apache Spark: This is a distributed execution engine that relies on in-memory processing based on fault-tolerant data abstractions called RDDs (Resilient Distributed Datasets).
  • Apache Storm: This is a framework for distributed real-time computation. Storm applications are designed to easily process unbounded streams, which generate event data at a very high velocity.
  • Apache Flink: This is a framework for efficient, distributed, high-volume data processing. The key feature of Flink is automatic program optimization. Flink provides native support for massively iterative, compute-intensive algorithms.
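
To make the gap between event time and processing time concrete, here is a minimal sketch in plain Python rather than in any of the frameworks listed above. It checks each transaction as it arrives and reports how long after the event the alert was raised. The names `Event`, `latency_seconds`, and `process_stream`, as well as the fixed amount threshold used as a stand-in for a fraud rule, are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple


@dataclass
class Event:
    """A hypothetical card transaction."""
    card_id: str
    amount: float
    event_time: float  # seconds since epoch when the transaction occurred


def latency_seconds(event: Event, processing_time: float) -> float:
    """Gap between when the event happened and when it was processed."""
    return processing_time - event.event_time


def process_stream(
    events: Iterable[Event],
    clock: Callable[[], float],
    threshold: float = 1000.0,  # illustrative rule, not a real fraud model
) -> Iterator[Tuple[Event, float]]:
    """Inspect each event on arrival and yield (event, latency) for suspicious ones."""
    for event in events:
        now = clock()  # processing time for this event
        if event.amount > threshold:
            yield event, latency_seconds(event, now)
```

In a real deployment, `clock` would typically be `time.time` and `events` an unbounded source such as a message queue consumer. Passing the clock in as a parameter keeps the sketch deterministic when it is exercised with a fixed timestamp.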

As the ecosystem evolves, many more frameworks are becoming available for batch and real-time processing. Going back to the machine intelligence evolution cycle (Perceive, Process, Persist, Perform), we are going to leverage these frameworks to create programs that work on Big Data, take an algorithmic approach to filtering relevant data, generate models based on the patterns within the data, and derive actionable insight and predictions that ultimately lead to value from the data assets.
