- Artificial Intelligence for Big Data
- Anand Deshpande Manish Kumar
- 2021-06-25 21:57:06
Real-time processing
While batch processing frameworks are good for most data warehousing use cases, there is a critical need to process data and generate actionable insight as soon as the data is available. For example, in a credit card fraud detection system, the alert should be generated at the first logged instance of malicious activity. There is no value if the actionable insight (denying the transaction) only becomes available as a result of the end-of-month batch process. The idea of a real-time processing framework is to reduce the latency between event time (when something happens) and processing time (when the system acts on it). In an ideal system, the difference between the event time and the processing time would be zero. In practice, the time difference is a function of the data source input, execution engine, network bandwidth, and hardware. Real-time processing frameworks achieve low latency with minimal I/O by relying on in-memory computing in a distributed manner. Some of the most popular real-time processing frameworks are:
- Apache Spark: This is a distributed execution engine that relies on in-memory processing based on fault-tolerant data abstractions named RDDs (Resilient Distributed Datasets).
- Apache Storm: This is a framework for distributed real-time computation. Storm applications are designed to easily process unbounded streams, which generate event data at a very high velocity.
- Apache Flink: This is a framework for efficient, distributed, high-volume data processing. The key feature of Flink is automatic program optimization. Flink provides native support for massively iterative, compute-intensive algorithms.
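The gap between event time and processing time described above can be illustrated with a minimal sketch in plain Python. It does not use Spark, Storm, or Flink; the `Transaction` type, the `FRAUD_THRESHOLD` rule, the delay values, and the function names are all hypothetical and chosen only to contrast per-event handling with an end-of-period batch run.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Transaction:
    card_id: str
    amount: float
    event_time: float  # seconds since an arbitrary start, when the swipe occurred


# Hypothetical rule for illustration: flag any transaction above this amount.
FRAUD_THRESHOLD = 1000.0


def is_suspicious(tx: Transaction) -> bool:
    return tx.amount > FRAUD_THRESHOLD


def stream_alerts(
    events: Iterable[Transaction], per_event_delay: float
) -> Iterator[Tuple[Transaction, float]]:
    """Streaming style: each event is evaluated shortly after it occurs."""
    for tx in events:
        processing_time = tx.event_time + per_event_delay  # assumed small delay
        if is_suspicious(tx):
            yield tx, processing_time - tx.event_time  # latency in seconds


def batch_alerts(
    events: Iterable[Transaction], batch_run_time: float
) -> List[Tuple[Transaction, float]]:
    """Batch style: every event waits until the scheduled run."""
    return [
        (tx, batch_run_time - tx.event_time)
        for tx in events
        if is_suspicious(tx)
    ]


events = [
    Transaction("card-1", 25.0, event_time=10.0),
    Transaction("card-2", 4200.0, event_time=20.0),
    Transaction("card-1", 1800.0, event_time=35.0),
]

streaming = list(stream_alerts(events, per_event_delay=0.05))
batch = batch_alerts(events, batch_run_time=86_400.0)  # run at end of day

for tx, latency in streaming:
    print(f"stream alert for {tx.card_id}: latency {latency:.2f}s")
for tx, latency in batch:
    print(f"batch alert for {tx.card_id}: latency {latency:.2f}s")
```

Both paths flag the same two transactions; the only difference is how long after the event the alert exists. In a real deployment the per-event delay would depend on the data source, execution engine, network, and hardware rather than on a fixed constant.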
As the ecosystem is evolving, there are many more frameworks available for batch and real-time processing. Going back to the machine intelligence evolution cycle (Perceive, Process, Persist, Perform), we are going to leverage these frameworks to create programs that work on Big Data, take an algorithmic approach to filter relevant data, generate models based on the patterns within the data, and derive actionable insight and predictions that ultimately lead to value from the data assets.