Batch processing

Traditionally, the data processing pipeline within data warehousing systems consisted of Extracting, Transforming, and Loading the data for analysis and actions (ETL). With the new paradigm of file-based distributed computing, the sequence of these steps has shifted. The data is now Extracted and Loaded once, and then Transformed repeatedly, as many times as the analysis requires (ELTTT).
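The ELTTT sequence can be illustrated with a minimal sketch. It is plain Python rather than any particular framework, and the record layout, the function names, and the two transformations are all hypothetical. The point it tries to show is that the raw data is extracted and loaded into a staging area once, and that several independent transformations are then run over the same loaded data.

```python
from collections import defaultdict

# Hypothetical raw records from a source system:
# (student_id, ISO date, present flag). Not taken from the original text.
SOURCE_RECORDS = [
    ("s1", "2024-03-01", True),
    ("s1", "2024-03-04", False),
    ("s2", "2024-03-01", True),
    ("s2", "2024-03-04", True),
    ("s1", "2024-04-01", True),  # belongs to April, so the March batch skips it
]

def extract(records, month):
    """Extract: select the rows for the requested month (format YYYY-MM)."""
    return [r for r in records if r[1].startswith(month)]

def load(rows, staging):
    """Load: append the raw, untransformed rows to the staging area."""
    staging.extend(rows)
    return staging

def days_present(staging):
    """Transform 1: number of days each student was present."""
    counts = defaultdict(int)
    for student, _date, present in staging:
        counts[student] += int(present)
    return dict(counts)

def attendance_rate(staging):
    """Transform 2: fraction of recorded days each student was present."""
    totals = defaultdict(int)
    present_days = defaultdict(int)
    for student, _date, present in staging:
        totals[student] += 1
        present_days[student] += int(present)
    return {s: present_days[s] / totals[s] for s in totals}

def run_monthly_batch(records, month):
    """Extract and load once, then apply every transformation to the staged data."""
    staging = load(extract(records, month), [])
    return {
        "days_present": days_present(staging),
        "attendance_rate": attendance_rate(staging),
    }

march_report = run_monthly_batch(SOURCE_RECORDS, "2024-03")
```

Further views can be added as extra transformation functions without repeating the extract and load steps, which is the practical difference from the classic ETL ordering.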

In batch processing, the data is collected from various sources in staging areas and is loaded and transformed at defined frequencies and on defined schedules. In most batch processing use cases, there is no critical need to process the data in real time or in near real time. As an example, the monthly report on a student's attendance will be generated by a (batch) process at the end of a calendar month. This process will extract the data from the source systems, load it, and transform it for various views and reports. One of the most popular batch processing frameworks is Apache Hadoop. It is a highly scalable, distributed/parallel processing framework. The primary building block of Hadoop is the Hadoop Distributed File System.

As the name suggests, the Hadoop Distributed File System (HDFS) is a wrapper filesystem which stores the data (structured, unstructured, or semi-structured) in a distributed manner on the data nodes within Hadoop. Instead of moving the data to the processing logic, Hadoop sends the processing logic to the nodes that hold the data. Once an individual node has performed its share of the compute, the results are consolidated by the master process. In this paradigm of data-compute localization, Hadoop relies heavily on intermediate I/O operations on hard disk drives. As a result, Hadoop can process extremely large volumes of data reliably, at the cost of processing time. This makes the framework very suitable for extracting value from Big Data in batch mode.
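The idea of sending the compute to the data can be sketched in a few lines. This is a single-process simulation in plain Python, not the Hadoop API: the node names, the block contents, and the function names are hypothetical, and the sketch does not model replication, scheduling, or the intermediate disk I/O that Hadoop performs. It only shows the shape of the pattern, in which the same task runs against each node's local block and a master step merges the partial results.

```python
from collections import Counter

# Hypothetical data blocks, each imagined to live on a different data node.
BLOCKS_BY_NODE = {
    "node-1": ["big data batch", "batch processing"],
    "node-2": ["data nodes store data"],
    "node-3": ["batch"],
}

def count_words(lines):
    """The task shipped to every node: count words in the local block only."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def run_on_nodes(blocks_by_node, task):
    """Apply the task where each block lives; return one partial result per node."""
    return {node: task(lines) for node, lines in blocks_by_node.items()}

def consolidate(partials):
    """Master step: merge the per-node partial results into one total."""
    total = Counter()
    for partial in partials.values():
        total.update(partial)
    return total

partials = run_on_nodes(BLOCKS_BY_NODE, count_words)
totals = consolidate(partials)
```

Only the small per-node counters reach the consolidation step, while the raw lines never leave their node. On a real cluster, that is what keeps network traffic low when the blocks are large.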