- Machine Learning for Developers
- Rodolfo Bonnin
- 2021-07-02 15:46:51
The ETL process
The early stages of what is now the big data processing field evolved over several decades under the name of data mining, before the discipline adopted the popular name of big data.
One of the most useful outcomes of these disciplines is the specification of the Extract, Transform, Load (ETL) process.
This process starts with a mix of many data sources from business systems, passes the data through a system that transforms it into a readable state, and finishes by generating a data mart with highly structured and documented data types.
To apply this concept, we will combine the elements of this process with the final outcome of a structured dataset, which in its final form includes an additional label column (in the case of supervised learning problems).
This process is depicted in the following diagram:

The diagram illustrates the first stages of the data pipeline. It starts with all of the organization's data, whether commercial transactions, raw values from IoT devices, or information elements from other valuable data sources, which commonly differ widely in type and composition. The ETL process is in charge of gathering the raw information from these sources using different software filters, applying the transforms needed to arrange the data in a useful manner, and finally presenting the data in tabular format (we can think of this as a single database table with a last feature or result column, or as a big CSV file with consolidated data). The following processes can then use the final result conveniently, practically without having to think about the many quirks of data formatting, because these have been standardized into a very clear table structure.
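The three stages described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not a production pipeline: the two inlined sources (`TRANSACTIONS_CSV`, `SENSOR_JSON`), the `CHURN_LABELS` mapping, and all column names are hypothetical and were invented for this example.

```python
import csv
import io
import json

# Hypothetical raw sources, inlined so the sketch is self-contained.
# In practice these would come from business systems, devices, or files.
TRANSACTIONS_CSV = """customer_id,amount,currency
c1,120.50,USD
c2,80.00,USD
c1,19.50,USD
"""
SENSOR_JSON = (
    '[{"customer": "c1", "temp_c": "21.5"},'
    ' {"customer": "c2", "temp_c": "35.0"}]'
)
# Hypothetical labels for a supervised learning problem.
CHURN_LABELS = {"c1": 0, "c2": 1}

COLUMNS = ["customer_id", "total_amount", "temp_c", "label"]


def extract():
    """Extract: read each source in its own native format."""
    transactions = list(csv.DictReader(io.StringIO(TRANSACTIONS_CSV)))
    readings = json.loads(SENSOR_JSON)
    return transactions, readings


def transform(transactions, readings):
    """Transform: cast types, aggregate, and align records per customer."""
    totals = {}
    for row in transactions:
        cid = row["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + float(row["amount"])
    temps = {r["customer"]: float(r["temp_c"]) for r in readings}
    return [
        {
            "customer_id": cid,
            "total_amount": round(totals[cid], 2),
            "temp_c": temps.get(cid),
            "label": CHURN_LABELS.get(cid),  # label goes in the last column
        }
        for cid in sorted(totals)
    ]


def load(table):
    """Load: write the consolidated table as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(table)
    return buffer.getvalue()


def run_etl():
    """Run the three stages in order and return the final CSV text."""
    return load(transform(*extract()))
```

Calling `run_etl()` returns a single CSV text with one row per customer and the label as the last column, which is the kind of consolidated table that later processes can consume without knowing anything about the original source formats.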