
Munging and wrangling

The terms munging and wrangling are jargon for the work of changing the format of data, a recordset, or a file in order to prepare it for further processing or evaluation.

If you come from data development, you are most likely familiar with the idea of Extract, Transform, and Load (ETL). In much the same way, a data developer may mung or wrangle data during the transformation step of an ETL process.

Common munging and wrangling tasks include removing punctuation or HTML tags, parsing, filtering, transforming, and mapping data, and tying together systems and interfaces that were not specifically designed to interoperate. Munging can also describe the processing or filtering of raw data into another form, allowing for more convenient consumption of the data elsewhere.
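As an illustration of the first two tasks mentioned above, here is a minimal sketch in Python that strips HTML tags and punctuation from a string. The function names (strip_tags, strip_punctuation, munge_text) are hypothetical, and the regular expression is a deliberate simplification that assumes reasonably well-formed markup; a real HTML parser would be the safer choice for messy pages.

```python
import html
import re
import string

# Simplified pattern: anything between "<" and ">" is treated as a tag.
TAG_RE = re.compile(r"<[^>]+>")

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def strip_tags(text):
    """Replace tag-like spans with a space, then decode entities such as &amp;."""
    return html.unescape(TAG_RE.sub(" ", text))


def strip_punctuation(text):
    """Delete ASCII punctuation characters."""
    return text.translate(PUNCT_TABLE)


def munge_text(text):
    """Strip tags and punctuation, lowercase, and collapse runs of whitespace."""
    cleaned = strip_punctuation(strip_tags(text))
    return " ".join(cleaned.lower().split())


print(munge_text("<p>Hello, <b>World</b>!</p>"))
```

Replacing tags with a space rather than an empty string keeps words on either side of a tag from running together; the final split and join then removes the extra whitespace.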

Munging and wrangling might be performed multiple times within a data science process, and at different steps as that process evolves. Some data scientists use the term munging more broadly, to include data visualization, data aggregation, training a statistical model, and much other work. In general, though, munging and wrangling follow a flow that begins with extracting the data in a raw form, continues with performing the munging using various logic, and ends with placing the resulting content into a structure for use.
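The three-step flow described above (extract the raw data, munge it, place the result into a structure) can be sketched as follows. This is only an illustrative example under stated assumptions: the sample data, the column names, the rule of dropping rows with no amount, and the function names extract, wrangle, and load are all invented for the sketch.

```python
import csv
import io

# Hypothetical raw input: stray whitespace, mixed case, and a missing amount.
RAW = """name, state ,amount
 Alice ,ny,10.50
Bob,CA,
 carol ,ca,7
"""


def extract(raw_text):
    """Step 1: read the raw text into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(raw_text)))


def wrangle(rows):
    """Step 2: normalize headers and values, and filter out incomplete rows."""
    header = [h.strip().lower() for h in rows[0]]
    records = []
    for row in rows[1:]:
        record = dict(zip(header, (cell.strip() for cell in row)))
        if not record["amount"]:
            continue  # assumed rule: skip rows that have no amount
        records.append({
            "name": record["name"].title(),
            "state": record["state"].upper(),
            "amount": float(record["amount"]),
        })
    return records


def load(records):
    """Step 3: place the cleaned records into a structure keyed by name."""
    return {r["name"]: r for r in records}


table = load(wrangle(extract(RAW)))
for name, record in table.items():
    print(name, record["state"], record["amount"])
```

Keeping each step in its own function makes it easy to rerun the wrangling step alone when the cleaning rules change, which fits the point that munging may be repeated several times in a project.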

Although there are many valid options for munging, wrangling, preprocessing, and manipulating data, a tool that is popular with many data scientists today is a product named Trifacta, which claims to be the number one data wrangling solution in many industries.

Trifacta can be downloaded for your personal evaluation from https://www.trifacta.com/. Check it out!