
Summary

Data processing and wrangling form the first, and a very important, stage of the data science pipeline. It generally helps if the people preparing the data have some domain knowledge about it, since that helps them judge when the data has been processed enough and use their intuition to build the pipeline better and more quickly. Data processing also often calls for creative solutions and workarounds.

In this chapter, you learned how to structure large datasets by arranging them in tabular form. We then loaded this tabular data into pandas and split it into the right columns. Once we were sure that the data was arranged correctly, we combined it with other data sources. We also removed duplicates and unneeded columns and, finally, dealt with missing data. After these steps, the data was ready for analysis and could be fed directly into a data science pipeline.
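As a compact recap, the steps above can be sketched in a few lines of pandas. This is only an illustrative sketch, not the chapter's own code: the `orders` and `customers` tables, their column names, and the choice to fill missing values with the median are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; table and column names are illustrative only.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "customer_id": [10, 11, 11, 12],
    "amount": [25.0, np.nan, np.nan, 40.0],
    "notes": ["", "", "", ""],  # a column we do not need
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "region": ["north", "south", "east"],
})

# Combine the table with another data source on the shared key.
merged = orders.merge(customers, on="customer_id", how="left")

# Remove duplicate rows, then drop the unneeded column.
cleaned = merged.drop_duplicates().drop(columns=["notes"])

# Deal with missing data. Filling with the column median is one option
# among several (dropping rows or interpolating are others).
cleaned["amount"] = cleaned["amount"].fillna(cleaned["amount"].median())
cleaned = cleaned.reset_index(drop=True)

print(cleaned)
```

Which strategy suits the missing values depends on the dataset, and this is where the domain knowledge mentioned earlier tends to matter most.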

In the next chapter, we will deepen our understanding of pandas and discuss reshaping and analyzing DataFrames to produce better visualizations and data summaries. We will also see how to solve generic business-critical problems directly and efficiently.
