
Summary

In this chapter we have looked at ways to manipulate data frames, from cleaning and filtering to grouping, aggregation, and reshaping. Pandas makes many of the common operations easy, and more complex operations, such as pivoting or grouping by multiple attributes, can often be expressed as one-liners as well. Cleaning and preparing data is an essential part of data exploration and analysis.
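As a quick reminder of what such one-liners can look like, here is a minimal sketch. The `sales` frame and its column names are invented for illustration and do not come from the chapter's data sets.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
sales = pd.DataFrame({
    "region": ["north", "north", "south", "south", "north", "south"],
    "year": [2013, 2014, 2013, 2014, 2013, 2014],
    "amount": [10.0, 20.0, 30.0, 40.0, 5.0, 15.0],
})

# Grouping by multiple attributes in one line:
# total amount for each (region, year) pair.
totals = sales.groupby(["region", "year"])["amount"].sum()

# Pivoting in one line: regions as rows, years as columns.
table = sales.pivot_table(index="region", columns="year",
                          values="amount", aggfunc="sum")

print(totals)
print(table)
```

Both results hold the same sums. The grouped version returns a series with a two-level index, while the pivoted version spreads the years across columns, which is often easier to read or plot.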

The next chapter gives a brief overview of machine learning algorithms, which apply the results of data analysis to make decisions or build helpful products.

Practice exercises

Exercise 1: Cleaning: In the section about filtering, we used the Europe Brent Crude Oil Spot Price, which can be found as an Excel document on the internet. Take this Excel spreadsheet and try to convert it into a CSV document that is ready to be imported with Pandas.

Hint: There are many ways to do this. We used a small tool called xls2csv.py and we were able to load the resulting CSV file with a helper method:

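import datetime
import pandas as pd

def convert_date(s):
    # Strip the parentheses and split the date string into its components.
    parts = s.replace("(", "").replace(")", "").split(",")
    # Fall back to a placeholder date when fewer than six components are present.
    if len(parts) < 6:
        return datetime.date(1970, 1, 1)
    return datetime.datetime(*[int(p) for p in parts])

# Parse the date column with the helper and drop rows with missing values.
df = pd.read_csv("RBRTEd.csv", sep=',', names=["date", "price"], converters={"date": convert_date}).dropna()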
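To try the helper without the real file, the sketch below feeds it a few in-memory rows. It assumes that the conversion tool writes each date as a quoted, parenthesized tuple of six numbers, which is what the replace and split calls in the helper suggest; the sample rows and prices are made up for illustration.

```python
import datetime
import io

import pandas as pd

def convert_date(s):
    # Same helper as in the hint: strip parentheses, split on commas.
    parts = s.replace("(", "").replace(")", "").split(",")
    if len(parts) < 6:
        return datetime.date(1970, 1, 1)
    return datetime.datetime(*[int(p) for p in parts])

# Made-up sample rows in the assumed format; the last row has no price.
sample = io.StringIO(
    '"(1987, 5, 20, 0, 0, 0)",10.0\n'
    '"(1987, 5, 21, 0, 0, 0)",11.5\n'
    '"(1987, 5, 22, 0, 0, 0)",\n'
)

df = pd.read_csv(sample, sep=",", names=["date", "price"],
                 converters={"date": convert_date}).dropna()
print(df)
```

The row without a price is removed by `dropna()`. Note that a date string with fewer than six components is not dropped: it is mapped to the placeholder date 1 January 1970, so such rows may need to be filtered out separately.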

Take a data set that is important for your work or, if you do not have one at hand, a data set that interests you and is available online. Ask one or two questions about the data in advance. Then use cleaning, filtering, grouping, and plotting techniques to answer your questions.
