- Python: Data Analytics and Visualization
- Phuong Vo.T.H, Martin Czygan, Ashish Kumar, Kirthi Raman
- 2021-07-09 18:51:45
Grouping data
One typical workflow during data exploration looks as follows:
- You find a criterion that you want to use to group your data. Maybe you have GDP data for every country along with its continent, and you would like to ask questions about the continents. These questions usually lead to some function applications: you might want to compute the mean GDP per continent. Finally, you want to store the results in a new data structure for further processing.
- We use a simpler example here. Imagine some fictional weather data about the number of sunny hours per day and city:
    >>> df
              date    city  value
    0   2000-01-03  London      6
    1   2000-01-04  London      3
    2   2000-01-05  London      4
    3   2000-01-03  Mexico      3
    4   2000-01-04  Mexico      9
    5   2000-01-05  Mexico      8
    6   2000-01-03  Mumbai     12
    7   2000-01-04  Mumbai      9
    8   2000-01-05  Mumbai      8
    9   2000-01-03   Tokyo      5
    10  2000-01-04   Tokyo      5
    11  2000-01-05   Tokyo      6
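The excerpt prints `df` without showing how it was built, and the workflow at the start of this section (choose a grouping criterion, apply a function to each group, store the results) is commonly called split-apply-combine. The sketch below, assuming pandas is installed, rebuilds an equivalent frame, performs the three steps by hand, and compares the result with a single groupby call. The names `dates`, `sunny_hours`, `pieces`, `means` and `manual` are illustrative, and the dates are stored as plain strings, which may differ from the book's own setup.

```python
import pandas as pd

# Fictional sunny-hours data: three consecutive days for each of four cities.
# Dates are plain strings here (an assumption; the book may use datetimes).
dates = ["2000-01-03", "2000-01-04", "2000-01-05"]
sunny_hours = {
    "London": [6, 3, 4],
    "Mexico": [3, 9, 8],
    "Mumbai": [12, 9, 8],
    "Tokyo": [5, 5, 6],
}

# One row per (date, city) pair, in the same order as the printout.
rows = [
    {"date": date, "city": city, "value": hours}
    for city, values in sunny_hours.items()
    for date, hours in zip(dates, values)
]
df = pd.DataFrame(rows, columns=["date", "city", "value"])

# Split: collect the rows belonging to each unique city.
pieces = {city: df[df["city"] == city] for city in df["city"].unique()}

# Apply: reduce each piece to one number (the mean, like "mean GDP per continent").
means = {city: piece["value"].mean() for city, piece in pieces.items()}

# Combine: gather the per-group results into a new Series indexed by city.
manual = pd.Series(means, name="value")
manual.index.name = "city"

# groupby performs the same three steps in one call.
via_groupby = df.groupby("city")["value"].mean()
print(manual.equals(via_groupby))  # True
```

Writing the steps out by hand is only meant to show what groupby does; in practice the one-line groupby form is shorter and faster.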
- The groups attribute returns a dictionary that maps each unique group key to the axis labels (here, the row index labels) of the rows in that group:

    >>> df.groupby("city").groups
    {'London': [0, 1, 2], 'Mexico': [3, 4, 5], 'Mumbai': [6, 7, 8], 'Tokyo': [9, 10, 11]}
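To see what the groups dictionary describes, the following sketch (assuming pandas, with the same fictional data rebuilt inline so it runs on its own) converts each entry to a plain list, iterates over the GroupBy object, which yields one (key, sub-DataFrame) pair per city, and pulls out a single group with `get_group`. The names `labels` and `mumbai` are illustrative.

```python
import pandas as pd

# Same fictional data as the printout, rebuilt so this block is self-contained.
df = pd.DataFrame({
    "date": ["2000-01-03", "2000-01-04", "2000-01-05"] * 4,
    "city": ["London"] * 3 + ["Mexico"] * 3 + ["Mumbai"] * 3 + ["Tokyo"] * 3,
    "value": [6, 3, 4, 3, 9, 8, 12, 9, 8, 5, 5, 6],
})

grouped = df.groupby("city")

# .groups maps each key to the index labels of its rows; turn them into lists.
labels = {city: list(idx) for city, idx in grouped.groups.items()}

# Iterating yields (key, sub-DataFrame) pairs, sorted by key by default.
for city, frame in grouped:
    print(city, frame["value"].tolist())
# London [6, 3, 4]
# Mexico [3, 9, 8]
# Mumbai [12, 9, 8]
# Tokyo [5, 5, 6]

# get_group returns the rows for one key, keeping their original index labels.
mumbai = grouped.get_group("Mumbai")
```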
- Although the result of a groupby is a GroupBy object, not a DataFrame, we can use the usual indexing notation to refer to columns:

    >>> grouped = df.groupby("city")
    >>> grouped["value"].max()
    city
    London     6
    Mexico     9
    Mumbai    12
    Tokyo      6
    Name: value, dtype: int64
    >>> grouped["value"].sum()
    city
    London    13
    Mexico    20
    Mumbai    29
    Tokyo     16
    Name: value, dtype: int64
- We see that, according to our data set, Mumbai is the sunniest of the four cities, with 29 sunny hours over the three days. An alternative, and more verbose, way to achieve the above would be:
    >>> df['value'].groupby(df['city']).sum()
    city
    London    13
    Mexico    20
    Mumbai    29
    Tokyo     16
    Name: value, dtype: int64
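Both spellings produce the same Series: the first groups the DataFrame by a column name and then selects value, the second groups the value Series by another Series that shares its index. A short check, assuming pandas; the `agg` call with a list of function names is one way, among others, to compute several statistics per group at once, and `idxmax` picks out the city with the largest total. The names `concise`, `verbose` and `summary` are illustrative.

```python
import pandas as pd

# Same fictional data (date column omitted, since it is not used here).
df = pd.DataFrame({
    "city": ["London"] * 3 + ["Mexico"] * 3 + ["Mumbai"] * 3 + ["Tokyo"] * 3,
    "value": [6, 3, 4, 3, 9, 8, 12, 9, 8, 5, 5, 6],
})

# Concise form: group the DataFrame by column name, then select a column.
concise = df.groupby("city")["value"].sum()

# Verbose form: group one Series by another Series aligned on the same index.
verbose = df["value"].groupby(df["city"]).sum()

assert concise.equals(verbose)

# Several aggregates at once: one output column per function name.
summary = df.groupby("city")["value"].agg(["max", "sum", "mean"])
print(summary)

# The city with the most sunny hours in total.
print(concise.idxmax())  # Mumbai
```

Grouping by a Series rather than a column name can help when the key does not live in the same DataFrame, provided its index lines up with the data being grouped.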