
Grouping data

One typical workflow during data exploration looks as follows:

  • You find a criterion by which to group your data. For example, you may have GDP data for every country along with its continent, and you want to ask questions about the continents. Such questions usually lead to applying a function to each group: you might want to compute the mean GDP per continent. Finally, you want to store the result in a new data structure for further processing.
  • We use a simpler example here. Imagine some fictional weather data recording the number of sunny hours per day for each city:
    >>> df
              date    city  value
    0   2000-01-03  London      6
    1   2000-01-04  London      3
    2   2000-01-05  London      4
    3   2000-01-03  Mexico      3
    4   2000-01-04  Mexico      9
    5   2000-01-05  Mexico      8
    6   2000-01-03  Mumbai     12
    7   2000-01-04  Mumbai      9
    8   2000-01-05  Mumbai      8
    9   2000-01-03   Tokyo      5
    10  2000-01-04   Tokyo      5
    11  2000-01-05   Tokyo      6
    
  • The groups attribute returns a dictionary that maps each unique group to the axis labels of the rows belonging to it:
    >>> df.groupby("city").groups
    {'London': [0, 1, 2],
    'Mexico': [3, 4, 5],
    'Mumbai': [6, 7, 8],
    'Tokyo': [9, 10, 11]}
    
  • Although the result of a groupby is a GroupBy object, not a DataFrame, we can use the usual indexing notation to refer to columns:
    >>> grouped = df.groupby("city")
    >>> grouped["value"].max()
    city
    London 6
    Mexico 9
    Mumbai 12
    Tokyo 6
    Name: value, dtype: int64
    >>> grouped["value"].sum()
    city
    London 13
    Mexico 20
    Mumbai 29
    Tokyo 16
    Name: value, dtype: int64
    
  • According to our data set, Mumbai appears to be the sunniest of the four cities. An alternative, more verbose way to achieve the same result is to group the column directly by another Series:
    >>> df['value'].groupby(df['city']).sum()
    city
    London 13
    Mexico 20
    Mumbai 29
    Tokyo 16
    Name: value, dtype: int64
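
The workflow described at the start of this section (group, apply a function, store the result in a new data structure) can be sketched end to end as follows. This is a minimal illustration, not the only way to do it: the DataFrame is rebuilt by hand from the listing above, and the variable name summary is an assumption introduced here.

```python
import pandas as pd

# Rebuild the fictional weather data from the listing above.
df = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05"] * 4),
    "city": ["London"] * 3 + ["Mexico"] * 3 + ["Mumbai"] * 3 + ["Tokyo"] * 3,
    "value": [6, 3, 4, 3, 9, 8, 12, 9, 8, 5, 5, 6],
})

# Group by city, then apply several aggregations to the "value" column at once.
# The result is a DataFrame indexed by city with one column per aggregation.
summary = df.groupby("city")["value"].agg(["sum", "max", "mean"])

# reset_index() turns the group labels back into an ordinary column,
# giving a plain DataFrame that is convenient for further processing.
summary = summary.reset_index()
print(summary)
```

Passing a list of function names to agg computes all of them in one pass over the groups, which avoids calling sum(), max() and mean() separately and joining the results afterwards.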
    