groupby

Let's now look at an operation that is highly useful, but often difficult for new pandas users to get their heads around: the .groupby() method. We'll walk through a number of examples step by step to illustrate its most important functionality.

The groupby operation does exactly what it says: it groups data based on some class or classes you choose. Let's take a look at a simple example using our iris dataset. We'll go back and reimport our original iris dataset, and run our first groupby operation:
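The original listing is not reproduced here, so the following is only a minimal sketch of what such a call might look like. It assumes a DataFrame named df with a species column and four measurement columns; the column names and the handful of sample rows are stand-ins, so the numbers will differ from those for the full dataset.

```python
import pandas as pd

# Stand-in sample: a few iris-style rows (column names are assumed).
df = pd.DataFrame({
    "sepal length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Partition the rows by species, then average every feature in each group.
species_means = df.groupby("species").mean()
print(species_means)
```

The result has one row per species and one column per feature, with the grouping column moved into the index.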

Here, data for each species is partitioned and the mean for each feature is provided. Let's take it a step further now and get full descriptive statistics for each species:
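As a rough sketch of that step, and again assuming a stand-in DataFrame with hypothetical column names rather than the full dataset, .describe() can be chained onto the grouped object:

```python
import pandas as pd

# Stand-in sample with two features only (column names are assumed).
df = pd.DataFrame({
    "petal length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# describe() reports count, mean, std, min, quartiles, and max per group.
species_stats = df.groupby("species").describe()
print(species_stats)
```

The columns of the result form a two-level index of (feature, statistic), so a single value can be read with a tuple such as ("petal width", "max").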

Statistics for each species

And now, we can see the full breakdown bucketed by species. Let's now look at some other groupby operations we can perform. We saw previously that petal length and width had some relatively clear boundaries between species. Now, let's examine how we might use groupby to see that:
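One way this could be done, shown here as a sketch on a small stand-in sample with assumed column names, is to group on the measurement itself and collect the distinct species seen at each value:

```python
import pandas as pd

# Stand-in sample (column names are assumed).
df = pd.DataFrame({
    "petal width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# For every distinct petal width, list the species that have that width.
species_by_width = df.groupby("petal width")["species"].unique().to_frame()
print(species_by_width)
```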

In this case, we grouped the rows by petal width and listed the unique species observed at each value. This is a manageable number of measurements to group by, but if it were to become much larger, we would likely need to partition the measurements into brackets. As we saw previously, that can be accomplished by means of the apply function.
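To make the bracketing idea concrete, here is a hedged sketch. The helper width_bracket and its cut points (1.0 and 1.8) are hypothetical and chosen only for illustration; they are not taken from the original text.

```python
import pandas as pd

# Stand-in sample (column names are assumed).
df = pd.DataFrame({
    "petal width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

def width_bracket(width):
    """Hypothetical helper: map a width to a coarse bracket label."""
    if width < 1.0:
        return "narrow"
    if width < 1.8:
        return "medium"
    return "wide"

# Use apply to derive the bracket, then group on the bracket instead.
df["width bracket"] = df["petal width"].apply(width_bracket)
species_by_bracket = df.groupby("width bracket")["species"].unique()
print(species_by_bracket)
```

Grouping on the derived column keeps the number of groups fixed at three, however many distinct widths appear in the data.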

Let's now take a look at a custom aggregation function:
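The listing itself is missing from this copy, so the sketch below is a reconstruction under assumptions: it uses stand-in data, assumed column names, and the keyword-based named aggregation syntax available in pandas 0.25 and later. The output column name delta is a label chosen for illustration.

```python
import pandas as pd

# Stand-in sample (column names are assumed).
df = pd.DataFrame({
    "petal width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Built-in aggregations are named by string; the lambda is the custom one
# and computes the spread (max minus min) of petal width in each group.
width_summary = df.groupby("species")["petal width"].agg(
    delta=lambda x: x.max() - x.min(),
    max="max",
    min="min",
)
print(width_summary)
```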

In this code, we grouped petal width by species and applied the .max() and .min() functions, along with a lambda function that returns the maximum petal width minus the minimum petal width, that is, the range of petal widths within each species.

We've only just touched on the functionality of groupby; there is a lot more to learn, so I encourage you to read the documentation available at http://pandas.pydata.org/pandas-docs/stable/.

Hopefully, you now have a solid base-level understanding of how to manipulate and prepare data for our next step, which is modeling. We will now move on to discuss the primary libraries in the Python machine learning ecosystem.