
groupby

Let's now look at an operation that is highly useful, but often difficult for new pandas users to get their heads around: the .groupby() function. We'll walk through a number of examples step by step in order to illustrate the most important functionality.

The groupby operation does exactly what it says: it groups data based on some class or classes you choose. Let's take a look at a simple example using our iris dataset. We'll go back and reimport our original iris dataset, and run our first groupby operation:
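A minimal sketch of that first groupby call follows. It assumes the iris data sits in a DataFrame named `df` with the columns `sepal length`, `sepal width`, `petal length`, `petal width`, and `class`. Those column names and the six rows below are illustrative stand-ins, not the real dataset.

```python
import pandas as pd

# Hypothetical six-row stand-in for the iris DataFrame (column names assumed)
df = pd.DataFrame({
    'sepal length': [5.1, 4.9, 7.0, 6.4, 6.3, 6.0],
    'sepal width':  [3.5, 3.0, 3.2, 3.2, 3.3, 2.2],
    'petal length': [1.4, 1.4, 4.7, 4.5, 6.0, 5.0],
    'petal width':  [0.2, 0.2, 1.4, 1.5, 2.5, 1.5],
    'class': ['Iris-setosa', 'Iris-setosa',
              'Iris-versicolor', 'Iris-versicolor',
              'Iris-virginica', 'Iris-virginica'],
})

# Split the rows by species, then average every numeric feature per group
means = df.groupby('class').mean()
print(means)
```

Because `class` is the grouping key, it becomes the index of the result, and only the four numeric columns are averaged.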

Here, data for each species is partitioned and the mean for each feature is provided. Let's take it a step further now and get full descriptive statistics for each species:
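Swapping `.mean()` for `.describe()` reports the count, mean, standard deviation, minimum, quartiles, and maximum of every feature within each species. The sketch below reuses the same hypothetical six-row stand-in, so the numbers will not match the full dataset.

```python
import pandas as pd

# Hypothetical six-row stand-in for the iris DataFrame (column names assumed)
df = pd.DataFrame({
    'sepal length': [5.1, 4.9, 7.0, 6.4, 6.3, 6.0],
    'sepal width':  [3.5, 3.0, 3.2, 3.2, 3.3, 2.2],
    'petal length': [1.4, 1.4, 4.7, 4.5, 6.0, 5.0],
    'petal width':  [0.2, 0.2, 1.4, 1.5, 2.5, 1.5],
    'class': ['Iris-setosa', 'Iris-setosa',
              'Iris-versicolor', 'Iris-versicolor',
              'Iris-virginica', 'Iris-virginica'],
})

# Eight summary statistics for each of the four features, per species
stats = df.groupby('class').describe()
print(stats['petal width'])
```

The result has a two-level column index (feature, statistic), so `stats['petal width']` selects all the statistics for a single feature; calling `.T` on the result can make the wide table easier to read.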

Statistics for each species

And now, we can see the full breakdown bucketed by species. Let's now look at some other groupby operations we can perform. We saw previously that petal length and width had some relatively clear boundaries between species. Now, let's examine how we might use groupby to see that:
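One way to do this, sketched here on the same hypothetical stand-in data, is to group by the `petal width` column itself and collect the distinct species that occur at each width.

```python
import pandas as pd

# Hypothetical six-row stand-in for the iris DataFrame (column names assumed)
df = pd.DataFrame({
    'sepal length': [5.1, 4.9, 7.0, 6.4, 6.3, 6.0],
    'sepal width':  [3.5, 3.0, 3.2, 3.2, 3.3, 2.2],
    'petal length': [1.4, 1.4, 4.7, 4.5, 6.0, 5.0],
    'petal width':  [0.2, 0.2, 1.4, 1.5, 2.5, 1.5],
    'class': ['Iris-setosa', 'Iris-setosa',
              'Iris-versicolor', 'Iris-versicolor',
              'Iris-virginica', 'Iris-virginica'],
})

# For each distinct petal width, list the species observed at that width
species_by_width = df.groupby('petal width')['class'].unique()
print(species_by_width)
```

In this toy sample, a width of 1.5 maps to both versicolor and virginica, which is exactly the kind of overlap between species that this view makes visible.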

In this case, we grouped the data by petal width and listed the unique species associated with each width value. This is a manageable number of values to group by, but if it became much larger, we would likely need to partition the measurements into brackets. As we saw previously, that can be done with the apply function.
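A sketch of that bracketing step follows, again on the hypothetical stand-in data. The helper `width_bracket` and its cut-off values of 1.0 and 2.0 are invented for illustration; `.apply()` maps each width to a label, and the new label column then becomes the grouping key.

```python
import pandas as pd

# Hypothetical six-row stand-in for the iris DataFrame (column names assumed)
df = pd.DataFrame({
    'sepal length': [5.1, 4.9, 7.0, 6.4, 6.3, 6.0],
    'sepal width':  [3.5, 3.0, 3.2, 3.2, 3.3, 2.2],
    'petal length': [1.4, 1.4, 4.7, 4.5, 6.0, 5.0],
    'petal width':  [0.2, 0.2, 1.4, 1.5, 2.5, 1.5],
    'class': ['Iris-setosa', 'Iris-setosa',
              'Iris-versicolor', 'Iris-versicolor',
              'Iris-virginica', 'Iris-virginica'],
})

def width_bracket(width):
    """Hypothetical helper: map a petal width to a coarse bracket label."""
    if width < 1.0:
        return 'narrow'
    elif width < 2.0:
        return 'medium'
    return 'wide'

# Label every row with its bracket, then group on the label instead of the raw width
df['width bracket'] = df['petal width'].apply(width_bracket)
species_by_bracket = df.groupby('width bracket')['class'].unique()
print(species_by_bracket)
```

`pd.cut` can produce similar brackets from a list of bin edges; `.apply()` is used here because it accepts any Python function, so the bracketing rule can be as irregular as needed.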

Let's now take a look at a custom aggregation function:
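A minimal sketch of such an aggregation, assuming the same hypothetical stand-in DataFrame, is shown below. It uses pandas named aggregation (keyword arguments to `.agg()`, available since pandas 0.25). Older code sometimes passed a dict of names to functions for the same purpose, a form that newer pandas versions no longer accept on a single grouped column.

```python
import pandas as pd

# Hypothetical six-row stand-in for the iris DataFrame (column names assumed)
df = pd.DataFrame({
    'sepal length': [5.1, 4.9, 7.0, 6.4, 6.3, 6.0],
    'sepal width':  [3.5, 3.0, 3.2, 3.2, 3.3, 2.2],
    'petal length': [1.4, 1.4, 4.7, 4.5, 6.0, 5.0],
    'petal width':  [0.2, 0.2, 1.4, 1.5, 2.5, 1.5],
    'class': ['Iris-setosa', 'Iris-setosa',
              'Iris-versicolor', 'Iris-versicolor',
              'Iris-virginica', 'Iris-virginica'],
})

# Each keyword names an output column; the value is a built-in name or a callable
width_stats = df.groupby('class')['petal width'].agg(
    max='max',
    min='min',
    delta=lambda x: x.max() - x.min(),  # spread of petal widths within a species
)
print(width_stats)
```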

In this code, we grouped petal width by species and aggregated it with the .max() and .min() functions, along with a lambda function that returns the maximum petal width minus the minimum petal width.

We've only just touched on the functionality of groupby; there is much more to learn, so I encourage you to read the documentation available at http://pandas.pydata.org/pandas-docs/stable/.

Hopefully, you now have a solid base-level understanding of how to manipulate and prepare data for our next step, which is modeling. We will now move on to discuss the primary libraries in the Python machine learning ecosystem.
