
apply

The apply function works with both DataFrames and Series. We'll start with an example that would work equally well with map, then move on to examples that only work with apply.

Using our iris DataFrame, let's make a new column based on petal width. We previously saw that the median for the petal width was 1.3. Let's now create a new column in our DataFrame, wide petal, that contains binary values based on the value in the petal width column. If the petal width is equal to or greater than the median, we will code it as 1, and if it is less than the median, we will code it as 0. We'll do this using the apply function on the petal width column:
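The step just described can be sketched as follows. This is a minimal illustration, not the original listing: the four-row DataFrame is a hand-built stand-in for the iris data, and the column name petal width (cm) is assumed from the names used later in this section.

```python
import pandas as pd

# Hand-built stand-in for the iris DataFrame (illustrative values only)
df = pd.DataFrame({
    "petal length (cm)": [1.4, 4.7, 6.0, 4.0],
    "petal width (cm)": [0.2, 1.4, 2.5, 1.3],
})

# Code each row 1 if its petal width is at or above the median of 1.3, else 0
df["wide petal"] = df["petal width (cm)"].apply(lambda v: 1 if v >= 1.3 else 0)

print(df)
```

Because the function only needs one value at a time, calling map on the same column with the same lambda would give the same result.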

A few things happened here, so let's walk through them step by step. First, we were able to append a new column to the DataFrame simply by using the column selection syntax with the name of the column we want to create, in this case wide petal. We set that new column equal to the output of the apply function. Here, we ran apply on the petal width column, and the values it returned became the wide petal column. The apply function works by running through each value of the petal width column in turn. If the value is greater than or equal to 1.3, the function returns 1; otherwise, it returns 0. This kind of binary encoding is a common feature engineering transformation in machine learning, so it is good to be familiar with how to perform it.

Let's now take a look at using apply on a DataFrame rather than a single series. We'll now create a feature based on the petal area:
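A minimal sketch of this row-wise computation is shown here. It assumes the same hand-built stand-in for the iris DataFrame rather than the full dataset, and the lambda argument name r is only illustrative.

```python
import pandas as pd

# Hand-built stand-in for the iris DataFrame (illustrative values only)
df = pd.DataFrame({
    "petal length (cm)": [1.4, 4.7, 6.0, 4.0],
    "petal width (cm)": [0.2, 1.4, 2.5, 1.3],
})

# axis=1 passes each row to the function as a Series indexed by column name
df["petal area"] = df.apply(
    lambda r: r["petal length (cm)"] * r["petal width (cm)"], axis=1
)

print(df)
```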

Creating a new feature

Notice that we called apply not on a Series here, but on the entire DataFrame. Because apply was called on the entire DataFrame, we passed in axis=1 to tell pandas that we want to apply the function row-wise. If we passed in axis=0, the function would instead operate column-wise. Here, each row is processed in turn, and we multiply that row's values from the petal length (cm) and petal width (cm) columns. The resulting Series then becomes the petal area column in our DataFrame. This kind of flexibility is what makes pandas such a useful tool for data manipulation.
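To make the difference between the two axis values concrete, the following sketch applies the same function both ways. The small DataFrame and the names col_max and row_max are assumptions made for illustration.

```python
import pandas as pd

# Hand-built stand-in for two of the iris columns (illustrative values only)
df = pd.DataFrame({
    "petal length (cm)": [1.4, 4.7, 6.0],
    "petal width (cm)": [0.2, 1.4, 2.5],
})

# axis=0: the function receives each column as a Series, giving one result per column
col_max = df.apply(lambda col: col.max(), axis=0)

# axis=1: the function receives each row as a Series, giving one result per row
row_max = df.apply(lambda row: row.max(), axis=1)

print(col_max)
print(row_max)
```

With axis=0 the result is indexed by the column names, while with axis=1 it is indexed by the original row labels, which is why the row-wise result can be assigned directly as a new column.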
