
apply

The apply function allows us to work with both DataFrames and series. We'll start with an example that would work equally well with map, before moving on to examples that would only work with apply.

Using our iris DataFrame, let's make a new column based on petal width. We previously saw that the median petal width was 1.3. Let's now create a new column in our DataFrame, wide petal, that contains binary values based on the value in the petal width column: if the petal width is equal to or wider than the median, we will code it with a 1, and if it is less than the median, we will code it with a 0. We'll do this using the apply function on the petal width column:
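A minimal sketch of this step follows. The original listing is not reproduced here, so the snippet assumes scikit-learn-style column names such as `petal width (cm)` and uses a few hypothetical rows in place of the full iris DataFrame. The threshold of 1.3 is the median petal width of the full dataset. The last lines check that `map` gives the same result on a single series.

```python
import pandas as pd

# Hypothetical stand-in for the iris DataFrame: a few illustrative rows
# using assumed scikit-learn-style column names
df = pd.DataFrame({
    'petal length (cm)': [1.4, 4.7, 5.1, 4.0],
    'petal width (cm)': [0.2, 1.4, 1.9, 1.3],
})

# Median petal width of the full iris dataset
MEDIAN_PETAL_WIDTH = 1.3

# apply calls the lambda once per value of the series and collects the
# returned 1s and 0s into a new series, stored as the 'wide petal' column
df['wide petal'] = df['petal width (cm)'].apply(
    lambda v: 1 if v >= MEDIAN_PETAL_WIDTH else 0
)

# On a single series, map with the same function produces the same values
via_map = df['petal width (cm)'].map(lambda v: 1 if v >= MEDIAN_PETAL_WIDTH else 0)

print(df)
print((df['wide petal'] == via_map).all())
```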

A few things happened here, so let's walk through them step by step. First, we were able to append a new column to the DataFrame simply by selecting a column name that does not yet exist, in this case wide petal, and assigning a value to it. We set that new column equal to the output of the apply function. Here, we ran apply on the petal width column, and it returned the values that make up the wide petal column. The apply function works by running through each value of the petal width column: if the value is greater than or equal to 1.3, the function returns 1; otherwise, it returns 0. This kind of binarization is a common feature engineering transformation in machine learning, so it is good to be familiar with how to perform it.

Let's now take a look at using apply on a DataFrame rather than a single series. We'll create a feature based on the petal area:

Creating a new feature

Notice that we called apply not on a series here, but on the entire DataFrame. Because apply was called on the entire DataFrame, we passed in axis=1 to tell pandas that we want to apply the function row-wise; if we passed in axis=0, the function would operate column-wise. Here, each row is processed sequentially, and we multiply the values from the petal length (cm) and petal width (cm) columns. The resulting series then becomes the petal area column in our DataFrame. This kind of power and flexibility is what makes pandas an indispensable tool for data manipulation.
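The row-wise computation described above can be sketched as follows, again using a few hypothetical rows and assumed scikit-learn-style column names rather than the full iris DataFrame. The `axis=0` call at the end is added only for contrast: there the function receives one whole column at a time, so it returns one value per column.

```python
import pandas as pd

# Hypothetical stand-in rows with assumed iris-style column names
df = pd.DataFrame({
    'petal length (cm)': [1.4, 4.7, 5.1, 4.0],
    'petal width (cm)': [0.2, 1.4, 1.9, 1.3],
})

# axis=1: the function receives one row at a time as a series indexed by
# column name, so both petal measurements are available together
df['petal area'] = df.apply(
    lambda row: row['petal length (cm)'] * row['petal width (cm)'], axis=1
)

# axis=0 (the default): the function receives one entire column at a time;
# here it returns each column's maximum
col_max = df[['petal length (cm)', 'petal width (cm)']].apply(
    lambda col: col.max(), axis=0
)

print(df)
print(col_max)
```

For a simple product like this, the vectorized expression `df['petal length (cm)'] * df['petal width (cm)']` produces the same values and is usually much faster, because a row-wise apply calls the Python function once per row. Row-wise apply is most useful when the per-row logic is too involved to express as column arithmetic.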
