
Assigning an average value

This is another common approach, chosen mainly for its simplicity. For a numerical feature, you can replace the missing values with the mean or the median of the observed values. The same idea works for categorical variables: assign the mode (the most frequent value) to the missing entries.
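Here is a minimal sketch of the idea. It assumes pandas is available and defines a hypothetical helper, `impute_average`, which fills every numeric column with its median (or mean) and every other column with its mode. The column names and values in the example frame are made up for illustration.

```python
import numpy as np
import pandas as pd


def impute_average(df: pd.DataFrame, numeric_strategy: str = "median") -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with missing values filled.

    Numeric columns get their mean or median; all other columns get their
    mode (most frequent non-missing value).
    """
    if numeric_strategy not in ("mean", "median"):
        raise ValueError("numeric_strategy must be 'mean' or 'median'")
    result = df.copy()  # leave the caller's frame untouched
    for col in result.columns:
        if not result[col].isnull().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(result[col]):
            if numeric_strategy == "mean":
                fill_value = result[col].mean()
            else:
                fill_value = result[col].median()
        else:
            modes = result[col].mode()  # NaN is ignored by default
            if modes.empty:
                continue  # column is entirely missing; leave it alone
            fill_value = modes.iloc[0]  # take the first value on a tie
        result[col] = result[col].fillna(fill_value)
    return result


# Small hypothetical example frame
df = pd.DataFrame({
    "Fare": [7.25, np.nan, 8.05, 71.28],
    "Embarked": ["S", "C", np.nan, "S"],
})
print(impute_average(df))
```

Working on a copy keeps the original frame intact, so you can compare the imputed result with the raw data.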

The following code assigns the median of the non-missing values of the Fare feature to the missing values:

# handling the missing values by replacing them with the median fare
# (.loc avoids chained assignment, which pandas may apply to a temporary copy)
df_titanic_data.loc[df_titanic_data['Fare'].isnull(), 'Fare'] = df_titanic_data['Fare'].median()

Alternatively, you can use the following code to find the most frequent value in the Embarked feature and assign it to the missing values:

# replacing the missing values with the most common value in the variable
# (mode() ignores NaN and may return several values on a tie, so take the first)
df_titanic_data.loc[df_titanic_data['Embarked'].isnull(), 'Embarked'] = df_titanic_data['Embarked'].mode().iloc[0]
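Taking a single element from the result matters because `Series.mode()` returns a Series, not a scalar. That Series holds more than one value when several categories are tied for most frequent. The following sketch uses hypothetical port codes to show a tie, then fills the gap with one scalar via `.iloc[0]`.

```python
import numpy as np
import pandas as pd

# Hypothetical port codes with a tie: "C" and "S" both appear twice
embarked = pd.Series(["S", "C", np.nan, "S", "C", "Q"], name="Embarked")

modes = embarked.mode()   # NaN is excluded by default (dropna=True)
print(modes.tolist())     # both tied values are returned

frame = embarked.to_frame()
# .iloc[0] picks a single scalar, so the assignment works even with ties
frame.loc[frame["Embarked"].isnull(), "Embarked"] = modes.iloc[0]
print(frame["Embarked"].tolist())
```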