- Deep Learning By Example
- Ahmed Menshawy
- 2021-06-24 18:52:45
Using a regression or another simple model to predict the values of missing variables
This is the approach that we will use for the Age feature of the Titanic example. Age is an important feature for predicting the survival of passengers, and filling its missing values with the mean, as in the previous approach, would discard some of that information.
To predict the missing values, you need a supervised learning algorithm that takes the other available features as input and the known values of the feature you want to fill as the target. In the following code snippet, we use a random forest regressor to predict the missing values of the Age feature:
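To make the information loss concrete, here is a minimal sketch on a hypothetical toy series of ages (the values are made up for illustration and are not taken from the Titanic data). It shows that mean imputation gives every missing entry the same value, which shrinks the spread of the feature:

```python
import numpy as np
import pandas as pd

# Hypothetical toy ages, not taken from the Titanic data
ages = pd.Series([22.0, 38.0, np.nan, 35.0, np.nan, 54.0, 2.0, np.nan])

# Mean imputation: every missing entry receives the same value
filled = ages.fillna(ages.mean())

observed_std = ages.std()  # pandas skips NaN values by default
filled_std = filled.std()

print(f"std of observed ages: {observed_std:.2f}")
print(f"std after mean fill:  {filled_std:.2f}")
```

The filled series has a smaller standard deviation than the observed ages, because the imputed entries sit exactly on the mean regardless of the passenger's other features.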
# Define a helper function that uses RandomForestRegressor to fill in the missing values of the Age variable
from sklearn.ensemble import RandomForestRegressor

def set_missing_ages():
    global df_titanic_data
    # Age is the first column (the target); the remaining columns are the inputs
    # and are assumed to be numerically encoded already
    age_data = df_titanic_data[
        ['Age', 'Embarked', 'Fare', 'Parch', 'SibSp', 'Title_id', 'Pclass', 'Names', 'CabinLetter']]
    input_values_RF = age_data.loc[(df_titanic_data.Age.notnull())].values[:, 1::]
    target_values_RF = age_data.loc[(df_titanic_data.Age.notnull())].values[:, 0]
    # Creating a random forest regressor from sklearn (see the sklearn documentation for more details)
    regressor = RandomForestRegressor(n_estimators=2000, n_jobs=-1)
    # Building the model based on the input values and target values above
    regressor.fit(input_values_RF, target_values_RF)
    # Using the trained model to predict the missing values
    predicted_ages = regressor.predict(age_data.loc[(df_titanic_data.Age.isnull())].values[:, 1::])
    # Filling the predicted ages into the original titanic dataframe
    # (age_data is only a selection of columns, so we assign to df_titanic_data itself)
    df_titanic_data.loc[(df_titanic_data.Age.isnull()), 'Age'] = predicted_ages
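The same idea can be tried in isolation. The following is a minimal, self-contained sketch, not the book's own implementation: the helper name `impute_with_regressor`, its parameters, and the toy DataFrame are all hypothetical and chosen only for illustration. It avoids the global variable by returning a filled copy of the DataFrame:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def impute_with_regressor(df, target, features, n_estimators=100, random_state=0):
    """Return a copy of df in which missing `target` values are predicted from `features`."""
    df = df.copy()
    missing = df[target].isnull()
    if not missing.any():
        return df  # nothing to fill

    regressor = RandomForestRegressor(n_estimators=n_estimators, random_state=random_state)
    # Train only on the rows where the target is known
    regressor.fit(df.loc[~missing, features], df.loc[~missing, target])
    # Predict the target for the rows where it is missing
    df.loc[missing, target] = regressor.predict(df.loc[missing, features])
    return df


# Hypothetical toy data with Titanic-like column names
toy = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0, np.nan, 54.0, 2.0, 27.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 51.86, 21.08, 11.13],
    "Pclass": [3, 1, 3, 1, 3, 1, 3, 3],
    "SibSp": [1, 1, 0, 1, 0, 0, 3, 0],
})

filled = impute_with_regressor(toy, target="Age", features=["Fare", "Pclass", "SibSp"])
print(filled)
```

Returning a copy keeps the original DataFrame untouched, which makes it easier to compare the observed and imputed values side by side. Because a random forest regressor averages training targets, the predicted ages always fall within the range of the observed ages.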