
  • Deep Learning By Example
  • Ahmed Menshawy
  • 220 words
  • 2021-06-24 18:52:45

Using a regression or another simple model to predict the values of missing variables

This is the approach that we will use for the Age feature of the Titanic example. Age is an important feature for predicting passenger survival, and filling its gaps with the mean, as in the previous approach, would lose some information.
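To make the information loss concrete, here is a minimal sketch on a handful of made-up ages (the values are hypothetical, not taken from the Titanic data). Filling every gap with the same mean leaves the average unchanged but shrinks the spread of the feature, because all imputed passengers end up with an identical age:

```python
import numpy as np
import pandas as pd

# Hypothetical toy ages; NaN marks a missing value.
ages = pd.Series([22.0, 38.0, np.nan, 35.0, np.nan, 54.0, 2.0, np.nan])

# Mean imputation: every gap receives the same value.
mean_filled = ages.fillna(ages.mean())

# Spread of the observed ages versus the spread after filling.
std_observed = ages.std()          # pandas skips NaN by default
std_mean_filled = mean_filled.std()

print(f"std of observed ages:   {std_observed:.2f}")
print(f"std after mean filling: {std_mean_filled:.2f}")
```

The second figure is smaller than the first, which is one way of seeing what mean imputation throws away.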

To predict the missing values, you need a supervised learning algorithm that takes the other available features as input and the known values of the feature you want to fill in as output. In the following code snippet, we use a random forest regressor to predict the missing values of the Age feature:

from sklearn.ensemble import RandomForestRegressor

# Define a helper function that uses RandomForestRegressor to fill in the missing values of the Age variable
def set_missing_ages():
    global df_titanic_data

    # Age comes first, so column 0 is the target and the remaining columns are the inputs
    # (the input columns are assumed to be numerically encoded already)
    age_data = df_titanic_data[
        ['Age', 'Embarked', 'Fare', 'Parch', 'SibSp', 'Title_id', 'Pclass', 'Names', 'CabinLetter']]
    input_values_RF = age_data.loc[(df_titanic_data.Age.notnull())].values[:, 1::]
    target_values_RF = age_data.loc[(df_titanic_data.Age.notnull())].values[:, 0]

    # Creating a random forest regressor object from sklearn (see the scikit-learn documentation for details)
    regressor = RandomForestRegressor(n_estimators=2000, n_jobs=-1)

    # Building the model based on the input values and target values above
    regressor.fit(input_values_RF, target_values_RF)

    # Using the trained model to predict the missing values
    predicted_ages = regressor.predict(age_data.loc[(df_titanic_data.Age.isnull())].values[:, 1::])

    # Filling the predicted ages in the original titanic dataframe
    df_titanic_data.loc[(df_titanic_data.Age.isnull()), 'Age'] = predicted_ages
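
The helper above depends on the preprocessed Titanic dataframe, so it cannot be run in isolation. As a rough illustration of the same idea, the sketch below applies regression-based imputation to a small, made-up dataframe. The data, the function name impute_with_regressor, and its parameters are all hypothetical and chosen only for this example; the number of trees is kept small so that it runs quickly.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy data standing in for the preprocessed Titanic frame;
# the predictor columns are numeric and complete, only Age has gaps.
toy = pd.DataFrame({
    "Age":    [22.0, 38.0, np.nan, 35.0, np.nan, 54.0, 2.0, 27.0],
    "Fare":   [7.25, 71.28, 7.92, 53.10, 8.05, 51.86, 21.08, 11.13],
    "Pclass": [3, 1, 3, 1, 3, 1, 3, 3],
    "SibSp":  [1, 1, 0, 1, 0, 0, 3, 0],
})


def impute_with_regressor(df, target, n_estimators=100, random_state=0):
    """Return a copy of df whose missing `target` values are predicted
    from all the other columns with a random forest regressor."""
    filled = df.copy()
    features = [c for c in df.columns if c != target]
    known = filled[target].notnull()

    # Nothing to predict if the target column has no gaps.
    if known.all():
        return filled

    # Train on the rows where the target is known ...
    model = RandomForestRegressor(n_estimators=n_estimators,
                                  random_state=random_state)
    model.fit(filled.loc[known, features], filled.loc[known, target])

    # ... and predict the target for the rows where it is missing.
    filled.loc[~known, target] = model.predict(filled.loc[~known, features])
    return filled


imputed = impute_with_regressor(toy, "Age")
print(imputed)
```

Working on a copy and returning it, rather than mutating a global dataframe, is a design choice made for this sketch; it keeps the original data available for comparison with the imputed result.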