官术网_书友最值得收藏!

  • Deep Learning By Example
  • Ahmed Menshawy
  • 145字
  • 2021-06-24 18:52:45

Dummy features

These variables are also known as categorical or binary features. This approach will be a good choice if we have a small number of distinct values for the feature to be transformed. In the Titanic data samples, the Embarked feature has only three distinct values (S, C, and Q) that occur frequently. So, we can transform the Embarked feature into three dummy variables, ('Embarked_S', 'Embarked_C', and 'Embarked_Q') to be able to use the random forest classifier.

The following code will show you how to do this kind of transformation:

# constructing binary features
def process_embarked():
global df_titanic_data

# replacing the missing values with the most common value in the variable
df_titanic_data.Embarked[df.Embarked.isnull()] = df_titanic_data.Embarked.dropna().mode().values

# converting the values into numbers
df_titanic_data['Embarked'] = pd.factorize(df_titanic_data['Embarked'])[0]

# binarizing the constructed features
if keep_binary:
df_titanic_data = pd.concat([df_titanic_data, pd.get_dummies(df_titanic_data['Embarked']).rename(
columns=lambda x: 'Embarked_' + str(x))], axis=1)
主站蜘蛛池模板: 囊谦县| 开原市| 五台县| 安乡县| 金沙县| 田林县| 镇远县| 从江县| 大埔县| 永川市| 金川县| 海安县| 黑山县| 汽车| 大竹县| 普安县| 招远市| 吉安市| 辽阳县| 乌兰县| 靖江市| 昌图县| 腾冲县| 海林市| 马山县| 昌都县| 滦南县| 太和县| 江口县| 屏东市| 甘孜| 黄梅县| 报价| 白沙| 新疆| 府谷县| 阳春市| 扶绥县| 马关县| 措美县| 石棉县|