  • Deep Learning By Example
  • Ahmed Menshawy
  • 2021-06-24 18:52:45

Dummy features

Dummy features are binary (0/1) indicator variables derived from a categorical feature: one new variable per distinct value. This approach is a good choice when the feature to be transformed has only a small number of distinct values. In the Titanic data samples, the Embarked feature has only three distinct values (S, C, and Q), each of which occurs frequently. So, we can transform the Embarked feature into three dummy variables ('Embarked_S', 'Embarked_C', and 'Embarked_Q') so that the random forest classifier can use it.

The following code will show you how to do this kind of transformation:

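import pandas as pd

# constructing binary features
def process_embarked():
    global df_titanic_data

    # replacing the missing values with the most common value in the variable
    most_common = df_titanic_data['Embarked'].dropna().mode()[0]
    df_titanic_data['Embarked'] = df_titanic_data['Embarked'].fillna(most_common)

    # converting the values into numbers
    df_titanic_data['Embarked'] = pd.factorize(df_titanic_data['Embarked'])[0]

    # binarizing the constructed features
    # (keep_binary is assumed to be a flag defined elsewhere in the script)
    # note: after factorize, the dummy columns are named after the integer
    # codes, for example 'Embarked_0', rather than after the labels S, C, and Q
    if keep_binary:
        df_titanic_data = pd.concat([df_titanic_data, pd.get_dummies(df_titanic_data['Embarked']).rename(
            columns=lambda x: 'Embarked_' + str(x))], axis=1)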
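
Because pd.factorize replaces the labels with integer codes before pd.get_dummies is called, the function above names its dummy columns after those codes. The following is a minimal, self-contained sketch of how columns named after the labels themselves ('Embarked_S', 'Embarked_C', and 'Embarked_Q') could be obtained instead. The toy data frame and its values are hypothetical stand-ins for the Titanic data, not part of the original code.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic 'Embarked' column.
toy = pd.DataFrame({"Embarked": ["S", "C", None, "Q", "S"]})

# Replace the missing entry with the most common value in the column.
most_common = toy["Embarked"].mode()[0]
toy["Embarked"] = toy["Embarked"].fillna(most_common)

# Build one indicator column per distinct value, named after its label.
dummies = pd.get_dummies(toy["Embarked"], prefix="Embarked")
toy = pd.concat([toy, dummies], axis=1)

print(toy.columns.tolist())
```

Each row ends up with exactly one of the three indicator columns set. Skipping the factorize step here is a design choice for readability of the column names; if the integer codes are needed elsewhere, they can be kept in a separate column.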