Dummy features
These variables are also known as indicator or binary features; they encode a categorical feature as a set of 0/1 columns. This approach is a good choice when the feature to be transformed has a small number of distinct values. In the Titanic data samples, the Embarked feature has only three distinct values (S, C, and Q) that occur frequently, so we can transform it into three dummy variables (Embarked_S, Embarked_C, and Embarked_Q) for use with the random forest classifier.
The following code will show you how to do this kind of transformation:
# constructing binary features
import pandas as pd

def process_embarked():
    global df_titanic_data
    # replacing the missing values with the most common value in the variable
    df_titanic_data['Embarked'] = df_titanic_data['Embarked'].fillna(
        df_titanic_data['Embarked'].dropna().mode().values[0])
    # converting the values into numbers
    df_titanic_data['Embarked'] = pd.factorize(df_titanic_data['Embarked'])[0]
    # binarizing the constructed features; keep_binary is a flag assumed to be
    # defined elsewhere in the feature-engineering pipeline
    if keep_binary:
        df_titanic_data = pd.concat(
            [df_titanic_data,
             pd.get_dummies(df_titanic_data['Embarked']).rename(
                 columns=lambda x: 'Embarked_' + str(x))],
            axis=1)
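To make the effect of the transformation more concrete, here is a minimal, self-contained sketch. The three-row sample DataFrame is made up purely for illustration (it is not taken from the Titanic data), and it uses the prefix argument of pd.get_dummies instead of the rename lambda from the snippet above, which yields the same column names:

import pandas as pd

# a tiny illustrative frame containing the three ports of embarkation
sample = pd.DataFrame({'Embarked': ['S', 'C', 'Q']})

# one 0/1 column per distinct value, prefixed with 'Embarked_'
dummies = pd.get_dummies(sample['Embarked'], prefix='Embarked', dtype=int)
print(dummies)
#    Embarked_C  Embarked_Q  Embarked_S
# 0           0           0           1
# 1           1           0           0
# 2           0           1           0

The resulting frame of dummy columns can then be concatenated back onto the original data with pd.concat, as process_embarked does above.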