- Deep Learning By Example
- Ahmed Menshawy
Dummy features
These variables are also known as indicator or binary features. This approach is a good choice when the feature to be transformed has a small number of distinct values. In the Titanic data samples, the Embarked feature has only three distinct values (S, C, and Q) that occur frequently. So, we can transform the Embarked feature into three dummy variables ('Embarked_S', 'Embarked_C', and 'Embarked_Q') to be able to use the random forest classifier.
The following code will show you how to do this kind of transformation:
import pandas as pd

# constructing binary features
def process_embarked():
    global df_titanic_data
    # replacing the missing values with the most common value in the variable
    df_titanic_data['Embarked'] = df_titanic_data['Embarked'].fillna(
        df_titanic_data['Embarked'].dropna().mode().iloc[0])
    # converting the values into numbers
    df_titanic_data['Embarked'] = pd.factorize(df_titanic_data['Embarked'])[0]
    # binarizing the constructed features (keep_binary is a global flag
    # set elsewhere in the feature-engineering script)
    if keep_binary:
        df_titanic_data = pd.concat(
            [df_titanic_data,
             pd.get_dummies(df_titanic_data['Embarked'])
               .rename(columns=lambda x: 'Embarked_' + str(x))],
            axis=1)
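To see the transformation in isolation, here is a minimal, self-contained sketch that applies the same steps (fill missing values with the mode, then expand into dummy columns) to a small toy Series standing in for the Titanic Embarked column; the sample values are invented for illustration:

```python
import pandas as pd

# Toy stand-in for the Embarked column, including one missing value
embarked = pd.Series(['S', 'C', None, 'Q', 'S'], name='Embarked')

# Fill the missing value with the most frequent category ('S' in this sample)
embarked = embarked.fillna(embarked.mode().iloc[0])

# Expand the single categorical column into one binary column per category
dummies = pd.get_dummies(embarked).rename(columns=lambda x: 'Embarked_' + str(x))

print(sorted(dummies.columns))  # ['Embarked_C', 'Embarked_Q', 'Embarked_S']
```

Each row of `dummies` now has exactly one column set to 1 (or True), which is the representation the random forest classifier can consume directly.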