- Deep Learning By Example
- Ahmed Menshawy
- 2021-06-24 18:52:46
Name
The Name variable is of little use on its own in most datasets, but here it carries two useful properties. The first is the length of the name, measured below as the number of space-separated words it contains. A longer name may reflect something about a passenger's status and hence their chance of getting on a lifeboat:
import re
import pandas as pd

# counting the number of space-separated words in each passenger's name
df_titanic_data['Names'] = df_titanic_data['Name'].map(lambda y: len(re.split(' ', y)))
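Note that the line above counts words, not characters. As a minimal sketch of the difference, the two helpers below compute both measures for a single name string. The helper names and the sample name are illustrative assumptions, not part of the original code; either function could be passed to `df_titanic_data['Name'].map(...)`.

```python
def name_word_count(name):
    """Number of whitespace-separated tokens in a name (hypothetical helper)."""
    return len(name.split())


def name_char_length(name):
    """Number of characters in a name, spaces and punctuation included."""
    return len(name)


# Illustrative name in the "Last, Title. Given names" format.
example = 'Braund, Mr. Owen Harris'
word_count = name_word_count(example)
char_length = name_char_length(example)
```

Unlike `re.split(' ', y)`, `str.split()` with no argument collapses runs of whitespace, so a stray double space does not inflate the count.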
The second interesting property is the title embedded in the name, which can also indicate status and/or gender:
# getting the title of each person (the text between ", " and the first ".")
df_titanic_data['Title'] = df_titanic_data['Name'].map(lambda y: re.compile(r", (.*?)\.").findall(y)[0])
# merging the rarely occurring titles into more common ones
# (.loc avoids chained assignment, which may silently fail to update the DataFrame)
df_titanic_data.loc[df_titanic_data.Title == 'Jonkheer', 'Title'] = 'Master'
df_titanic_data.loc[df_titanic_data.Title.isin(['Ms', 'Mlle']), 'Title'] = 'Miss'
df_titanic_data.loc[df_titanic_data.Title == 'Mme', 'Title'] = 'Mrs'
df_titanic_data.loc[df_titanic_data.Title.isin(['Capt', 'Don', 'Major', 'Col', 'Sir']), 'Title'] = 'Sir'
df_titanic_data.loc[df_titanic_data.Title.isin(['Dona', 'Lady', 'the Countess']), 'Title'] = 'Lady'
# binarizing the Title feature (one indicator column per title)
if keep_binary:
    df_titanic_data = pd.concat(
        [df_titanic_data, pd.get_dummies(df_titanic_data['Title']).rename(columns=lambda x: 'Title_' + str(x))],
        axis=1)
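The title handling above can also be expressed as a single function applied to each name. The following is a sketch under stated assumptions: `extract_title`, `TITLE_GROUPS`, and the `'Unknown'` fallback are hypothetical names introduced here, while the regular expression and the groupings mirror the replacements above. Unlike `findall(y)[0]`, it does not raise an `IndexError` when a name contains no title.

```python
import re

# Same pattern as above: the text between ", " and the first "."
TITLE_PATTERN = re.compile(r', (.*?)\.')

# Rare titles mapped to the common title they are merged into.
TITLE_GROUPS = {
    'Jonkheer': 'Master',
    'Ms': 'Miss', 'Mlle': 'Miss',
    'Mme': 'Mrs',
    'Capt': 'Sir', 'Don': 'Sir', 'Major': 'Sir', 'Col': 'Sir',
    'Dona': 'Lady', 'Lady': 'Lady', 'the Countess': 'Lady',
}


def extract_title(name, default='Unknown'):
    """Return the (possibly merged) title found in a name, or `default`."""
    match = TITLE_PATTERN.search(name)
    if match is None:
        return default  # assumption: a fallback label instead of an error
    title = match.group(1)
    # Titles not listed in TITLE_GROUPS (Mr, Mrs, Miss, ...) are kept as-is.
    return TITLE_GROUPS.get(title, title)
```

It could be applied with `df_titanic_data['Name'].map(extract_title)`; keeping the groupings in one dictionary makes it easier to review or extend them.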
You can also try to derive other interesting features from the Name feature. For example, you might use the last name to estimate the size of each family aboard the Titanic.
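One possible way to approximate family size from the last name is to count how many passengers share each surname. This is only a sketch: the helper `last_name`, the variable names, and the sample list are assumptions made for illustration, and unrelated passengers who share a surname would be counted as one family.

```python
from collections import Counter


def last_name(name):
    """Return the surname, i.e. the text before the first comma."""
    return name.split(',')[0].strip()


# Illustrative names in the "Last, Title. Given names" format.
names = [
    'Braund, Mr. Owen Harris',
    'Andersson, Mr. Anders Johan',
    'Andersson, Miss. Erna Alexandra',
    'Heikkinen, Miss. Laina',
]

# Number of passengers sharing each surname.
surname_counts = Counter(last_name(n) for n in names)

# Estimated family size for each passenger, in the original order.
family_sizes = [surname_counts[last_name(n)] for n in names]
```

Combining the surname with other columns of the dataset would likely reduce false matches, but that is left as an exercise.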