- Deep Learning By Example
- Ahmed Menshawy
- 2021-06-24 18:52:46
Name
The name variable by itself is useless for most datasets, but here it has two useful properties. The first is the length of the name, measured as the number of words it contains. For example, a longer name may reflect something about a passenger's status and hence their chances of getting on a lifeboat:
import re
# counting the number of words in each passenger's name
df_titanic_data['Names'] = df_titanic_data['Name'].map(lambda y: len(re.split(' ', y)))
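To see what this feature captures, here is a minimal sketch on a few names written in the dataset's "Last, Title. First" style. The `toy` DataFrame and its rows are assumptions made up for illustration; the point is that the feature counts space-separated words, not characters.

```python
import re

import pandas as pd

# Hypothetical sample rows; only the name format mirrors the real dataset.
toy = pd.DataFrame({
    'Name': [
        'Braund, Mr. Owen Harris',
        'Heikkinen, Miss. Laina',
        'Futrelle, Mrs. Jacques Heath (Lily May Peel)',
    ]
})

# Same transformation as above: number of space-separated tokens per name.
toy['Names'] = toy['Name'].map(lambda y: len(re.split(' ', y)))
print(toy['Names'].tolist())  # [4, 3, 7]
```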
The second interesting property is the title embedded in the name (Mr, Mrs, Miss, and so on), which can also indicate status and/or gender:
# getting the title of each person (the text between ", " and the first ".")
df_titanic_data['Title'] = df_titanic_data['Name'].map(lambda y: re.compile(r", (.*?)\.").findall(y)[0])
# folding the rarely occurring titles into more common ones
# (.loc avoids chained assignment, which may silently fail to update the frame)
df_titanic_data.loc[df_titanic_data.Title == 'Jonkheer', 'Title'] = 'Master'
df_titanic_data.loc[df_titanic_data.Title.isin(['Ms', 'Mlle']), 'Title'] = 'Miss'
df_titanic_data.loc[df_titanic_data.Title == 'Mme', 'Title'] = 'Mrs'
df_titanic_data.loc[df_titanic_data.Title.isin(['Capt', 'Don', 'Major', 'Col', 'Sir']), 'Title'] = 'Sir'
df_titanic_data.loc[df_titanic_data.Title.isin(['Dona', 'Lady', 'the Countess']), 'Title'] = 'Lady'
# one-hot encoding the Title feature into Title_<value> columns
if keep_binary:
    df_titanic_data = pd.concat(
        [df_titanic_data, pd.get_dummies(df_titanic_data['Title']).rename(columns=lambda x: 'Title_' + str(x))],
        axis=1)
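The title steps can be tried end to end on a small, self-contained example. This is only a sketch under stated assumptions: the `toy` rows are hypothetical, and it uses `Series.replace` with a dictionary in place of one masked assignment per title, which is an alternative way to express the same grouping rather than the book's own code.

```python
import re

import pandas as pd

# Hypothetical sample rows; only the name format mirrors the real dataset.
toy = pd.DataFrame({
    'Name': [
        'Braund, Mr. Owen Harris',
        'Heikkinen, Miss. Laina',
        'Sagesser, Mlle. Emma',
        'Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)',
    ]
})

# The title is the text between ", " and the first "." in the name.
title_pattern = re.compile(r', (.*?)\.')
toy['Title'] = toy['Name'].map(lambda y: title_pattern.findall(y)[0])

# Fold rare titles into common ones (a subset of the mapping used above).
toy['Title'] = toy['Title'].replace({'Mlle': 'Miss', 'the Countess': 'Lady'})

# One-hot encode; prefix='Title' yields columns named Title_<value>.
dummies = pd.get_dummies(toy['Title'], prefix='Title')
print(toy['Title'].tolist())     # ['Mr', 'Miss', 'Miss', 'Lady']
print(sorted(dummies.columns))   # ['Title_Lady', 'Title_Miss', 'Title_Mr']
```

Passing `prefix='Title'` to `pd.get_dummies` produces the same column names as renaming each column with a lambda.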
You can also try to derive other interesting features from the Name feature. For example, you might use the last name to estimate the size of each family on board the Titanic.
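One possible way to build such a feature is sketched below. The `toy` rows and the column names `LastName` and `LastNameCount` are hypothetical, introduced only for illustration. The count is a rough proxy for family size, since unrelated passengers can share a surname.

```python
import pandas as pd

# Hypothetical sample rows; only the name format mirrors the real dataset.
toy = pd.DataFrame({
    'Name': [
        'Braund, Mr. Owen Harris',
        'Futrelle, Mrs. Jacques Heath (Lily May Peel)',
        'Futrelle, Mr. Jacques Heath',
        'Heikkinen, Miss. Laina',
    ]
})

# The last name is everything before the first comma.
toy['LastName'] = toy['Name'].str.split(',').str[0].str.strip()

# Number of passengers sharing each last name, broadcast back to every row.
toy['LastNameCount'] = toy.groupby('LastName')['Name'].transform('count')
print(toy[['LastName', 'LastNameCount']])
```

On the real data, combining this count with other columns such as the ticket number would likely give a less noisy estimate than the surname alone.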