  • Deep Learning By Example
  • Ahmed Menshawy
  • 2021-06-24 18:52:46

Name

The Name variable by itself is useless for most datasets, but it has two useful properties. The first is the length of the name, measured here as the number of space-separated words it contains. For example, the length of a passenger's name may reflect something about their status and hence their chance of getting on a lifeboat:

import re

# counting the space-separated words in each passenger's name
df_titanic_data['Names'] = df_titanic_data['Name'].map(lambda y: len(re.split(' ', y)))

The second interesting property is the title embedded in the name (Mr, Mrs, Miss, and so on), which can also be used to indicate status and/or gender:

# Getting the title of each person: the text between ", " and the first "."
df_titanic_data['Title'] = df_titanic_data['Name'].map(lambda y: re.findall(r", (.*?)\.", y)[0])

# merging the rarely occurring titles into common ones (.loc avoids chained assignment)
df_titanic_data.loc[df_titanic_data.Title == 'Jonkheer', 'Title'] = 'Master'
df_titanic_data.loc[df_titanic_data.Title.isin(['Ms', 'Mlle']), 'Title'] = 'Miss'
df_titanic_data.loc[df_titanic_data.Title == 'Mme', 'Title'] = 'Mrs'
df_titanic_data.loc[df_titanic_data.Title.isin(['Capt', 'Don', 'Major', 'Col', 'Sir']), 'Title'] = 'Sir'
df_titanic_data.loc[df_titanic_data.Title.isin(['Dona', 'Lady', 'the Countess']), 'Title'] = 'Lady'

# one-hot encoding the Title feature into Title_<value> columns
if keep_binary:
    df_titanic_data = pd.concat(
        [df_titanic_data, pd.get_dummies(df_titanic_data['Title']).rename(columns=lambda x: 'Title_' + str(x))],
        axis=1)
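
To see what the title steps produce, here is a minimal, self-contained sketch on a few sample names written in the dataset's "Last, Title. First" format. The variable names (names, titles, title_groups, dummies) are illustrative assumptions and not part of the book's code, and the sketch uses Series.str.extract and Series.replace in place of the map and mask assignments above:

```python
import pandas as pd

# Sample names in the "Last, Title. First" format (illustrative rows only)
names = pd.Series([
    'Braund, Mr. Owen Harris',
    'Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)',
    'Sagesser, Mlle. Emma',
    'Crosby, Capt. Edward Gifford',
])

# Capture the text between ", " and the first "."
titles = names.str.extract(r', (.*?)\.', expand=False)

# Merge rare titles into common ones (a subset of the grouping used above)
title_groups = {'Mlle': 'Miss', 'Ms': 'Miss', 'Mme': 'Mrs',
                'Capt': 'Sir', 'the Countess': 'Lady'}
titles = titles.replace(title_groups)

# One-hot encode; prefix='Title' yields columns such as Title_Mr
dummies = pd.get_dummies(titles, prefix='Title')

print(titles.tolist())
print(list(dummies.columns))
```

Titles that are not keys in title_groups, such as Mr, pass through unchanged.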

You can also try to come up with other interesting features from the Name feature. For example, you might use the last name to estimate the size of each passenger's family on board the Titanic.
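
One possible way to approach the last-name idea is sketched below, on a few sample rows. It assumes the surname is everything before the first comma, and the column names LastName and LastNameCount are hypothetical, not taken from the book. Keep in mind that unrelated passengers can share a surname, so the count is only a rough proxy for family size:

```python
import pandas as pd

# Sample rows in the "Last, Title. First" name format (illustrative data only)
df_example = pd.DataFrame({
    'Name': [
        'Braund, Mr. Owen Harris',
        'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
        'Braund, Mr. Lewis Richard',
        'Heikkinen, Miss. Laina',
    ]
})

# The surname is the text before the first comma
df_example['LastName'] = df_example['Name'].map(lambda y: y.split(',')[0].strip())

# Count how many passengers share each surname
df_example['LastNameCount'] = (
    df_example.groupby('LastName')['LastName'].transform('count')
)

print(df_example[['LastName', 'LastNameCount']])
```

The same two lines can be applied to df_titanic_data['Name'] to add the feature to the full dataset.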
