Name

On its own, the Name variable is of little use as a model feature, but two useful properties can be derived from it. The first is the length of the name, measured here as the number of words it contains. A longer name may reflect something about a passenger's status and hence their chances of getting on a lifeboat:

import re

# counting the number of space-separated words in each passenger's name
df_titanic_data['Names'] = df_titanic_data['Name'].map(lambda y: len(re.split(' ', y)))
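
To make the word count concrete, here is a minimal sketch that applies the same logic to two rows. The DataFrame `sample` and its values are illustrative assumptions that only follow the "Surname, Title. Given names" layout of the Name column.

```python
import re
import pandas as pd

# Toy rows in the Titanic "Name" layout; the values are illustrative assumptions.
sample = pd.DataFrame({'Name': [
    'Braund, Mr. Owen Harris',
    'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
]})

# Same word-count logic as above: split on single spaces and count the pieces.
sample['Names'] = sample['Name'].map(lambda y: len(re.split(' ', y)))

name_word_counts = sample['Names'].tolist()
print(name_word_counts)
```

Note that the title and any name given in parentheses are counted as words too, so the feature mixes several signals.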

The second interesting property is the title embedded in the name (Mr, Mrs, Miss, and so on), which can also be used to indicate status and/or gender:

# getting the title of each person: the text between ", " and the first "."
df_titanic_data['Title'] = df_titanic_data['Name'].map(lambda y: re.compile(r", (.*?)\.").findall(y)[0])

# handling the low occurring titles by merging them into more common ones
# (.loc avoids chained assignment, which may silently fail to update the DataFrame)
df_titanic_data.loc[df_titanic_data.Title == 'Jonkheer', 'Title'] = 'Master'
df_titanic_data.loc[df_titanic_data.Title.isin(['Ms', 'Mlle']), 'Title'] = 'Miss'
df_titanic_data.loc[df_titanic_data.Title == 'Mme', 'Title'] = 'Mrs'
df_titanic_data.loc[df_titanic_data.Title.isin(['Capt', 'Don', 'Major', 'Col', 'Sir']), 'Title'] = 'Sir'
df_titanic_data.loc[df_titanic_data.Title.isin(['Dona', 'Lady', 'the Countess']), 'Title'] = 'Lady'
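
The extraction and merging steps can also be sketched with a single lookup table. This is one possible alternative, not the approach used above: the names `sample`, `title_pattern`, and `rare_title_map` are hypothetical, and the rows are illustrative values in the Name column's layout.

```python
import re
import pandas as pd

# Toy rows in the Titanic "Name" layout; the values are illustrative assumptions.
sample = pd.DataFrame({'Name': [
    'Braund, Mr. Owen Harris',
    'Heikkinen, Miss. Laina',
    'Sagesser, Mlle. Emma',
    'Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)',
    'Reuchlin, Jonkheer. John George',
]})

# Compile the pattern once: capture the text between ", " and the first ".".
title_pattern = re.compile(r", (.*?)\.")
sample['Title'] = sample['Name'].map(lambda y: title_pattern.findall(y)[0])

# Hypothetical lookup table holding the same merges as the assignments above.
rare_title_map = {
    'Jonkheer': 'Master',
    'Ms': 'Miss', 'Mlle': 'Miss',
    'Mme': 'Mrs',
    'Capt': 'Sir', 'Don': 'Sir', 'Major': 'Sir', 'Col': 'Sir',
    'Dona': 'Lady', 'the Countess': 'Lady',
}

# Titles that are not keys of the mapping are left unchanged.
sample['Title'] = sample['Title'].replace(rare_title_map)

titles = sample['Title'].tolist()
print(titles)
```

Keeping the merges in one dictionary makes it easier to review or extend the grouping than a series of separate assignments.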

# one-hot encoding the Title feature into Title_<value> columns
if keep_binary:
    df_titanic_data = pd.concat(
        [df_titanic_data, pd.get_dummies(df_titanic_data['Title']).rename(columns=lambda x: 'Title_' + str(x))],
        axis=1)
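
As a side note, `pd.get_dummies` accepts a `prefix` argument that should produce the same `Title_` column names without the separate `rename` step. The sketch below assumes a toy DataFrame named `sample` with already-cleaned titles.

```python
import pandas as pd

# Toy titles; the values are illustrative assumptions.
sample = pd.DataFrame({'Title': ['Mr', 'Miss', 'Mrs', 'Miss']})

# prefix='Title' names each indicator column "Title_<value>".
dummies = pd.get_dummies(sample['Title'], prefix='Title')

# Append the indicator columns to the original frame, as in the snippet above.
sample = pd.concat([sample, dummies], axis=1)

dummy_columns = list(dummies.columns)
print(dummy_columns)
```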

You can also try to derive other interesting features from the Name feature. For example, you might use the last name to estimate the size of each passenger's family on board the Titanic.
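
A minimal sketch of the last-name idea follows. It assumes the surname is the text before the first comma, and it treats a shared surname as a rough proxy for family membership, which is not always true since unrelated passengers can share a surname. The column names `LastName` and `SurnameCount` are hypothetical.

```python
import pandas as pd

# Toy rows in the Titanic "Name" layout; the values are illustrative assumptions.
sample = pd.DataFrame({'Name': [
    'Andersson, Mr. Anders Johan',
    'Andersson, Miss. Erna Alexandra',
    'Braund, Mr. Owen Harris',
]})

# Surname: everything before the first comma, with surrounding spaces removed.
sample['LastName'] = sample['Name'].str.split(',').str[0].str.strip()

# Number of passengers sharing each surname, broadcast back to every row.
sample['SurnameCount'] = sample.groupby('LastName')['LastName'].transform('count')

surname_counts = sample['SurnameCount'].tolist()
print(surname_counts)
```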
