
Factorizing

Factorizing creates a numerical categorical feature from any other feature by assigning an integer code to each distinct value. In pandas, the factorize() function does this. The transformation is useful when a feature is an alphanumeric categorical variable. In the Titanic data samples, we can transform the Cabin feature into a categorical feature that represents the letter of the cabin:

import re
import pandas as pd

def get_cabin_letter(cabin_value):
    # a missing cabin value has no letters to search, so treat it as unknown ('U')
    if pd.isnull(cabin_value):
        return 'U'
    # searching for the letters in the cabin alphanumerical value
    letter_match = re.search(r"([a-zA-Z]+)", cabin_value)
    if letter_match:
        return letter_match.group()
    else:
        return 'U'

# the cabin number is a sequence of alphanumeric characters, so we are going to create a feature
# from the alphabetical part of it, then replace each letter with an integer code
df_titanic_data['CabinLetter'] = df_titanic_data['Cabin'].map(get_cabin_letter)
df_titanic_data['CabinLetter'] = pd.factorize(df_titanic_data['CabinLetter'])[0]
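
To make the effect of factorize() concrete, here is a minimal sketch on a handful of made-up cabin values. The values and the variable names (cabins, letters, codes, uniques) are assumptions for illustration only and are not taken from the Titanic dataset:

```python
import pandas as pd

# Hypothetical cabin values, invented for this illustration
cabins = pd.Series(["C85", "U0", "E46", "C123", "U0"])

# Keep only the alphabetical part of each cabin value
letters = cabins.str.extract(r"([a-zA-Z]+)", expand=False)

# factorize() returns the integer codes and the distinct values they stand for
codes, uniques = pd.factorize(letters)

print(list(codes))    # [0, 1, 2, 0, 1]
print(list(uniques))  # ['C', 'U', 'E']
```

Codes are assigned in order of first appearance, so they carry no ordinal meaning. By default, missing values receive the code -1 and do not appear in the list of unique values.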

We can also apply transformations to quantitative features by using one of the following approaches.
