
Factorizing

This approach creates a numerical categorical feature from any other feature by assigning an integer code to each distinct value. In pandas, the factorize() function does that. This type of transformation is useful when a feature is an alphanumeric categorical variable. In the Titanic data samples, we can transform the Cabin feature into a categorical feature that represents the letter of the cabin:

import re
import pandas as pd

def get_cabin_letter(cabin_value):
    # assumption: missing cabins (NaN, not a string) are treated as unknown, 'U'
    if not isinstance(cabin_value, str):
        return 'U'
    # searching for the letters in the cabin alphanumerical value, e.g. 'C' in 'C85'
    letter_match = re.compile("([a-zA-Z]+)").search(cabin_value)

    if letter_match:
        return letter_match.group()
    else:
        return 'U'

# the cabin number is a sequence of alphanumerical characters, so we are going to create a feature
# from the alphabetical part of it
df_titanic_data['CabinLetter'] = df_titanic_data['Cabin'].map(get_cabin_letter)
# factorize() returns (codes, uniques); we keep only the integer codes
df_titanic_data['CabinLetter'] = pd.factorize(df_titanic_data['CabinLetter'])[0]
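
To make the effect of factorize() concrete, here is a minimal sketch on a made-up list of cabin letters. The values are illustrative only and are not taken from the Titanic data. It shows that each distinct value gets an integer code in order of first appearance, and that the second returned element lets us map the codes back to the original labels:

```python
import pandas as pd

# Hypothetical cabin letters, standing in for the CabinLetter column
cabin_letters = pd.Series(['C', 'U', 'E', 'C', 'U', 'G'])

# factorize() returns the integer codes and the distinct values they refer to
codes, uniques = pd.factorize(cabin_letters)

print(codes.tolist())    # [0, 1, 2, 0, 1, 3]
print(uniques.tolist())  # ['C', 'U', 'E', 'G']

# The codes index into uniques, so the original labels can be recovered
restored = uniques.take(codes)
print(list(restored))    # ['C', 'U', 'E', 'C', 'U', 'G']
```

Note that the codes are arbitrary identifiers and carry no ordering between cabins. By default, factorize() assigns the code -1 to missing values, which is one reason to replace them with a placeholder such as 'U' beforehand.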

We can also apply transformations to quantitative features by using one of the following approaches.
