Factorizing

Factorizing maps each distinct value of a feature to an integer code, which turns that feature into a numerical categorical feature. In pandas, the factorize() function does this. The transformation is especially useful when a feature is an alphanumeric categorical variable. In the Titanic data samples, we can transform the Cabin feature into a categorical feature that represents the letter of the cabin:

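import re

import pandas as pd

def get_cabin_letter(cabin_value):
    # missing cabins (NaN) are not strings, so label them 'U' for unknown
    if not isinstance(cabin_value, str):
        return 'U'
    # searching for the letters in the cabin alphanumerical value
    letter_match = re.search("([a-zA-Z]+)", cabin_value)
    if letter_match:
        return letter_match.group()
    return 'U'

# the cabin number is a sequence of alphanumerical characters, so we are going to create some features
# from the alphabetical part of it (assumes df_titanic_data already holds the Titanic data)
df_titanic_data['CabinLetter'] = df_titanic_data['Cabin'].map(get_cabin_letter)
# factorize() returns a (codes, uniques) pair; keep the integer codes as the new feature
df_titanic_data['CabinLetter'] = pd.factorize(df_titanic_data['CabinLetter'])[0]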
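To make the output of factorize() concrete, here is a small sketch on a handful of made-up cabin values (hypothetical, not rows from the Titanic dataset), reusing the same letter-extraction idea. It shows the integer codes, the array of unique letters, the effect of the sort parameter, and what happens to missing values when they are not replaced by a placeholder first.

```python
import re

import pandas as pd


def get_cabin_letter(cabin_value):
    # non-string values (e.g. missing cabins) are labeled 'U' for unknown
    if not isinstance(cabin_value, str):
        return 'U'
    letter_match = re.search("([a-zA-Z]+)", cabin_value)
    return letter_match.group() if letter_match else 'U'


# a few made-up cabin values for illustration, including missing ones
cabins = pd.Series(["C85", None, "E46", "C123", "B57 B59", None], dtype=object)
letters = cabins.map(get_cabin_letter)  # C, U, E, C, B, U

# codes follow the order of first appearance: C -> 0, U -> 1, E -> 2, B -> 3
codes, uniques = pd.factorize(letters)
print(codes.tolist())    # [0, 1, 2, 0, 3, 1]
print(uniques.tolist())  # ['C', 'U', 'E', 'B']

# sort=True assigns codes in sorted order of the unique values instead
sorted_codes, sorted_uniques = pd.factorize(letters, sort=True)
print(sorted_codes.tolist())    # [1, 3, 2, 1, 0, 3]
print(sorted_uniques.tolist())  # ['B', 'C', 'E', 'U']

# without the 'U' placeholder, missing values receive the code -1
raw_codes, _ = pd.factorize(pd.Series(["C", None, "E"], dtype=object))
print(raw_codes.tolist())  # [0, -1, 1]
```

Replacing missing cabins with 'U' before factorizing gives them a category of their own rather than the -1 sentinel, and keeping the uniques array lets you map the integer codes back to cabin letters later.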

We can also apply transformations to quantitative features by using one of the following approaches.
