Feature transformations

In the previous two sections, we read the train and test sets, combined them, and handled some missing values. Now we will use scikit-learn's random forest classifier to predict the survival of passengers. Implementations of the random forest algorithm differ in the types of data they accept. The scikit-learn implementation accepts only numeric data, so we need to transform the categorical features into numerical ones.
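To make the idea of turning a categorical feature into a numerical one concrete, here is a minimal sketch in plain Python. The sample rows, the `encode_categories` helper, and the single-letter port codes are illustrative assumptions, not part of the chapter's own code or the actual dataset.

```python
# Hypothetical sample rows; the values are illustrative, not read from the dataset.
passengers = [
    {"Age": 22.0, "Embarked": "S"},
    {"Age": 38.0, "Embarked": "C"},
    {"Age": 26.0, "Embarked": "Q"},
    {"Age": 35.0, "Embarked": "S"},
]


def encode_categories(values):
    """Map each distinct category to an integer code, assigned in sorted order."""
    categories = sorted(set(values))
    mapping = {category: code for code, category in enumerate(categories)}
    return [mapping[value] for value in values], mapping


# Replace the string-valued Embarked feature with integer codes.
embarked_codes, embarked_mapping = encode_categories(
    [p["Embarked"] for p in passengers]
)

# Every row now holds only numbers, the form a numeric-only model expects.
numeric_rows = [[p["Age"], code] for p, code in zip(passengers, embarked_codes)]
```

Integer codes like these impose an arbitrary order on the categories, which is one reason more than one transformation approach is worth considering.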

There are two types of features:

  • Quantitative: Quantitative features are measured on a numerical scale and can be meaningfully sorted. In the Titanic data samples, the Age feature is an example of a quantitative feature.

  • Qualitative: Qualitative features, also called categorical features, are not numerical. They describe data that fits into categories. In the Titanic data samples, the Embarked feature (which indicates the name of the departure port) is an example of a qualitative feature.
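The distinction between the two feature types can be sketched as a rough check on the values a feature holds. The `feature_kind` helper and the sample columns below are hypothetical. The check is only a heuristic: a category stored as integer codes would be misreported as quantitative.

```python
from numbers import Number


def feature_kind(values):
    """Return 'quantitative' if every non-missing value is numeric, else 'qualitative'."""
    present = [v for v in values if v is not None]  # skip missing entries
    is_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in present
    )
    return "quantitative" if present and is_numeric else "qualitative"


# Hypothetical sample columns; None marks a missing value.
sample = {
    "Age": [22.0, None, 26.0],
    "Embarked": ["S", "C", "Q"],
}

kinds = {name: feature_kind(column) for name, column in sample.items()}
```

Here `kinds` maps Age to quantitative and Embarked to qualitative, matching the two examples in the list above.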

We can apply different kinds of transformations to different variables. The following are some approaches that one can use to transform qualitative/categorical features.
