Feature transformations

In the previous two sections, we read the train and test sets, combined them, and handled some missing values. Now, we will use scikit-learn's random forest classifier to predict the survival of passengers. Different implementations of the random forest algorithm accept different types of data; the scikit-learn implementation accepts only numeric data, so we need to transform the categorical features into numerical ones.

There are two types of features:

  • Quantitative: Quantitative features are measured on a numerical scale and can be meaningfully sorted. In the Titanic data samples, the Age feature is an example of a quantitative feature.

  • Qualitative: Qualitative features, also called categorical features, are not numerical; they describe data that fits into categories. In the Titanic data samples, the Embarked feature (which indicates the departure port) is an example of a qualitative feature.
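To make the distinction concrete, here is a minimal sketch of separating the two kinds of features by column dtype with pandas. The DataFrame holds a few made-up rows for illustration only (it is not read from the Titanic files), and the variable names are hypothetical.

```python
import pandas as pd

# Toy rows for illustration only; not taken from the actual Titanic files.
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0],      # quantitative: numeric, sortable
    "Embarked": ["S", "C", "S", "Q"],     # qualitative: port codes
})

# Numeric columns are treated as quantitative; everything else as qualitative.
quantitative_cols = df.select_dtypes(include="number").columns.tolist()
qualitative_cols = df.select_dtypes(exclude="number").columns.tolist()

print(quantitative_cols)
print(qualitative_cols)
```

A dtype check is only a first pass: a column stored as integers can still be categorical in meaning, so the split should be reviewed against what each feature represents.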

We can apply different kinds of transformations to different variables. The following are some approaches that one can use to transform qualitative/categorical features.
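One common transformation of this kind is one-hot encoding, where each category becomes its own 0/1 column. The following is a minimal sketch using `pandas.get_dummies`; choosing this technique here is an assumption for illustration, and the rows are made up rather than taken from the Titanic files.

```python
import pandas as pd

# Toy rows for illustration only; not taken from the actual Titanic files.
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Embarked": ["S", "C", "S", "Q"],
})

# Replace the Embarked column with one indicator column per category.
# dtype=int gives 0/1 integers instead of booleans.
encoded = pd.get_dummies(df, columns=["Embarked"], dtype=int)

print(encoded.columns.tolist())
print(encoded["Embarked_S"].tolist())
```

After this step every column is numeric, which is the form the scikit-learn random forest expects.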
