Feature transformations

In the previous two sections, we read the train and test sets, combined them, and handled some missing values. Now we will use scikit-learn's random forest classifier to predict which passengers survived. Different implementations of the random forest algorithm accept different types of data, and the scikit-learn implementation accepts only numeric data. So we need to transform the categorical features into numerical ones.
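As a minimal sketch of what "transforming a categorical feature into a numerical one" means, the snippet below integer-encodes a string column with pandas' `pd.factorize`. The tiny DataFrame is hypothetical toy data shaped like two Titanic columns, not the real dataset, and integer encoding is only one possible approach.

```python
import pandas as pd

# Hypothetical toy rows mimicking two Titanic columns (not real data).
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Embarked": ["S", "C", "S", "Q"],
})

# Integer-encode the categorical column: each distinct label gets a code
# in order of first appearance (S -> 0, C -> 1, Q -> 2).
codes, labels = pd.factorize(df["Embarked"])
df["Embarked"] = codes

# Every column is now numeric, which is what scikit-learn's
# RandomForestClassifier expects as input.
print(df.dtypes)
print(list(labels))
```

Note that `pd.factorize` maps missing values to `-1` by default, and the integer codes impose an arbitrary order on the ports; other encodings avoid that.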

There are two types of features:

  • Quantitative: Quantitative features are measured on a numerical scale and can be meaningfully sorted. In the Titanic data samples, the Age feature is an example of a quantitative feature.

  • Qualitative: Qualitative variables, also called categorical variables, are not numerical; they describe data that fits into categories. In the Titanic data samples, the Embarked feature, which indicates the port of departure, is an example of a qualitative feature.
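A rough first pass at sorting columns into these two groups is to look at their pandas dtypes. The sketch below uses `DataFrame.select_dtypes` on a hypothetical toy frame; treat it as a heuristic only, since a column stored as numbers can still represent categories.

```python
import pandas as pd

# Hypothetical sample rows (not real passenger records).
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0],
    "Embarked": ["S", "C", "S"],
})

# Numeric dtypes are candidates for quantitative features ...
quantitative = df.select_dtypes(include="number").columns.tolist()
# ... and the remaining (string/object) columns for qualitative ones.
qualitative = df.select_dtypes(exclude="number").columns.tolist()

print(quantitative)
print(qualitative)
```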

We can apply different kinds of transformations to different types of variables. The following are some approaches for transforming qualitative (categorical) features.
