
Types of features

In the books example, you can see several types of features:

  • Categorical or unordered: Title, author, genre, publisher. These resemble Swift enumerations without raw values, with one difference: their possible values are called levels rather than cases. Importantly, you can't order them or say that one is bigger than another.
  • Binary: The presence or absence of something, just true or false. In our case, the In stock feature.
  • Real numbers: Page count, year, average reader review score. In Swift, these can be represented as Float or Double.

There are others, but these are by far the most common.
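
The three feature types above can be sketched in Swift as follows. This is only an illustration: the type names, property names, and the sample values are hypothetical and do not come from the books dataset itself.

```swift
// Hypothetical types for illustration only; names and values are assumptions.

// Categorical feature: a fixed set of levels with no inherent order.
// A Swift enum without raw values is not Comparable by default,
// so `Genre.fiction < Genre.science` does not compile.
enum Genre {
    case fiction, science, history
}

struct Book {
    let title: String          // categorical
    let genre: Genre           // categorical
    let inStock: Bool          // binary
    let pageCount: Double      // real number
    let year: Double           // real number
    let averageRating: Double  // real number
}

// A made-up sample, not taken from the dataset.
let book = Book(title: "Example Title",
                genre: .science,
                inStock: true,
                pageCount: 320,
                year: 2017,
                averageRating: 4.5)
```

Levels of a categorical feature can still be compared for equality (`book.genre == .science`), just not ranked against each other.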

The most common ML algorithms require the dataset to consist of a number of samples, where each sample is represented by a vector of real numbers (feature vector), and all samples have the same number of features. The simplest (but not the best) way of translating categorical features into real numbers is by replacing them with numerical codes (Table 1.2).

Table 1.2: dummy books dataset after simple preprocessing:

This is an example of how your dataset may look before you feed it into your ML algorithm. Later, we will discuss the nuts and bolts of data preprocessing for specific applications.
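
The simple numerical-code translation described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the book's implementation: `BookSample`, `makeCodes(for:)`, and the sample values are all hypothetical and chosen only for illustration.

```swift
// Hypothetical sample type; field names and values are illustrative only.
struct BookSample {
    let genre: String    // categorical
    let inStock: Bool    // binary
    let pageCount: Int   // numeric
}

// Assign each distinct level a numerical code in order of first appearance.
func makeCodes(for levels: [String]) -> [String: Double] {
    var codes: [String: Double] = [:]
    for level in levels {
        if codes[level] == nil {
            let nextCode = Double(codes.count)
            codes[level] = nextCode
        }
    }
    return codes
}

// Made-up samples, not the contents of Table 1.2.
let samples = [
    BookSample(genre: "Fiction", inStock: true, pageCount: 310),
    BookSample(genre: "Science", inStock: false, pageCount: 425),
    BookSample(genre: "Fiction", inStock: true, pageCount: 198),
]

let genreCodes = makeCodes(for: samples.map { $0.genre })

// Every sample becomes a feature vector of the same length.
let featureVectors: [[Double]] = samples.map { sample in
    [
        genreCodes[sample.genre] ?? -1,  // -1 marks an unseen level (an assumption)
        sample.inStock ? 1 : 0,          // binary feature as 1 or 0
        Double(sample.pageCount),        // numeric feature as a real number
    ]
}
```

Note that the codes introduce an ordering that the categorical feature does not have (here "Fiction" gets 0 and "Science" gets 1), which is one reason this approach is the simplest but not the best.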
