Types of features

In the books example, you can see several types of features:

  • Categorical or unordered: Title, author, genre, publisher. They are similar to Swift enumerations without raw values, with one difference: they have levels instead of cases. Importantly, you can't order them or say that one is bigger than another.
  • Binary: The presence or absence of something, either true or false. In our case, the In stock feature.
  • Real numbers: Page count, year, average reader review score. These can be represented as Float or Double.

There are others, but these are by far the most common.
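The three feature types above map naturally onto Swift types. The following is only a minimal sketch: the Genre enumeration, its levels, the Book struct, and the sample values are hypothetical names and data chosen for illustration, not part of the books dataset itself.

```swift
// Hypothetical categorical feature: an enumeration without raw values.
// Such an enum is Equatable but not Comparable, so its cases (levels)
// can be tested for equality but cannot be ordered.
enum Genre: CaseIterable {
    case fiction, science, history
}

// Hypothetical sample combining the three feature types.
struct Book {
    let genre: Genre          // categorical
    let inStock: Bool         // binary
    let pageCount: Double     // real number
    let averageScore: Double  // real number
}

let book = Book(genre: .science, inStock: true, pageCount: 320, averageScore: 4.5)
```

Writing Genre.fiction < Genre.science would not compile here, which mirrors the point that categorical levels have no order.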

Most common ML algorithms require the dataset to consist of a number of samples, where each sample is represented by a vector of real numbers (a feature vector), and all samples have the same number of features. The simplest (but not the best) way of translating categorical features into real numbers is to replace them with numerical codes (Table 1.2).
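As a rough illustration of replacing categorical levels with numerical codes, the sketch below assigns each distinct level an integer code in order of first appearance and then builds one feature vector. The function name numericalCodes(for:), the genre strings, the sample values, and the use of -1 for an unseen level are all assumptions made for this example.

```swift
// Assigns each distinct level a numerical code in order of first appearance.
func numericalCodes(for levels: [String]) -> [String: Double] {
    var codes: [String: Double] = [:]
    for level in levels where codes[level] == nil {
        codes[level] = Double(codes.count)
    }
    return codes
}

// Hypothetical genre column of a small books dataset.
let genres = ["fiction", "science", "fiction", "history"]
let genreCodes = numericalCodes(for: genres)

// Build one feature vector: [genre code, in stock, page count, average score].
// -1 marks a level that was never seen (an assumption of this sketch).
let genreCode = genreCodes["science"] ?? -1
let inStock = true
let sample: [Double] = [genreCode, inStock ? 1.0 : 0.0, 320.0, 4.5]
```

One reason this encoding is not the best choice is that the codes imply an order (0 < 1 < 2) that the categorical levels themselves do not have.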

Table 1.2: Dummy books dataset after simple preprocessing

This is an example of how your dataset may look before you feed it into your ML algorithm. Later, we will discuss the nuts and bolts of data preprocessing for specific applications.
