
One-hot-encoding

The one-of-K or one-hot-encoding scheme uses dummy variables to encode categorical features. The scheme was originally applied to digital circuits. The dummy variables have binary values like bits, so they take the value zero or one (equivalent to false or true). For instance, if we want to encode continents, we will have dummy variables such as is_asia, which is true if the continent is Asia and false otherwise. In general, we need one fewer dummy variable than there are unique labels. Because the dummy variables are mutually exclusive (at most one of them is true for any given value), the remaining label can be determined automatically: if all the dummy variables are false, then the correct label is the one for which we don't have a dummy variable. The following table illustrates the encoding for continents:
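As a minimal sketch of the scheme described above, the following pure-Python snippet encodes a short list of continents. The helper name `dummy_encode`, the sample data, and the choice of Africa as the label without its own dummy variable are assumptions made for illustration only; they are not taken from the original text.

```python
def dummy_encode(labels, reference):
    """Encode labels with one dummy variable per category except `reference`.

    `dummy_encode` is a hypothetical helper written for this illustration.
    """
    # Sort for a stable column order; the reference label gets no column.
    categories = sorted(set(labels) - {reference})
    names = ["is_" + c.lower().replace(" ", "_") for c in categories]
    # Each row holds a 1 in the column of its label and 0 elsewhere.
    rows = [[int(label == c) for c in categories] for label in labels]
    return names, rows


# Assumed sample data: three unique labels, so two dummy variables suffice.
continents = ["Asia", "Europe", "Africa", "Asia"]
names, rows = dummy_encode(continents, reference="Africa")

print(names)
for label, row in zip(continents, rows):
    print(label, row)
```

The row for Africa contains only zeroes, which is how the label without a dummy variable is recovered. In practice, a library routine would normally be used instead of a hand-written helper like this one.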

The encoding produces a matrix (a grid of numbers) with lots of zeroes (false values) and occasional ones (true values). This type of matrix is called a sparse matrix. The SciPy package handles sparse matrix representations well, so the large number of zeroes shouldn't be an issue. We will discuss the SciPy package later in this chapter.
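The following sketch shows one way such an encoded matrix could be stored sparsely with SciPy. The small array is made-up sample data, and the compressed sparse row (CSR) format is just one of several formats in `scipy.sparse`; the original text does not say which format is intended.

```python
import numpy as np
from scipy import sparse

# Assumed sample data: 4 samples encoded with 3 dummy variables.
dense = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
])

# CSR storage keeps only the non-zero entries and their positions.
encoded = sparse.csr_matrix(dense)

# Fraction of entries that are non-zero.
density = encoded.nnz / (encoded.shape[0] * encoded.shape[1])
print(encoded.shape, encoded.nnz, density)

# Convert back to a dense array when one is needed.
restored = encoded.toarray()
```

The saving is negligible for a matrix this small, but it grows with the number of categories, since each row contains at most a single one.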
