官术网_书友最值得收藏!

One hot encoding

The one-of-K or one hot encoding scheme uses dummy variables to encode categorical features. Originally, it was applied to digital circuits. The dummy variables have binary values such as bits, so they take the values zero or one (equivalent to true or false). For instance, if we want to encode continents, we'll have dummy variables, such as is_asia, which will be true if the continent is Asia and false otherwise. In general, we need as many dummy variables as there are unique labels minus one. We can determine one of the labels automatically from the dummy variables, because the dummy variables are exclusive. If the dummy variables all have a false value, then the correct label is the label for which we don't have a dummy variable. The following table illustrates the encoding for continents:

The encoding produces a matrix (grid of numbers) with lots of zeroes (false values) and occasional ones (true values). This type of matrix is called a sparse matrix. The sparse matrix representation is handled well by the the scipy package and shouldn't be an issue. We'll discuss the scipy package later in this chapter.

主站蜘蛛池模板: 南平市| 金华市| 威信县| 绥宁县| 哈巴河县| 宿迁市| 峨边| 阿尔山市| 泰州市| 南通市| 将乐县| 清镇市| 沁水县| 五家渠市| 崇信县| 独山县| 吉林省| 江川县| 广元市| 昌邑市| 仁寿县| 临洮县| 黄龙县| 米泉市| 桐庐县| 建宁县| 清水河县| 衡水市| 安乡县| 镇雄县| 开原市| 玛沁县| 焉耆| 长武县| 云阳县| 凤城市| 洛浦县| 朝阳市| 阳高县| 上犹县| 章丘市|