
One hot encoding

Numerical or categorical information can usually be represented by integers, one for each option or discrete result. But there are situations where a separate bin for each option, marking which one is current, is preferred. This form of data representation is called one hot encoding. This encoding transforms an input into a binary array containing only zeros, except at the position indicated by the value of the variable, which is set to one.

In the simple case of integers, this is the representation of the list [1, 3, 2, 4] in one hot encoding, with each value used as a zero-based position in an array of length five:

[[0 1 0 0 0]
[0 0 0 1 0]
[0 0 1 0 0]
[0 0 0 0 1]]
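
As a quick check, the matrix above can be reproduced with NumPy by picking rows of an identity matrix. This is only a minimal sketch: it assumes the values are zero-based class indices and that there are five possible classes (0 through 4). The names values and num_classes are illustrative and not part of the original text.

```python
import numpy as np

values = [1, 3, 2, 4]   # the list from the text
num_classes = 5         # assumption: the possible classes are 0..4

# Row k of the identity matrix is the one hot vector for class k,
# so indexing with the list of values selects one row per element.
one_hot = np.eye(num_classes, dtype=int)[values]
print(one_hot)
```

Each printed row contains a single one, located at the column given by the corresponding value.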

Let's write a simple implementation of a one hot encoder for integer arrays, in order to better understand the concept:

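import numpy as np

def get_one_hot(input_vector):
    # One slot per possible value; values are assumed to be integers starting at 1
    length = max(input_vector)
    result = []
    for i in input_vector:
        newval = np.zeros(length)
        # Convert the 1-based value to a 0-based index and set that slot to one
        newval[i - 1] = 1
        result.append(newval)
    return result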

In this example, we first define the get_one_hot function, which takes a list or array of integers as input and returns a list of NumPy arrays, one per input element.

What we do is take the elements of the input one by one and, for each element, generate a zero array with length equal to the maximum value of the input, in order to have space for all possible values. Then we insert 1 at the index position indicated by the current value (we subtract 1 because we go from 1-based values to 0-based indexes).

Let's try the function we just wrote:

get_one_hot([1,5,2,4,3])

#Out:
[array([ 1., 0., 0., 0., 0.]),
array([ 0., 0., 0., 0., 1.]),
array([ 0., 1., 0., 0., 0.]),
array([ 0., 0., 0., 1., 0.]),
array([ 0., 0., 1., 0., 0.])]
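
The same idea can also be sketched without an explicit loop. The variant below is an illustration, not the implementation from the text: get_one_hot_vectorized and its optional num_classes parameter are hypothetical names introduced here. It keeps the 1-based convention of get_one_hot, and it lets the caller fix the number of columns for cases where the largest possible value does not appear in the input.

```python
import numpy as np

def get_one_hot_vectorized(input_vector, num_classes=None):
    """Return a 2D array with one one hot row per 1-based input value."""
    values = np.asarray(input_vector, dtype=int)
    if num_classes is None:
        # Same default as the loop version: one column per value up to the maximum
        num_classes = int(values.max())
    result = np.zeros((values.size, num_classes))
    # For row r, set column values[r] - 1 (1-based value to 0-based index)
    result[np.arange(values.size), values - 1] = 1
    return result

encoded = get_one_hot_vectorized([1, 5, 2, 4, 3])
# Decoding: the position of the one in each row, shifted back to 1-based values
decoded = encoded.argmax(axis=1) + 1
print(encoded)
print(decoded)
```

Returning a single 2D array instead of a list of 1D arrays is a design choice made for this sketch; it makes decoding with argmax a one-line operation.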