
Entropy of the target variable

For a single attribute, entropy is defined as follows:

$$E(f) = -\sum_{i=1}^{c} p_i \log_2(p_i)$$

Here, c is the number of possible values of the feature f, p_i is the probability of the i-th value, and log_2(p_i) is the base-2 logarithm of that probability. The calculation details are as follows:

  1. We need to count the number of Yes and No decisions in the dataset. In our simple example, they can be counted by hand, but if the dataset is larger, we can use Excel functions:

COUNTIF(F2:F15;"Yes") and COUNTIF(F2:F15;"No")

This gives Yes = 9 and No = 5.

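  2. Applying the entropy formula to the target variable gives the following:

$$E = -\frac{9}{14}\log_2\left(\frac{9}{14}\right) - \frac{5}{14}\log_2\left(\frac{5}{14}\right) \approx 0.940$$

Here, the probabilities are calculated as the number of Yes (9) or No (5) decisions divided by the total number of decisions (14).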

This calculation can also be performed in the Excel sheet using -I3/(I3+J3)*LOG(I3/(I3+J3);2)-J3/(I3+J3)*LOG(J3/(I3+J3);2) with I3=9 and J3=5, which evaluates to approximately 0.940.
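For readers who want to cross-check the spreadsheet result outside Excel, the two steps above can be sketched in Python. This is only an illustration, not part of the Excel workflow: the `decisions` list is a hypothetical stand-in for the target column F2:F15, and the `entropy` helper is a name introduced here for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    # Sum -p * log2(p) over every distinct label value.
    return -sum((n / total) * log2(n / total) for n in counts.values())


# Hypothetical stand-in for the target column: 9 "Yes" and 5 "No" decisions.
decisions = ["Yes"] * 9 + ["No"] * 5

# Step 1: count the decisions (the COUNTIF step).
counts = Counter(decisions)
print(counts["Yes"], counts["No"])

# Step 2: apply the entropy formula to the target variable.
target_entropy = entropy(decisions)
print(f"{target_entropy:.3f}")
```

Because the helper iterates over whatever label values it finds, the same function works unchanged for a target with more than two classes.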