
Frequency table

Let's build a frequency table, which is the usual way of counting how often each combination of variable values occurs. In our case, we use it to decide which choice of variable leads to the largest reduction in entropy:

  1. Count the different combinations of feature values, pairing each feature with the Train outside target variable. You can count them manually in this particular example, but it is useful to have a general method in case we are working with a larger dataset.
  2. To count the feature combinations, we start by concatenating the values in the data table in pairs. For example, CONCATENATE(B2;"_";F2) gives us Hot_No.
  3. If we copy the formula down to cover all the rows, we get every combination of the Temperature and Train outside variables that occurs in the data.
  4. If we repeat the same calculation with the rest of the features, the results will be as follows:
  5. Create pivot tables to count how many times each unique value, that is, each unique combination, appears in each column. This can be done by selecting the full range in the column, right-clicking anywhere in the selection, and left-clicking on Quick Analysis. The following dialog will pop up:
  6. Select Tables | PivotTable to create a table like the following:
  7. Repeat the same procedure with all columns to build all the frequency tables and calculate the two-variable entropy. The resulting tables and the entropy calculations are shown in the following subsection.
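
For readers who want to check the spreadsheet results outside Excel, the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows below are made up and are not the data table used in this example, the helper names frequency_table and conditional_entropy are hypothetical, and the two-variable entropy is assumed to mean the entropy of the target weighted by how often each feature value occurs.

```python
from collections import Counter
from math import log2

# Hypothetical rows (NOT the data table from this example):
# each tuple is (Temperature, Train outside).
rows = [
    ("Hot", "No"),
    ("Hot", "No"),
    ("Mild", "Yes"),
    ("Cold", "Yes"),
    ("Mild", "No"),
    ("Cold", "Yes"),
]


def frequency_table(pairs):
    """Count how often each feature_target combination occurs."""
    # Mirrors the spreadsheet steps: join the two values with "_",
    # then count the occurrences of each resulting key.
    return Counter(f"{feature}_{target}" for feature, target in pairs)


def conditional_entropy(pairs):
    """Entropy of the target given the feature, in bits.

    Assumption: this is what the text calls the two-variable entropy.
    """
    pairs = list(pairs)
    total = len(pairs)
    joint = Counter(pairs)
    feature_counts = Counter(feature for feature, _ in pairs)
    entropy = 0.0
    for (feature, _target), count in joint.items():
        p_joint = count / total
        p_target_given_feature = count / feature_counts[feature]
        entropy -= p_joint * log2(p_target_given_feature)
    return entropy


table = frequency_table(rows)
two_variable_entropy = conditional_entropy(rows)

for combination, count in sorted(table.items()):
    print(combination, count)
print(two_variable_entropy)
```

Running the same two functions on each feature column in turn and comparing the resulting entropies corresponds to repeating the pivot table procedure for every column; under the assumption above, the feature with the lowest value gives the largest entropy reduction.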