Imputation of missing data

When dealing with imperfect or incomplete datasets, a missing value adds nothing to the model by itself, but all the other elements of its row can still be useful. This is especially true when the dataset has a high percentage of incomplete values, so that no row can be discarded.

The main question in this process is "how do you interpret a missing value?" There are many ways, and they usually depend on the problem itself.

A very naive approach is to set the value to zero, which implicitly supposes that the mean of the data distribution is 0. An improved step is to relate the missing value to the surrounding data, assigning the average of the whole column, or of an interval of n elements of the same column. Another option is to use the column's median or most frequent value.
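The strategies above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not a reference implementation: the function names `impute` and `impute_local_mean`, the use of `None` as the marker for a missing value, and the sample `ages` column are all assumptions made for the example.

```python
from statistics import mean, median, mode


def impute(column, strategy="mean"):
    """Return a copy of `column` with every None replaced by one fill value.

    Hypothetical helper: `None` marks a missing entry (an assumption).
    """
    if strategy == "zero":
        fill = 0.0
    else:
        # The statistics are computed from the observed values only.
        observed = [x for x in column if x is not None]
        if not observed:
            raise ValueError("column has no observed values")
        if strategy == "mean":
            fill = mean(observed)
        elif strategy == "median":
            fill = median(observed)
        elif strategy == "most_frequent":
            fill = mode(observed)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return [fill if x is None else x for x in column]


def impute_local_mean(column, n=1):
    """Replace each None with the mean of the observed values found
    up to `n` positions before and after it in the same column.

    Falls back to the whole-column mean when the window holds no
    observed value.
    """
    observed = [x for x in column if x is not None]
    if not observed:
        raise ValueError("column has no observed values")
    fallback = mean(observed)
    result = []
    for i, x in enumerate(column):
        if x is not None:
            result.append(x)
            continue
        window = column[max(0, i - n): i + n + 1]
        neighbors = [v for v in window if v is not None]
        result.append(mean(neighbors) if neighbors else fallback)
    return result


# Made-up sample column with two missing entries.
ages = [22.0, None, 30.0, 22.0, None, 41.0]

zero_filled = impute(ages, "zero")
mean_filled = impute(ages, "mean")
median_filled = impute(ages, "median")
mode_filled = impute(ages, "most_frequent")
local_filled = impute_local_mean(ages, n=1)
```

Which fill value is appropriate depends on the problem: the mean is sensitive to outliers, the median less so, and the most frequent value is usually the natural choice for categorical columns.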

Additionally, there are more advanced techniques, such as robust methods and even k-nearest neighbors, that we won't cover in this book.
