Dataset preprocessing

When we first dive into data science, a common mistake is to expect all the data to be polished and well behaved from the very beginning. Alas, that is often not the case, for many reasons: null data, sensor errors that cause outliers and NaN values, faulty records, instrument-induced bias, and all kinds of other defects that lead to poor model fitting and must be removed.

The two key processes in this stage are data normalization and feature scaling. Both consist of applying simple transformations, called affine transformations, that map the current unbalanced data into a more manageable shape, preserving its integrity while giving it better statistical properties and improving the models applied later. The common goal of the standardization techniques is to bring the data closer to a standard normal distribution, at least in location and scale (zero mean and unit variance), using the following techniques:
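To make "affine" concrete: an affine transformation rescales and shifts every value, x' = a·x + b. The Python sketch below is an illustration under stated assumptions, not a reference implementation. It uses only the standard library, treats a feature as a plain list of floats, and picks two common affine examples: z-score standardization (a = 1/σ, b = -μ/σ) and min-max scaling to [0, 1] (a = 1/(max - min), b = -min/(max - min)). The helper names `affine`, `standardize`, and `min_max_scale` and the sample data are hypothetical.

```python
from statistics import mean, pstdev

# Helper names below are hypothetical, chosen for this illustration.


def affine(values: list[float], scale: float, shift: float) -> list[float]:
    """Apply the affine map x -> scale * x + shift to every value."""
    return [scale * x + shift for x in values]


def standardize(values: list[float]) -> list[float]:
    """Z-score standardization written as an affine map.

    scale = 1 / sigma, shift = -mu / sigma, so each x becomes (x - mu) / sigma.
    Uses the population standard deviation (pstdev).
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return affine(values, 1.0 / sigma, -mu / sigma)


def min_max_scale(values: list[float]) -> list[float]:
    """Min-max scaling to [0, 1] written as an affine map.

    scale = 1 / (max - min), shift = -min / (max - min).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant feature")
    span = hi - lo
    return affine(values, 1.0 / span, -lo / span)


if __name__ == "__main__":
    feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sensor readings
    z = standardize(feature)
    print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
    print(mean(z), pstdev(z))  # 0.0 1.0
    print(min_max_scale(feature))  # values between 0 and 1 (up to float rounding)
```

Because an affine map only changes location and scale, it leaves the shape of the distribution untouched: a skewed feature stays skewed after standardization, it just ends up with zero mean and unit variance. The sketch uses the population standard deviation (`pstdev`); some tools default to the sample standard deviation instead, which changes the scale slightly.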
