
Cleaning and preparing data

Feature selection is not the only consideration when preprocessing your data. You may need to do many other things to prepare your data for the algorithm that will ultimately analyze it. Measurement errors may create significant outliers. Instrumentation noise may need to be smoothed out. Some features may have missing values. Each of these issues can be either ignored or addressed, depending, as always, on the context, the data, and the algorithm involved.
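To make the issues above concrete, here is a minimal sketch in Python of one possible treatment for each: median imputation for missing values, clamping for outliers, and a trailing moving average for noise. The function names, the sample readings, and the thresholds (k=3.0, window=3) are illustrative assumptions, not recommendations; the right choices depend on your data and algorithm.

```python
from statistics import median

def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def clip_outliers(values, k=3.0):
    """Clamp values lying more than k median absolute deviations from the median.

    Note: if more than half the values are identical, the deviation is 0
    and every value is clamped to the median.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    low, high = med - k * mad, med + k * mad
    return [min(max(v, low), high) for v in values]

def moving_average(values, window=3):
    """Smooth noise with a trailing moving average over up to `window` points."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical sensor readings: one missing value and one obvious outlier.
readings = [10.0, 11.0, None, 10.5, 250.0, 9.5, 10.0]

filled = fill_missing(readings)      # the None becomes the median of the rest
clipped = clip_outliers(filled)      # 250.0 is pulled back toward the others
smoothed = moving_average(clipped)   # remaining jitter is averaged out
```

The median is used here rather than the mean because a single extreme reading such as 250.0 would otherwise distort both the imputed value and the outlier threshold.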

Additionally, the algorithm you use may require the data to be normalized to some range of values. Or your data may be in a format that the algorithm cannot use, as is often the case with neural networks, which expect a vector of values when what you have are JSON objects from a database. Sometimes you need to analyze only a specific subset of data from a larger source. If you're working with images, you may need to resize, scale, pad, or crop them, or convert them to grayscale.
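As an illustration of the format and normalization problems just described, the following Python sketch converts JSON objects into numeric vectors and rescales each feature to the range [0, 1]. The field names (`age`, `income`), the sample records, and both helper functions are hypothetical; min-max scaling is only one of several common normalization schemes.

```python
import json

def records_to_vectors(records, fields):
    """Turn a list of dicts into rows of floats, with columns in `fields` order."""
    return [[float(rec[name]) for name in fields] for rec in records]

def min_max_normalize(rows):
    """Rescale each column to the range [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

# Hypothetical JSON payload, standing in for objects fetched from a database.
raw = (
    '[{"age": 20, "income": 30000},'
    ' {"age": 40, "income": 50000},'
    ' {"age": 30, "income": 90000}]'
)

records = json.loads(raw)
vectors = records_to_vectors(records, ["age", "income"])
normalized = min_max_normalize(vectors)
```

Passing the field list explicitly fixes the column order, so every vector places the same feature at the same index regardless of the key order in each JSON object.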

These tasks all fall into the realm of data preprocessing. Let's take a look at some specific scenarios and discuss possible approaches for each.
