
Data preparation

Almost there! This step has the following five tasks:

  1. Select the data
  2. Clean the data
  3. Construct the data
  4. Integrate the data
  5. Format the data

These tasks are relatively self-explanatory. The goal is to get the data ready to feed into the algorithms. This includes merging, feature engineering, and transformations. If imputation is needed, it happens here as well. Additionally, with R, pay attention to how the outcome needs to be labeled. If your outcome/response variable is Yes/No, it may not work in some packages and will require a transformed or new variable coded as 1/0. At this point, you should also split your data into the relevant subsets, if applicable: train, test, and validation. This step can be an unforgiving burden, but most experienced people will tell you that it is where you can separate yourself from your peers. With this, let's move on to the money step.
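
As a rough illustration of two of the points above (recoding a Yes/No outcome as 1/0 and splitting the data), here is a minimal sketch in base R. The data frame, its column names (age, outcome, outcome01), and the 60/20/20 split proportions are all assumptions made up for this example; they do not come from any particular dataset or package.

```r
# Hypothetical toy data; the column names are illustrative assumptions.
df <- data.frame(
  age = c(25, 32, 36, 41, 38, 29, 55, 47, 33, 60),
  outcome = c("Yes", "No", "No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Create a new 1/0 variable alongside the original Yes/No label,
# for packages that expect a numeric response.
df$outcome01 <- ifelse(df$outcome == "Yes", 1, 0)

# Shuffle the row indices once so the three subsets cannot overlap.
set.seed(123)  # fixed seed so the split is reproducible
n <- nrow(df)
idx <- sample(seq_len(n))

# Assumed 60/20/20 proportions; adjust to suit the problem.
n_train <- floor(0.6 * n)
n_test <- floor(0.2 * n)

train <- df[idx[1:n_train], ]
test <- df[idx[(n_train + 1):(n_train + n_test)], ]
validate <- df[idx[(n_train + n_test + 1):n], ]
```

Some packages expect a factor rather than a numeric 1/0 response, so it is worth checking the documentation of the modeling function you plan to use before settling on an encoding. If you impute missing values, deriving the imputation values from the training subset alone helps keep information from the test data out of the model.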
