Data preparation

Almost there! This step has the following five tasks:

  1. Selecting the data.
  2. Cleaning the data.
  3. Constructing the data.
  4. Integrating the data.
  5. Formatting the data.

These tasks are relatively self-explanatory. The goal is to get the data ready to feed into the algorithms. This includes merging, feature engineering, and transformations. If imputation is needed, it happens here as well. Additionally, with R, pay attention to how the outcome needs to be labeled. If your outcome/response variable is coded as Yes/No, it may not work in some packages, and you will need to transform it or create a new variable coded as 1/0. At this point, you should also split your data into the various subsets, if applicable: train, test, and validation. This step can be an unmitigated burden, but most experienced people will tell you that it is where you can separate yourself from your peers. With this, let's move on to the payoff, where you earn your money.
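
As a rough illustration of the preparation tasks described above, here is a minimal base R sketch. The data frame, its column names (age, outcome), the median imputation, and the 70/30 split are all assumptions made for the example, not choices prescribed by the text; which outcome coding a given package expects should be checked in that package's documentation.

```r
# Hypothetical toy data; the column names are invented for illustration.
set.seed(123)
df <- data.frame(
  age = c(34, 51, NA, 45, 29, 62, 38, NA, 47, 55),
  outcome = c("Yes", "No", "No", "Yes", "No", "Yes", "No", "No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Impute missing ages with the median (one simple option among many).
df$age[is.na(df$age)] <- median(df$age, na.rm = TRUE)

# Keep two versions of the response: a numeric 1/0 variable and a factor.
# Some modeling functions expect one form, some the other.
df$outcome_num <- ifelse(df$outcome == "Yes", 1, 0)
df$outcome_fac <- factor(df$outcome, levels = c("No", "Yes"))

# Random 70/30 train/test split (the proportion is an assumption).
train_idx <- sample(seq_len(nrow(df)), size = round(0.7 * nrow(df)))
train <- df[train_idx, ]
test <- df[-train_idx, ]
```

Keeping the original Yes/No column alongside the recoded versions makes it easy to verify the transformation before dropping anything.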
