Setting parameters in Random Forests

The Random Forest implementation in scikit-learn is called RandomForestClassifier, and it has a number of parameters. Because a Random Forest is built from many instances of DecisionTreeClassifier, the two share many parameters, such as criterion (Gini impurity or entropy/information gain), max_features, and min_samples_split.
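
As a minimal sketch of this parameter sharing, the snippet below passes the same settings to both a single tree and a forest. The synthetic dataset from make_classification and the specific parameter values are assumptions chosen only for illustration; they do not come from the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used here only so the example is self-contained (an assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=14)

# Parameters accepted by both estimators; the values are illustrative.
shared_params = {
    "criterion": "entropy",
    "max_features": 3,
    "min_samples_split": 4,
    "random_state": 14,
}

tree = DecisionTreeClassifier(**shared_params).fit(X, y)
forest = RandomForestClassifier(n_estimators=10, **shared_params).fit(X, y)

# Each fitted member of the forest is itself a DecisionTreeClassifier
# that received the shared settings.
first_tree = forest.estimators_[0]
print(type(first_tree).__name__, first_tree.criterion, first_tree.min_samples_split)
```

The estimators_ attribute holds the fitted trees, so inspecting one of them is a quick way to confirm which settings each tree was built with.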

There are some new parameters that are used in the ensemble process:

  • n_estimators: This dictates how many decision trees should be built. A higher value takes longer to run but usually gives higher accuracy, with diminishing gains as more trees are added.
  • oob_score: If True, the forest is evaluated on the out-of-bag samples, that is, the samples left out of the random subsample used to train each decision tree.
  • n_jobs: This specifies the number of cores to use when training the decision trees in parallel.
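
The ensemble parameters above might be used along these lines. This is a sketch under stated assumptions: the synthetic dataset and the tree counts (25 and 100) are arbitrary choices for illustration, and the scores obtained will depend on the data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only to keep the example self-contained (an assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=14)

oob_scores = {}
for n_trees in (25, 100):
    forest = RandomForestClassifier(
        n_estimators=n_trees,  # number of decision trees to build
        oob_score=True,        # evaluate on samples each tree did not train on
        random_state=14,
    )
    forest.fit(X, y)
    # oob_score_ is only available after fitting with oob_score=True.
    oob_scores[n_trees] = forest.oob_score_

for n_trees, score in oob_scores.items():
    print(f"{n_trees} trees: out-of-bag accuracy {score:.3f}")
```

Because the out-of-bag estimate reuses the training data, it gives a rough accuracy figure without setting aside a separate test set.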

The scikit-learn package uses a library called Joblib for built-in parallelization. The n_jobs parameter dictates how many cores to use. By default, only a single core is used. If you have more cores, you can increase this value, or set it to -1 to use all available cores.
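
A short sketch of setting n_jobs follows. The dataset is synthetic and the tree count is arbitrary (both assumptions). With a fixed random_state, the number of cores should affect only how long training takes, not which trees are built, and the comparison at the end checks this.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only to keep the example self-contained (an assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=14)

# Default behaviour: train the trees on a single core.
single_core = RandomForestClassifier(n_estimators=50, random_state=14, n_jobs=1)
single_core.fit(X, y)

# n_jobs=-1 asks for all available cores.
all_cores = RandomForestClassifier(n_estimators=50, random_state=14, n_jobs=-1)
all_cores.fit(X, y)

# Compare the size of each tree in the two forests.
sizes_single = [t.tree_.node_count for t in single_core.estimators_]
sizes_all = [t.tree_.node_count for t in all_cores.estimators_]
same_trees = sizes_single == sizes_all
print("Same tree sizes with 1 core and all cores:", same_trees)
```

On small datasets such as this one the overhead of parallelization can outweigh the gain, so the speed-up is mostly noticeable with larger data or many trees.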