
K-fold cross-validation

You've already seen a form of cross-validation before; holding out a portion of our data is the simplest form of cross-validation that we can have. While this is generally a good practice, it can sometimes leave important features out of the training set, which leads to poor performance when it comes time to test. To remedy this, we can take standard cross-validation a step further with a technique called k-fold cross-validation.

In k-fold cross-validation, our dataset is evenly divided into k equal parts, where k is chosen by the user. As a rule of thumb, you should generally stick to k = 5 or k = 10 for best performance. The model is then trained and tested k times over. During each training episode, one of the k segments is held out as a testing set and the other segments are used for training. You can think of this like shuffling a deck of cards - each time, we take one card out for testing and leave the rest for training. The total accuracy of the model and its error are then computed by averaging over all of the train/test episodes that were conducted.
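
The following is a minimal sketch of this procedure using scikit-learn's KFold splitter; the dataset (load_iris) and the classifier (LogisticRegression) are illustrative choices, not ones prescribed here:

```python
# A sketch of 5-fold cross-validation with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
import numpy as np

X, y = load_iris(return_X_y=True)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = []
for train_idx, test_idx in kf.split(X):
    # Each fold takes one "card" out for testing and trains on the rest.
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

# The reported performance is the average over all k train/test episodes.
print("Per-fold accuracy:", np.round(scores, 3))
print("Mean accuracy: %.3f (+/- %.3f)" % (np.mean(scores), np.std(scores)))
```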

There are some models, such as Logistic Regression and Support Vector Machines, which benefit from k-fold cross-validation. Neural network models, such as the ones that we will be discussing in the coming chapter, also benefit from k-fold cross-validation methods. Random Forest models like the ones we described previously, on the other hand, do not require k-fold cross-validation. K-fold is primarily used to obtain a reliable estimate of generalization performance and to tune a model, and Random Forests already contain a comparable built-in estimate: each tree is trained on a bootstrap sample, so the rows it never saw act as a ready-made test set.
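
As a sketch of this contrast (again using load_iris purely for illustration), we can score a Logistic Regression with k-fold cross-validation while letting a Random Forest report its built-in out-of-bag accuracy:

```python
# Contrasting k-fold scoring with a Random Forest's out-of-bag estimate.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# k-fold cross-validation for a model that benefits from it.
lr_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10)
print("Logistic Regression 10-fold mean accuracy: %.3f" % lr_scores.mean())

# A Random Forest can report an out-of-bag score instead: each tree is
# evaluated on the bootstrap samples it was never trained on.
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=42)
rf.fit(X, y)
print("Random Forest out-of-bag accuracy: %.3f" % rf.oob_score_)
```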
