
Size of the training, development, and test sets

Typically, machine learning practitioners choose the size of the three sets in the ratio of 60:20:20 or 70:15:15. However, there is no hard and fast rule that states that the development and test sets should be of equal size. The following diagram shows the different sizes of the training, development, and test sets:

Another example of the three different sets is as follows:

But what about scenarios where we have big data to deal with? For example, if we have 10,000,000 records or observations, how would we partition the data? In such a scenario, ML practitioners take most of the data for the training set, as much as 98-99%, and divide the rest between the development and test sets. Keeping most of the data for training lets the model see as many different kinds of scenarios as possible. So, even if we keep 1% of the data for development and the same for the test set (leaving 98% for training), we will end up with 100,000 records each, and that is a good number.
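The arithmetic behind these ratios can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the helper names `split_sizes` and `split_records`, the integer-percentage interface, and the fixed random seed are assumptions made for this example, not something prescribed by the text.

```python
import random


def split_sizes(n_records, train_pct, dev_pct, test_pct):
    """Return (train, dev, test) record counts for integer percentages."""
    if train_pct + dev_pct + test_pct != 100:
        raise ValueError("percentages must sum to 100")
    dev = n_records * dev_pct // 100
    test = n_records * test_pct // 100
    # Whatever is left after rounding down goes to the training set.
    train = n_records - dev - test
    return train, dev, test


def split_records(records, train_pct, dev_pct, test_pct, seed=0):
    """Shuffle a copy of the records and cut it into three lists."""
    train_n, dev_n, _ = split_sizes(len(records), train_pct, dev_pct, test_pct)
    shuffled = list(records)
    # A seeded generator keeps the split reproducible (assumed choice).
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:train_n]
    dev = shuffled[train_n:train_n + dev_n]
    test = shuffled[train_n + dev_n:]
    return train, dev, test


# The big-data case from the text: 98:1:1 on 10,000,000 records.
print(split_sizes(10_000_000, 98, 1, 1))

# A conventional 60:20:20 split on a small, made-up dataset.
train, dev, test = split_records(range(1000), 60, 20, 20)
print(len(train), len(dev), len(test))
```

With 10,000,000 records, the 98:1:1 split leaves 100,000 records each for the development and test sets, matching the figure above. In practice, a library routine such as scikit-learn's `train_test_split` is often used for the shuffling and cutting instead of hand-written slicing.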
