Assessment

When a data scientist evaluates the performance of a model or data science process, this is referred to as assessment. Performance can be defined in several ways, including the model's rate of learning (its ability to obtain a better score with additional experience, for example, more rounds of training with additional samples of data) or the accuracy of its results.

One popular method of assessing the performance of a model or process is called bootstrap sampling. This method examines performance on randomly drawn subsets of the data, repeatedly generating results that can be used to calculate an estimate of accuracy (performance).
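
The idea of repeatedly sampling to estimate accuracy can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: it assumes each record has already been scored as correct (1) or incorrect (0), and it assumes samples are drawn with replacement, as in the classical statistical bootstrap. The function name `bootstrap_accuracy`, its parameters, and the `outcomes` list are hypothetical and exist only for this example.

```python
import random
import statistics

def bootstrap_accuracy(correct_flags, n_rounds=1000, seed=0):
    """Estimate accuracy and its spread by resampling with replacement.

    correct_flags is a list of 1 (correct) / 0 (incorrect) outcomes.
    The name and parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(correct_flags)
    estimates = []
    for _ in range(n_rounds):
        # Draw n outcomes with replacement and record that sample's accuracy.
        sample = rng.choices(correct_flags, k=n)
        estimates.append(sum(sample) / n)
    # The mean is the accuracy estimate; the standard deviation shows
    # how much that estimate varies from one sample to the next.
    return statistics.mean(estimates), statistics.stdev(estimates)

# Made-up outcomes for illustration: 80 correct and 20 incorrect predictions.
outcomes = [1] * 80 + [0] * 20
mean_acc, spread = bootstrap_accuracy(outcomes)
print(f"estimated accuracy: {mean_acc:.3f} (+/- {spread:.3f})")
```

The spread across rounds is what the repetition buys you: a single score says nothing about how stable it is, while many resampled scores give a rough sense of how far the estimate might move on different data.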

The bootstrap sampling method takes a random sample of data and splits it into three files: a training file, a testing file, and a validation file. The model or process logic is developed based on the data in the training file and then evaluated (or tested) using the testing file. This tune-and-then-test process is repeated until the data scientist is comfortable with the results of the tests. At that point, the model or process is tested again, this time using the validation file. Because the validation file played no part in tuning, the results should provide a realistic indication of how the model or process will perform on new data.
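
The three-file split described above might look like the following in Python. This is a sketch under stated assumptions: the 60/20/20 proportions, the function name `three_way_split`, and the stand-in `records` list are all hypothetical choices for illustration, since the text does not prescribe specific sizes.

```python
import random

def three_way_split(records, train_frac=0.6, test_frac=0.2, seed=0):
    """Shuffle records and split them into training, testing, and validation sets.

    The fractions are illustrative assumptions; whatever remains after the
    training and testing portions becomes the validation set.
    """
    rng = random.Random(seed)
    shuffled = list(records)   # copy so the caller's data is left untouched
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation

# Stand-in data: 100 record identifiers.
records = list(range(100))
train, test, validation = three_way_split(records)
print(len(train), len(test), len(validation))  # prints: 60 20 20
```

In this sketch, the training and testing sets would be used repeatedly during the tune-and-then-test loop, while the validation set is set aside and used only once at the end, so that it remains data the model or process has never seen.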

You can imagine using the bootstrap sampling method to develop program logic by analyzing sample data to determine logic flows and then running (or testing) your logic against the test data file. Once you are satisfied that your logic handles all of the conditions and exceptions found in your testing data, you can run a final validation test on a new, never-before-seen data file.