Validating/testing

Software engineers are familiar with testing and debugging software source code, but how should ML models be tested? Pieces of algorithms and data input/output routines can be unit tested, but often it is unclear how to ensure that the ML model itself, which presents as a black box, is correct.

The first step to ensuring correctness and sufficient accuracy of an ML model is validation. This means applying the model to predict or classify the validation data subset, and measuring the resulting accuracy against project objectives. Because the training data subset was already seen by the algorithm, it cannot be used to validate correctness, as the model could suffer from poor generalizability (also known as overfitting). To take a nonsensical example, imagine an ML model that consists of a hash map that memorizes each input sample and maps it to the corresponding training output sample. The model would have 100% accuracy on the training data subset, which it has memorized, but very low accuracy on any other data subset, and therefore it would not solve the problem it was intended for. Validation tests against this phenomenon.
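
The hash map example can be made concrete with a minimal sketch. Everything in it is an illustrative assumption rather than part of any real library: the `MemorizingModel` class, the `accuracy` helper, and the toy task of labeling integers by parity. The point is only to show that a model can score perfectly on the data it was trained on while failing completely on held-out validation data.

```python
from typing import Dict, List, Optional


class MemorizingModel:
    """Hypothetical 'model' that only memorizes its training pairs."""

    def __init__(self) -> None:
        self._table: Dict[int, int] = {}

    def fit(self, inputs: List[int], labels: List[int]) -> None:
        # Store every training input together with its label.
        for x, y in zip(inputs, labels):
            self._table[x] = y

    def predict(self, x: int) -> Optional[int]:
        # Unseen inputs have no entry, so the model returns None.
        return self._table.get(x)


def accuracy(model: MemorizingModel, inputs: List[int], labels: List[int]) -> float:
    """Fraction of inputs for which the prediction equals the label."""
    correct = sum(1 for x, y in zip(inputs, labels) if model.predict(x) == y)
    return correct / len(labels)


# Toy task (an assumption for illustration): label is the parity of x.
train_x = list(range(0, 10))
train_y = [x % 2 for x in train_x]

# The validation subset contains only inputs the model has never seen.
val_x = list(range(10, 20))
val_y = [x % 2 for x in val_x]

model = MemorizingModel()
model.fit(train_x, train_y)

train_acc = accuracy(model, train_x, train_y)
val_acc = accuracy(model, val_x, val_y)

print(f"training accuracy:   {train_acc:.2f}")
print(f"validation accuracy: {val_acc:.2f}")
```

Measuring accuracy on the training subset alone would suggest that this model is perfect. Only the validation subset reveals that it has learned nothing about the underlying rule.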

In addition, it is a good idea to validate model outputs against user acceptance criteria. For example, if building a recommender system for TV series, you may wish to ensure that the recommendations made to children are never rated PG-13 or higher. Rather than trying to encode this into the model, which will have a non-zero failure rate, it is better to push this constraint into the application itself, because the cost of not enforcing it would be too high. Such constraints and business rules should be captured at the start of the project.
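
One way to push such a constraint into the application is to filter the model's output before it reaches the user. The following is a minimal sketch under stated assumptions: the `Recommendation` type, the `filter_for_child` function, the rating scale in `RATING_ORDER`, and the show titles are all hypothetical, and a real system would use whatever rating scheme its catalog provides. The sketch deliberately fails closed, so that an item with an unknown rating is dropped rather than shown.

```python
from dataclasses import dataclass
from typing import List

# Assumed rating scale, ordered from least to most restrictive.
RATING_ORDER = ["G", "PG", "PG-13", "R", "NC-17"]

# Business rule from the acceptance criteria: children never see
# anything rated PG-13 or higher.
CHILD_BLOCK_FROM = "PG-13"


@dataclass(frozen=True)
class Recommendation:
    title: str
    rating: str
    score: float  # relevance score produced by the model


def is_allowed_for_child(rating: str) -> bool:
    """Return True only for known ratings strictly below the blocked level."""
    if rating not in RATING_ORDER:
        # Fail closed: an unknown rating is treated as not allowed.
        return False
    return RATING_ORDER.index(rating) < RATING_ORDER.index(CHILD_BLOCK_FROM)


def filter_for_child(recs: List[Recommendation]) -> List[Recommendation]:
    """Enforce the rule in application code, regardless of model output."""
    return [r for r in recs if is_allowed_for_child(r.rating)]


# Hypothetical model output, including items the rule must remove.
model_output = [
    Recommendation("Show A", "G", 0.90),
    Recommendation("Show B", "PG-13", 0.95),
    Recommendation("Show C", "PG", 0.70),
    Recommendation("Show D", "unrated", 0.80),
]

safe = filter_for_child(model_output)
print([r.title for r in safe])
```

Because the filter runs after the model, the rule holds even when the model ranks an unsuitable title highly, and the filter itself can be covered by ordinary unit tests.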
