
Compensating factors in machine learning models

The compensating factors that machine learning models use in place of statistical diagnostics can be explained with the example of a beam resting on two supports. If one of the supports is missing, the beam loses its balance and eventually falls. The same analogy applies when comparing the statistical modeling and machine learning methodologies.

In the statistical modeling methodology, a two-point validation is performed on the training data: the overall model accuracy and significance tests on the individual parameters. Because linear and logistic regression have low variance by virtue of the form of the model itself, there is little chance of them performing much worse on unseen data. Hence, during deployment, these models rarely produce results that deviate widely from those seen in training.
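The two checks described above can be sketched with ordinary least squares on synthetic data. This is a minimal illustration under stated assumptions, not the book's own procedure: the data, the variable names (`x1`, `x2`), and the use of R-squared as the overall accuracy measure are all chosen here for the example, and the |t| > 2 cutoff is only the usual rough rule of thumb for significance at about the 5 percent level.

```python
import numpy as np

# Synthetic data (illustrative only): y depends on x1 but not on x2
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)  # irrelevant predictor by construction
y = 2.0 + 3.0 * x1 + rng.normal(scale=1.0, size=n)

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), x1, x2])

# Fit ordinary least squares
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# Check 1: overall model fit (R-squared is assumed here as the accuracy measure)
ss_res = residuals @ residuals
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Check 2: significance of each parameter via its t-statistic
dof = n - X.shape[1]                        # residual degrees of freedom
sigma2 = ss_res / dof                       # estimated noise variance
cov_beta = sigma2 * np.linalg.inv(X.T @ X)  # covariance of the estimates
t_stats = beta / np.sqrt(np.diag(cov_beta))

print("R-squared:", round(float(r_squared), 3))
for name, t in zip(["intercept", "x1", "x2"], t_stats):
    print(name, "t-statistic:", round(float(t), 2))
```

A large absolute t-statistic for `x1` together with a high R-squared corresponds to passing both points of the validation; a t-statistic near zero for `x2` would suggest dropping that variable.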

However, in the machine learning space, models have a high degree of flexibility and can range from simple to highly complex. In addition, statistical diagnostics on individual variables are not performed in machine learning. It is therefore important to guard against overfitting so that the models remain robust and usable on unseen data during the implementation phase.

As mentioned previously, in machine learning the data is split into three parts (training data: 50 percent, validation data: 25 percent, testing data: 25 percent) rather than the two parts used in statistical methodology. Machine learning models should be developed on the training data, and their hyperparameters should be tuned on the validation data, to achieve the equivalent of the two-point validation; this way, the robustness of the models is ensured without diagnostics performed at the individual variable level.
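The 50/25/25 workflow can be sketched as follows. This is a minimal example under assumptions that are not from the text: the data is synthetic, ridge regression stands in for an arbitrary machine learning model, its penalty `alpha` plays the role of the hyperparameter, and the helper names (`split_50_25_25`, `fit_ridge`, `mse`) are hypothetical.

```python
import numpy as np

def split_50_25_25(X, y, seed=0):
    """Shuffle rows, then split into train (50%), validation (25%), test (25%)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = len(X) // 2
    n_val = len(X) // 4
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression (no intercept, to keep the sketch short)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def mse(X, y, w):
    """Mean squared error of the linear predictions X @ w."""
    return float(np.mean((X @ w - y) ** 2))

# Synthetic data (illustrative only)
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 5))
true_w = np.array([1.5, -2.0, 0.0, 0.5, 3.0])
y = X @ true_w + rng.normal(scale=0.5, size=400)

(train_X, train_y), (val_X, val_y), (test_X, test_y) = split_50_25_25(X, y)

# Fit on training data; choose the hyperparameter using validation data only
candidate_alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
val_errors = {
    alpha: mse(val_X, val_y, fit_ridge(train_X, train_y, alpha))
    for alpha in candidate_alphas
}
best_alpha = min(val_errors, key=val_errors.get)

# The test data is used once, for the final estimate on unseen data
best_w = fit_ridge(train_X, train_y, best_alpha)
test_error = mse(test_X, test_y, best_w)

print("best alpha:", best_alpha)
print("validation MSE:", round(val_errors[best_alpha], 3))
print("test MSE:", round(test_error, 3))
```

Keeping the test set out of both fitting and tuning is what lets the final error serve as an honest estimate of performance on unseen data.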

Before diving deep into comparisons between the two streams, we will start by understanding the fundamentals of each model individually. Let us start with linear regression! This model might sound trivial; however, knowing the working principles of linear regression creates a foundation for more advanced statistical and machine learning models. Below are the assumptions of linear regression.
