
Regularization on linear models

The Stochastic Gradient Descent (SGD) algorithm finds the optimal weights $\{w_i\}$ of the model by minimizing the error between the true and the predicted values on the $N$ training samples:

$$E(w) = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2$$

Where $\hat{y}_i$ are the predicted values and $y_i$ the real values to be predicted; we have $N$ samples, and each sample has $n$ dimensions.
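As a quick sketch, this error can be computed directly with NumPy; the prediction and target vectors below are made-up example values:

```python
import numpy as np

# Hypothetical predicted and true values for N = 4 samples
y_pred = np.array([2.5, 0.0, 2.1, 7.8])
y_true = np.array([3.0, -0.5, 2.0, 7.5])

# Mean squared error over the N samples
error = np.mean((y_pred - y_true) ** 2)
```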

Regularization consists of adding a term to the previous equation and minimizing the regularized error:

$$E_{\text{reg}}(w) = E(w) + \alpha\, R(w)$$

The $\alpha$ parameter quantifies the amount of regularization, while $R(w)$ is the regularization term, which depends on the regression coefficients.

There are two types of weight constraints usually considered:

  • L2 regularization as the sum of the squares of the coefficients: $R(w) = \sum_{j=1}^{n} w_j^2$
  • L1 regularization as the sum of the absolute values of the coefficients: $R(w) = \sum_{j=1}^{n} |w_j|$
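The two penalties above translate directly into code; here is a minimal sketch with NumPy, where the weight vector is a hypothetical example:

```python
import numpy as np

# Hypothetical coefficient vector w with n = 3 dimensions
w = np.array([0.5, -2.0, 1.5])

# L2 penalty: sum of the squares of the coefficients
r_l2 = np.sum(w ** 2)

# L1 penalty: sum of the absolute values of the coefficients
r_l1 = np.sum(np.abs(w))
```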

The constraint on the coefficients introduced by the regularization term R(w) prevents the model from overfitting the training data: large coefficients are penalized, so no single coefficient can grow arbitrarily to fit noise in the predictors. Each type of regularization has its own characteristics and gives rise to a different variation of the SGD algorithm, which we now introduce:
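In scikit-learn, both variants are available through `SGDRegressor` via its `penalty` and `alpha` parameters. The synthetic dataset below is an illustration, not data from the text:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Synthetic regression data: 100 samples, 5 features,
# with only three truly informative coefficients
rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = X @ np.array([1.0, 2.0, 0.0, 0.0, -1.0]) + 0.1 * rng.randn(100)

# L2-regularized SGD (Ridge-like); alpha sets the regularization strength
sgd_l2 = SGDRegressor(penalty="l2", alpha=0.01, random_state=0).fit(X, y)

# L1-regularized SGD (Lasso-like); tends to drive some coefficients to zero
sgd_l1 = SGDRegressor(penalty="l1", alpha=0.01, random_state=0).fit(X, y)
```

Increasing `alpha` strengthens the penalty and shrinks the fitted coefficients further toward zero.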
