
L1 regularization and Lasso

As with L2, L1 regularization usually entails some loss of the model's predictive power.

One of the properties of L1 regularization is that it forces the smallest weights to 0, thereby reducing the number of features taken into account in the model. This is desirable behavior when the number of features (n) is large compared to the number of samples (N), which is why L1 is better suited for datasets with many features.

The Stochastic Gradient Descent algorithm combined with L1 regularization implements the Least Absolute Shrinkage and Selection Operator (Lasso) algorithm.
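
The following minimal sketch illustrates this sparsity effect using scikit-learn's SGDRegressor as a stand-in for Amazon ML's linear learner; the synthetic dataset and the alpha value are illustrative assumptions:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import SGDRegressor

    # Synthetic data: 100 samples, 50 features, only 5 of which are informative
    X, y = make_regression(n_samples=100, n_features=50, n_informative=5,
                           noise=1.0, random_state=0)

    # SGD with an L1 penalty (Lasso-like): the smallest weights are driven to 0
    lasso_sgd = SGDRegressor(penalty='l1', alpha=0.1, max_iter=5000,
                             random_state=0).fit(X, y)
    # SGD with an L2 penalty (Ridge-like): weights are only shrunk towards 0
    ridge_sgd = SGDRegressor(penalty='l2', alpha=0.1, max_iter=5000,
                             random_state=0).fit(X, y)

    print("near-zero weights with L1:", np.sum(np.abs(lasso_sgd.coef_) < 1e-3))
    print("near-zero weights with L2:", np.sum(np.abs(ridge_sgd.coef_) < 1e-3))

On data like this, the L1 model typically zeroes out most of the 45 uninformative weights, while the L2 model keeps them small but non-zero.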

In both cases the hyper-parameters of the model are as follows (see the sketch after this list):

  • The learning rate of the SGD algorithm
  • A parameter λ to tune the amount of regularization added to the model
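
As a sketch of how these two knobs could be tuned together, again using scikit-learn's SGDRegressor as a stand-in, where eta0 plays the role of the learning rate and alpha that of λ (the grid values are illustrative assumptions):

    from sklearn.datasets import make_regression
    from sklearn.linear_model import SGDRegressor
    from sklearn.model_selection import GridSearchCV

    X, y = make_regression(n_samples=200, n_features=20, noise=1.0,
                           random_state=0)

    # eta0 is the (constant) learning rate of SGD; alpha scales the L1 penalty
    grid = GridSearchCV(
        SGDRegressor(penalty='l1', learning_rate='constant', max_iter=2000,
                     random_state=0),
        param_grid={'eta0': [1e-4, 1e-3, 1e-2],
                    'alpha': [1e-6, 1e-4, 1e-2]},
        cv=5,
    )
    grid.fit(X, y)
    print(grid.best_params_)

The three alpha values deliberately mirror the mild, medium, and aggressive settings that Amazon ML exposes, as discussed at the end of this section.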

A third type of regularization, called ElasticNet, consists of adding both an L2 and an L1 regularization term to the model. This brings together the best of both regularization schemes at the expense of an extra hyper-parameter.
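
In scikit-learn terms (a sketch only, since ElasticNet is not available in Amazon ML), that extra hyper-parameter is the mixing ratio between the two penalties, exposed as l1_ratio:

    from sklearn.datasets import make_regression
    from sklearn.linear_model import SGDRegressor

    X, y = make_regression(n_samples=100, n_features=50, n_informative=5,
                           noise=1.0, random_state=0)

    # l1_ratio is the extra hyper-parameter: 0 means pure L2, 1 means pure L1
    model = SGDRegressor(penalty='elasticnet', alpha=1e-4, l1_ratio=0.5,
                         max_iter=5000, random_state=0)
    model.fit(X, y)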

Although experts have different opinions (https://www.quora.com/What-is-the-difference-between-L1-and-L2-regularization) on which type of regularization is more effective, the consensus seems to favor L2 over L1 regularization.

L2 and L1 regularization are both available in Amazon ML, while ElasticNet is not. The amount of regularization is limited to three values of λ: mild (10⁻⁶), medium (10⁻⁴), and aggressive (10⁻²).
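
For completeness, here is a hedged sketch of how such a setting is passed when creating a model programmatically through the boto3 machinelearning client; the model and datasource IDs are placeholders:

    import boto3

    client = boto3.client('machinelearning')

    # The regularization amount is passed as a string-valued training parameter;
    # 1e-6, 1e-4, and 1e-2 correspond to mild, medium, and aggressive
    client.create_ml_model(
        MLModelId='ml-example-id',             # hypothetical model ID
        MLModelName='l1-regularized-model',
        MLModelType='REGRESSION',
        Parameters={
            'sgd.maxPasses': '10',
            'sgd.l1RegularizationAmount': '1e-6',  # mild L1 regularization
        },
        TrainingDataSourceId='ds-example-id',  # hypothetical datasource ID
    )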
