
The optimizer and initial learning rate

The Adam optimizer (adaptive moment estimation) is used during training; it implements an advanced version of stochastic gradient descent. Adam adapts to the curvature of the cost function while using momentum to ensure steady progress toward a good local minimum. For the problem at hand, since we are using transfer learning and want to retain as many of the features learned by the pre-trained network as possible, we use a small initial learning rate of 0.00001. This ensures that the network doesn't lose the useful features learned during pre-training, and fine-tunes less aggressively toward an optimal point based on the new data for the problem at hand. The Adam optimizer can be defined as follows:

from keras import optimizers
adam = optimizers.Adam(lr=0.00001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)

The beta_1 parameter controls the contribution of the current gradient to the first-moment (momentum) estimate, whereas the beta_2 parameter controls the contribution of the squared gradient to the second-moment estimate used to normalize the gradient, which helps to handle the curvature of the cost function.
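
To make the preceding definition concrete, the following is a minimal sketch of how such an optimizer might be wired into a transfer-learning model in Keras. The choice of InceptionV3 as the pre-trained base, the five output classes, and the input shape are illustrative assumptions and are not specified in this section:

from keras import optimizers
from keras.applications import InceptionV3
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

# Hypothetical pre-trained base and new classification head; the architecture,
# class count, and input shape are assumptions made for illustration only
base = InceptionV3(weights='imagenet', include_top=False, input_shape=(299, 299, 3))
x = GlobalAveragePooling2D()(base.output)
out = Dense(5, activation='softmax')(x)
model = Model(inputs=base.input, outputs=out)

# A small initial learning rate keeps fine-tuning close to the pre-trained weights
adam = optimizers.Adam(lr=0.00001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])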
