
Categorical cross-entropy

Categorical cross-entropy is the most widely used classification cost function, adopted by logistic regression and the majority of neural architectures. The generic analytical expression is:
$$L(Y, \hat{Y}) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij} \log \hat{y}_{ij}$$

where $y_{ij}$ is 1 if the i-th sample belongs to class j (and 0 otherwise), and $\hat{y}_{ij}$ is the probability that the model assigns to that class.
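As a minimal sketch of this expression (using NumPy; the arrays y_true and y_pred are arbitrary illustrative values, not taken from the text):

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average categorical cross-entropy.

    y_true: (N, M) one-hot encoded labels
    y_pred: (N, M) predicted class probabilities (rows sum to 1)
    """
    # Clip predictions to avoid log(0) for degenerate outputs
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# Example: 3 samples, 3 classes
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.2, 0.7]])
print(categorical_cross_entropy(y_true, y_pred))  # ~0.312
```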
This cost function is convex and can be easily optimized using stochastic gradient descent techniques; moreover, it has another important interpretation. If we are training a classifier, our goal is to create a model whose distribution is as similar as possible to $p_{data}$. This condition can be achieved by minimizing the Kullback-Leibler divergence between the two distributions:
$$D_{KL}(p_{data} \parallel p_M) = \mathbb{E}_{x \sim p_{data}}\left[\log \frac{p_{data}(x)}{p_M(x)}\right]$$
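A small numerical illustration of this definition (a NumPy sketch; the distributions below are arbitrary stand-ins for $p_{data}$ and $p_M$): the divergence is positive for a mismatched model and vanishes only when the model reproduces the data distribution exactly.

```python
import numpy as np

p_data = np.array([0.6, 0.3, 0.1])  # hypothetical data-generating distribution

def kl_divergence(p, q):
    # D_KL(p || q) = E_{x~p}[log p(x)/q(x)] for discrete distributions
    return np.sum(p * np.log(p / q))

print(kl_divergence(p_data, np.array([0.5, 0.25, 0.25])))  # > 0: imperfect model
print(kl_divergence(p_data, p_data))                       # 0: perfect model
```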
In the previous expression, $p_M$ is the distribution generated by the model. Now, if we rewrite the divergence, we get:
$$D_{KL}(p_{data} \parallel p_M) = \mathbb{E}_{x \sim p_{data}}[\log p_{data}(x)] - \mathbb{E}_{x \sim p_{data}}[\log p_M(x)] = -H(p_{data}) + H(p_{data}, p_M)$$
The first term is the negative entropy of the data-generating distribution, which doesn't depend on the model parameters, while the second one is the cross-entropy $H(p_{data}, p_M)$. Therefore, if we minimize the cross-entropy, we also minimize the Kullback-Leibler divergence, forcing the model to reproduce a distribution that is very similar to $p_{data}$. This is a very elegant explanation as to why the cross-entropy cost function is an excellent choice for classification problems.
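The decomposition can be checked numerically; below is a sketch (again with arbitrary stand-in distributions) verifying that $H(p_{data}, p_M) = H(p_{data}) + D_{KL}(p_{data} \parallel p_M)$.

```python
import numpy as np

p_data = np.array([0.6, 0.3, 0.1])     # hypothetical data distribution
p_model = np.array([0.5, 0.25, 0.25])  # hypothetical model distribution

entropy = -np.sum(p_data * np.log(p_data))         # H(p_data): model-independent
cross_entropy = -np.sum(p_data * np.log(p_model))  # H(p_data, p_M)
kl = np.sum(p_data * np.log(p_data / p_model))     # D_KL(p_data || p_M)

# H(p_data, p_M) = H(p_data) + D_KL(p_data || p_M), so minimizing the
# cross-entropy with respect to the model also minimizes the divergence
assert np.isclose(cross_entropy, entropy + kl)
print(entropy, cross_entropy, kl)
```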
