
Formulating the loss function

The data for this use case has five classes, pertaining to no diabetic retinopathy, mild diabetic retinopathy, moderate diabetic retinopathy, severe diabetic retinopathy, and proliferative diabetic retinopathy. Hence, we can treat this as a categorical classification problem. For our categorical classification problem, the output labels need to be one-hot encoded, as shown here:

  • No diabetic retinopathy: [1 0 0 0 0]T 
  • Mild diabetic retinopathy: [0 1 0 0 0]T
  • Moderate diabetic retinopathy: [0 0 1 0 0]T
  • Severe diabetic retinopathy: [0 0 0 1 0]T
  • Proliferative diabetic retinopathy: [0 0 0 0 1]T
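This is a quick sketch of the one-hot encoding step, assuming the severity labels are available as integers 0 through 4; the label array and the use of Keras's to_categorical utility are illustrative and not the book's actual preprocessing code:

```python
import numpy as np
from tensorflow.keras.utils import to_categorical

# Hypothetical integer severity labels: 0 = none, 1 = mild, 2 = moderate,
# 3 = severe, 4 = proliferative diabetic retinopathy
labels = np.array([0, 2, 4, 1, 3])

# One-hot encode into 5-dimensional vectors, one dimension per class
one_hot_labels = to_categorical(labels, num_classes=5)
print(one_hot_labels)
# [[1. 0. 0. 0. 0.]
#  [0. 0. 1. 0. 0.]
#  [0. 0. 0. 0. 1.]
#  [0. 1. 0. 0. 0.]
#  [0. 0. 0. 1. 0.]]
```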

Softmax would be the best activation function for presenting the probabilities of the different classes in the output layer, while the categorical cross-entropy loss, summed over the data points, would be the best loss to optimize. For a single data point with the output label vector y and the predicted probability vector p, the cross-entropy loss is given by the following equation:

$$C = -\sum_{i=1}^{5} y_i \log p_i \quad (1)$$

Here, $y = [y_1 \; y_2 \; y_3 \; y_4 \; y_5]^T$ with $y_i \in \{0, 1\}$, and $p = [p_1 \; p_2 \; p_3 \; p_4 \; p_5]^T$ with $\sum_{i=1}^{5} p_i = 1$.

Similarly, the average loss over M training data points can be represented as follows:

$$C = -\frac{1}{M} \sum_{m=1}^{M} \sum_{i=1}^{5} y_i^{(m)} \log p_i^{(m)} \quad (2)$$

During the training process, the gradients of a mini batch are based on the average log loss given by (2), where M is the chosen batch size. For the validation log loss, which we will monitor in conjunction with the validation accuracy, M is the number of validation set data points. Since we will be performing K-fold cross-validation, we will have a different validation dataset in each fold.
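The short NumPy sketch below illustrates equations (1) and (2): it computes the cross-entropy for each one-hot encoded label against the corresponding softmax probabilities and then averages over the M data points in the batch. The label and probability arrays are made-up values for illustration only:

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average categorical cross-entropy over M data points (equation (2)).

    y_true: (M, 5) one-hot encoded labels
    y_pred: (M, 5) softmax probabilities (each row sums to 1)
    """
    y_pred = np.clip(y_pred, eps, 1.0)                           # avoid log(0)
    per_sample_loss = -np.sum(y_true * np.log(y_pred), axis=1)   # equation (1)
    return np.mean(per_sample_loss)                              # equation (2)

# Illustrative mini batch of M = 2 data points
y_true = np.array([[1, 0, 0, 0, 0],    # no diabetic retinopathy
                   [0, 0, 1, 0, 0]])   # moderate diabetic retinopathy
y_pred = np.array([[0.70, 0.10, 0.10, 0.05, 0.05],
                   [0.10, 0.20, 0.50, 0.10, 0.10]])

print(categorical_cross_entropy(y_true, y_pred))  # ~0.525
```

In Keras, the same objective is obtained by compiling the model with loss='categorical_crossentropy' and using a softmax activation in the final Dense layer.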

Now that we have defined the training methodology, the loss function, and the validation metric, let's proceed to the data exploration and modeling. 

Note that the output classes are ordinal in nature, since the severity increases from class to class. For this reason, regression might come in handy, and we will also try regression in place of categorical classification to see how it fares. One of the challenges with regression is converting the raw scores to classes. We will use a simple scheme and map each score to its nearest integer severity class.
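A minimal sketch of that conversion, assuming the regression model emits a single raw severity score per image (the scores below are illustrative): each score is rounded to the nearest integer and clipped to the valid range of 0 to 4.

```python
import numpy as np

def scores_to_classes(raw_scores):
    """Map raw regression scores to the nearest integer severity class (0-4)."""
    rounded = np.rint(raw_scores)                 # round to the nearest integer
    return np.clip(rounded, 0, 4).astype(int)     # keep within the valid class range

# Illustrative raw regression outputs
print(scores_to_classes(np.array([-0.3, 0.6, 1.4, 3.7, 4.2])))
# [0 1 1 4 4]
```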
