
I am building ROC curves and calculating AUC for multi-class classification on the CIFAR-10 dataset using a CNN. My overall accuracy is ~90%, and my per-class precision and recall are as follows:

              precision    recall  f1-score   support

    airplane       0.93      0.90      0.91      1000
  automobile       0.93      0.96      0.95      1000
        bird       0.88      0.87      0.87      1000
         cat       0.86      0.72      0.79      1000
        deer       0.88      0.91      0.89      1000
         dog       0.88      0.81      0.84      1000
        frog       0.83      0.97      0.89      1000
       horse       0.94      0.94      0.94      1000
        ship       0.95      0.93      0.94      1000
       truck       0.90      0.95      0.92      1000

    accuracy                           0.90     10000
   macro avg       0.90      0.90      0.90     10000
weighted avg       0.90      0.90      0.90     10000
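
For reference, the report above was generated roughly like this (the variable names are just the ones I use elsewhere: y_score holds the softmax outputs for the test set and ytest the integer labels):

import numpy as np
from sklearn.metrics import classification_report

class_labels = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                'dog', 'frog', 'horse', 'ship', 'truck']

y_pred = np.argmax(y_score, axis=1)  # predicted class = index of the largest softmax probability
print(classification_report(ytest, y_pred, target_names=class_labels))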

[Image: ROC Curve plot]

The code where I calculate the ROC Curve and AUC is below:

from pathlib import Path
from itertools import cycle
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_curve, auc
from tensorflow.keras.models import Model, load_model

def assess_model_from_pb(model_file_path: Path, xtest: np.ndarray, ytest: np.ndarray, save_plot_path: Path):
    class_labels = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
    model = load_model(model_file_path)  # load model from filepath
    feature_extractor = Model(inputs=model.inputs, outputs=model.get_layer('dense').output)  # extract the dense output layer (softmax probabilities)
    y_score = feature_extractor.predict(xtest, batch_size=64)  # (n_samples, n_classes) softmax predictions
    ytest_binary = label_binarize(ytest, classes=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9])  # one-hot encode the true test labels
    n_classes = y_score.shape[1]  # number of classes = second dimension of the score matrix

    fpr = dict()
    tpr = dict()
    roc_auc = dict()
    # compute fpr and tpr with roc_curve from the true labels and the per-class scores
    for i in range(n_classes):
        fpr[i], tpr[i], _ = roc_curve(ytest_binary[:, i], y_score[:, i])
        roc_auc[i] = auc(fpr[i], tpr[i])

    # plot each class curve on a single graph for multi-class one-vs-all classification
    colors = cycle(['blue', 'red', 'green', 'brown', 'purple', 'pink', 'orange', 'black', 'yellow', 'cyan'])
    for i, color, lbl in zip(range(n_classes), colors, class_labels):
        plt.plot(fpr[i], tpr[i], color=color, lw=1.5,
                 label='ROC Curve of class {0} (area = {1:0.3f})'.format(lbl, roc_auc[i]))
    plt.plot([0, 1], [0, 1], 'k--', lw=1.5)
    plt.xlim([-0.05, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC Curve for CIFAR-10 Multi-Class Data')
    plt.legend(loc='lower right', prop={'size': 6})
    fullpath = save_plot_path.joinpath(save_plot_path.stem + '_roc_curve.png')
    plt.savefig(fullpath)
    plt.show()
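
One sanity check I can run to confirm that the 'dense' layer really outputs softmax probabilities (rather than raw logits) is to verify that each row of y_score sums to 1:

row_sums = y_score.sum(axis=1)
print(y_score.shape)                   # expecting (10000, 10) for the CIFAR-10 test set
print(row_sums.min(), row_sums.max())  # both should be ~1.0 if the layer applies a softmax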

I suppose I am just confused about how my AUC can be near 1 when my precision and recall are not perfect. I understand that many thresholds are used to decide what counts as a positive class versus a negative class. For example, toward the beginning of the curve, where the threshold is very high (say around 0.99999), how is it that my TPR is already near 1? Is it simply that, at such a threshold, I am only assigning positive classifications to the samples with the very highest softmax probabilities?
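
To make the question concrete, my mental model of what roc_curve does at any single threshold is roughly the following one-vs-rest computation (the 0.99999 threshold is just an example, and i, ytest_binary and y_score refer to the variables in the function above):

import numpy as np

threshold = 0.99999
y_true = ytest_binary[:, i]          # 1 if the sample truly belongs to class i, else 0
y_prob = y_score[:, i]               # softmax probability the model assigns to class i

y_pred_pos = y_prob >= threshold     # positive prediction only for the very highest scores

tp = np.sum(y_pred_pos & (y_true == 1))
fp = np.sum(y_pred_pos & (y_true == 0))
fn = np.sum(~y_pred_pos & (y_true == 1))
tn = np.sum(~y_pred_pos & (y_true == 0))

tpr = tp / (tp + fn)                 # fraction of true class-i samples still captured at this threshold
fpr = fp / (fp + tn)                 # fraction of non-class-i samples incorrectly flagged as class i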

I would just like a bit more explanation or intuition on this topic to make sure I am not doing something incorrectly.
