
I have been calculating the area under the Precision-Recall curve (AUPRC) using the code snippet below:

from sklearn import metrics

precision, recall, threshold = metrics.precision_recall_curve(y_true, y_pred)
prec_rec_auc_score = metrics.auc(recall, precision)

and the Average Precision (AP) by using the code below:

from sklearn import metrics

avg_precision_score = metrics.average_precision_score(y_true, y_pred)

The two scores have usually been exactly the same; however, I recently came across a situation where the area under the Precision-Recall curve (AUPRC) was significantly larger than the Average Precision (AP) score. Is this possible, and why would it happen?

1 Answer


I tried to generate a dataset with a 95:5 class imbalance, train a RandomForestClassifier model, and calculate the AUPRC, AUC-ROC, and Average Precision (AP) scores for the binary classification task:

[Figure: scatter plot of the binary class distribution]

My observations show that $AP$ is always slightly greater than $AUPRC$:

$$\text{AUPRC} < \text{AP} \ll \text{AUC-ROC}$$

[Figure: precision-recall curve, ROC curve, and bar chart comparing AUPRC, AUC-ROC, and AP]

There can also be cases where $\text{AUPRC} \geq \text{AP}$, and the reason lies in how the two quantities are computed: one interpolates between precision-recall points, the other does not.

The documentation states that:

average_precision_score summarizes the precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight: $$\text{AP} = \sum_n (R_n - R_{n-1}) P_n$$ This implementation is not interpolated and is different from computing the area under the precision-recall curve with the trapezoidal rule, which uses linear interpolation and can be too optimistic (ref).

precision_recall_curve, in contrast, only computes the precision-recall pairs for different probability thresholds; passing its output to auc(recall, precision) then applies the trapezoidal rule, i.e. linear interpolation between consecutive points. The two summaries therefore need not agree, and either one can come out larger depending on the shape of the curve.
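As a minimal sketch of the difference (using made-up toy scores, not the dataset from the experiment below), the following computes both summaries from the same precision-recall pairs; the step-wise sum mirrors the AP formula above, while auc applies the trapezoidal rule:

import numpy as np
from sklearn.metrics import precision_recall_curve, auc, average_precision_score

# Toy labels and scores, made up purely for illustration
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
y_score = np.array([0.10, 0.30, 0.35, 0.40, 0.55, 0.60, 0.70, 0.80])

precision, recall, _ = precision_recall_curve(y_true, y_score)

# Trapezoidal rule: linear interpolation between consecutive (recall, precision) points
auprc_trapezoidal = auc(recall, precision)

# Step-wise sum: sum_n (R_n - R_{n-1}) * P_n, with no interpolation
# (recall returned by precision_recall_curve is decreasing, hence the minus sign)
ap_stepwise = -np.sum(np.diff(recall) * precision[:-1])

print(auprc_trapezoidal, ap_stepwise, average_precision_score(y_true, y_score))

The last two numbers should agree up to floating point, since the step-wise sum is how average_precision_score is defined, while the first can land on either side of them depending on the curve's shape.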


Python code:

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve, auc, average_precision_score, roc_curve, roc_auc_score

# Generate imbalanced data with labels for positive (1) and negative (0) classes
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, weights=[0.95, 0.05], random_state=42)

# Count the number of observations for each class
num_positive_class = np.sum(y == 1)
num_negative_class = np.sum(y == 0)

# Scatter plot for binary class distribution
plt.figure(figsize=(8, 8))
plt.scatter(X[y == 1, 0], X[y == 1, 1], c='blue', edgecolors='k', label=f'Positive Class (1): {num_positive_class}')
plt.scatter(X[y == 0, 0], X[y == 0, 1], c='red', edgecolors='k', label=f'Negative Class (0): {num_negative_class}')
plt.title('Binary Class Distribution')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend(loc='best')
plt.show()

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Random Forest classifier
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict probabilities for the positive class on the test set
y_proba = model.predict_proba(X_test)[:, 1]

# Calculate the precision-recall curve
precision, recall, _ = precision_recall_curve(y_test, y_proba)

# Calculate AUPRC (trapezoidal rule over the precision-recall points)
auprc = auc(recall, precision)

# Calculate the ROC curve
fpr, tpr, _ = roc_curve(y_test, y_proba)

# Calculate AUC-ROC
roc_auc = roc_auc_score(y_test, y_proba)

# Calculate Average Precision (AP) for the Random Forest model
ap_random_forest = average_precision_score(y_test, y_proba)

# Calculate chance-level AP (the prevalence of the positive class)
ap_chance_level = np.sum(y_test) / len(y_test)

# Plot AUPRC, AUC-ROC, and AP in a 1x3 subplot
plt.figure(figsize=(18, 6))

# Plot the precision-recall curve
plt.subplot(1, 3, 1)
plt.plot(recall, precision, label=f'Random Forest Model (AP={ap_random_forest:.4f})', color='orange')
plt.xlabel('Recall (Positive class: 1)')
plt.ylabel('Precision (Positive class: 1)')
plt.title('Precision-Recall Curve')
plt.axhline(y=ap_chance_level, color='red', linestyle='--', label=f'Chance Level (AP={ap_chance_level:.4f})')
plt.legend(loc='best')

# Plot the ROC curve
plt.subplot(1, 3, 2)
plt.plot(fpr, tpr, label=f'AUC-ROC = {roc_auc:.4f}', color='green')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend(loc='best')

# Plot AUPRC, AUC-ROC, and AP as a bar chart
plt.subplot(1, 3, 3)
bars = plt.bar([0, 1, 2], [auprc, roc_auc, ap_random_forest], tick_label=['AUPRC', 'AUC-ROC', 'AP'], color=['orange', 'green', 'blue'])
plt.ylim(0, 1)
plt.title('AUPRC, AUC-ROC, and Average Precision (AP)')

# Annotate values on top of bars with increased font size
for bar in bars:
    yval = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2, yval, round(yval, 4), ha='center', va='bottom', fontsize=12)

plt.tight_layout()
plt.show()

I could not investigate this much further, but I also tried LogisticRegression and changed the imbalance rate; the results did not change, and $\text{AUPRC} < \text{AP}$ still held (slightly), as in the sketch below.
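For reference, here is a minimal sketch of that variation; the 90:10 imbalance below is just an illustrative choice, and the evaluation mirrors the code above:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_curve, auc, average_precision_score

# Same setup as above, but with a different (illustrative) imbalance rate and a linear model
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, weights=[0.90, 0.10], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000, random_state=42)
model.fit(X_train, y_train)
y_proba = model.predict_proba(X_test)[:, 1]

precision, recall, _ = precision_recall_curve(y_test, y_proba)
print('AUPRC:', auc(recall, precision))
print('AP:   ', average_precision_score(y_test, y_proba))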

Mario