I applied some machine learning algorithms to my dataset and found that my RMSE came out lower than my MAE. What are the most common reasons for this? From my understanding, the RMSE is normally higher than the MAE. But if I am wrong, is it actually possible to have a lower RMSE and a higher MAE (for example, RMSE: 26 and MAE: 36)?
Yes, you are correct that in general $\mathrm{RMSE}(x) \geq \mathrm{MAE}(x)$ holds for the same set of predictions (see this answer for a good explanation of the different error measures and this paper for an interesting comparison of the two).
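As a quick sanity check, you can verify the inequality on any fixed pair of targets and predictions (a minimal sketch using scikit-learn metrics; the array values are made up for illustration):
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Hypothetical targets and predictions, only to illustrate the inequality
y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.5, 6.0, 7.0, 13.0])

rmse = mean_squared_error(y_true, y_pred) ** 0.5  # square root of the mean squared error
mae = mean_absolute_error(y_true, y_pred)
print(rmse >= mae)  # True for any fixed y_true, y_pred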
Therefore, your case must stem from other sources:
Randomness in models: the models you have applied may be non-deterministic, e.g. a random forest randomly sub-samples features. This can be solved either by fixing all random parameters or by measuring RMSE and MAE on the same trained model:
from sklearn.metrics import mean_squared_error, mean_absolute_error

model.fit(X_train, y_train)
y_pred = model.predict(X_test)
rmse = mean_squared_error(y_test, y_pred) ** 0.5
mae = mean_absolute_error(y_test, y_pred)
instead of
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
rmse = mean_squared_error(y_test, y_pred) ** 0.5
model.fit(X_train, y_train)  # re-fitting a non-deterministic model gives a different model
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
This answer shows a way to fix random seeds.
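For scikit-learn estimators, for example, passing a fixed random_state makes the fit reproducible (a minimal sketch; RandomForestRegressor is just an example, substitute your own model):
from sklearn.ensemble import RandomForestRegressor

# A fixed random_state makes feature sub-sampling and bootstrapping reproducible,
# so repeated fits on the same data yield the same model
model = RandomForestRegressor(n_estimators=100, random_state=42)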
Randomness in data splits: data splits are another source of randomness, so either make the split deterministic or measure both errors on the same split:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.7)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
rmse = mean_squared_error(y_test, y_pred) ** 0.5
mae = mean_absolute_error(y_test, y_pred)
instead of
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.7)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
rmse = mean_squared_error(y_test, y_pred) ** 0.5
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.7)  # a second split yields different test data
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
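Alternatively, fixing the random_state of train_test_split makes the split itself reproducible, so both errors are always computed on the same test set (a minimal sketch, keeping the test_size from above):
from sklearn.model_selection import train_test_split

# The same random_state always produces the same train/test partition
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.7, random_state=42)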