I applied some machine learning algorithms to my dataset and found that my RMSE came out lower than my MAE. What are the most common reasons for this? From my understanding, the RMSE is normally higher than the MAE. But if I am wrong, is it actually possible to have a lower RMSE and a higher MAE (for example, RMSE: 26 and MAE: 36)?
Yes, you are correct that in general $\mathrm{RMSE}(x) \geq \mathrm{MAE}(x)$ holds for the same set of predictions (see this answer for a good explanation of the different error measures and this paper for an interesting comparison of the two).
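As a quick sanity check, you can verify the inequality on any fixed pair of targets and predictions (a minimal sketch using scikit-learn metrics; the array values are made up for illustration):
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Hypothetical targets and predictions, only to illustrate the inequality
y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.5, 6.0, 7.0, 13.0])

rmse = mean_squared_error(y_true, y_pred) ** 0.5  # square root of the mean squared error
mae = mean_absolute_error(y_true, y_pred)
print(rmse >= mae)  # True for any fixed y_true, y_pred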
Therefore, your case must stem from other sources:
Randomness in models: the models you have applied may be non-deterministic, e.g. a random forest randomly sub-samples features. This can be solved either by fixing all random parameters or by measuring RMSE and MAE on the same trained model:
from sklearn.metrics import mean_squared_error, mean_absolute_error

model.fit(X_train, y_train)
y_pred = model.predict(X_test)
rmse = mean_squared_error(y_test, y_pred) ** 0.5
mae = mean_absolute_error(y_test, y_pred)
instead of
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
rmse = mean_squared_error(y_test, y_pred) ** 0.5
model.fit(X_train, y_train)  # re-fitting a non-deterministic model gives a different model
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
This answer shows a way to fix random seeds.
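For scikit-learn estimators, for example, passing a fixed random_state makes the fit reproducible (a minimal sketch; RandomForestRegressor is just an example, substitute your own model):
from sklearn.ensemble import RandomForestRegressor

# A fixed random_state makes feature sub-sampling and bootstrapping reproducible,
# so repeated fits on the same data yield the same model
model = RandomForestRegressor(n_estimators=100, random_state=42)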
Randomness in data splits: data splits are another source of randomness, so either make the split deterministic or measure both errors on the same split:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.7)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
rmse = mean_squared_error(y_test, y_pred) ** 0.5
mae = mean_absolute_error(y_test, y_pred)
instead of
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.7)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
rmse = mean_squared_error(y_test, y_pred) ** 0.5
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.7)  # a second split yields different test data
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
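Alternatively, fixing the random_state of train_test_split makes the split itself reproducible, so both errors are always computed on the same test set (a minimal sketch, keeping the test_size from above):
from sklearn.model_selection import train_test_split

# The same random_state always produces the same train/test partition
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.7, random_state=42)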