I'm working on a regression problem with 400 samples and 7 features, predicting machine job durations from historical data. I'm using XGBoost, and a (90,10) train/test split gives better scores than an (80,20) split. Is this normal? I think I'm overfitting, but I don't know how to check this properly.
(90,10) split: Train R²: 0.99, Test R²: 0.96
(80,20) split: Train R²: 0.91, Test R²: 0.76
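With 400 samples, the 10% test set is only 40 rows and the 20% test set is 80 rows, so part of the difference could simply be which rows landed in the test set. A rough way to check this would be to repeat each split with several random seeds and look at the spread of the test R². This is only a sketch under assumptions: the data is synthetic (`make_regression`), scikit-learn's `GradientBoostingRegressor` stands in for XGBoost, and `r2_over_splits` is a hypothetical helper name.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Assumption: synthetic stand-in for the real data (400 samples, 7 features).
X, y = make_regression(n_samples=400, n_features=7, noise=20.0, random_state=0)


def r2_over_splits(test_size, n_repeats=20):
    """Return the test R² for n_repeats different random train/test splits."""
    scores = []
    for seed in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed
        )
        # Assumption: a scikit-learn gradient boosting model as a stand-in.
        model = GradientBoostingRegressor(random_state=0)
        model.fit(X_tr, y_tr)
        scores.append(r2_score(y_te, model.predict(X_te)))
    return np.array(scores)


for test_size in (0.1, 0.2):
    scores = r2_over_splits(test_size)
    print(
        f"test_size={test_size}: mean R2={scores.mean():.3f}, "
        f"std={scores.std():.3f}, min={scores.min():.3f}, max={scores.max():.3f}"
    )
```

If the test R² swings widely from seed to seed, a single (90,10) versus (80,20) comparison probably says little about which split is "better". Since `xgboost.XGBRegressor` follows the scikit-learn estimator interface, it should be possible to swap it in for the stand-in model.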
I also ran k-fold cross-validation (randomized search CV), and the results for the training and test sets were:
Test set performance: MSE: 219.16, RMSE: 14.80, MAE: 4.31, R²: 0.78
Training set performance: MSE: 11.18, RMSE: 3.34, MAE: 1.87, R²: 0.99
(I should mention that the total duration varies a lot from batch to batch (each batch is a unique production), but the average total duration is 33 hours.)
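To look at the train/test gap fold by fold instead of from one split, something like the following might work. It is a minimal sketch, again assuming synthetic data and scikit-learn's `GradientBoostingRegressor` as a stand-in for my XGBoost model; the 5-fold setup is also an assumption. It uses `cross_validate` with `return_train_score=True`, so each fold reports both training and validation scores.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_validate

# Assumption: synthetic stand-in for the real data (400 samples, 7 features).
X, y = make_regression(n_samples=400, n_features=7, noise=20.0, random_state=0)

metrics = ("r2", "neg_root_mean_squared_error", "neg_mean_absolute_error")

# Shuffled 5-fold CV; every sample is used for validation exactly once.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = cross_validate(
    GradientBoostingRegressor(random_state=0),  # stand-in model
    X,
    y,
    cv=cv,
    scoring=metrics,
    return_train_score=True,
)

for metric in metrics:
    train = results[f"train_{metric}"]
    test = results[f"test_{metric}"]
    # scikit-learn negates error metrics so that higher is always better;
    # flip the sign back for display.
    sign = -1.0 if metric.startswith("neg_") else 1.0
    name = metric.replace("neg_", "")
    print(
        f"{name}: train={sign * train.mean():.3f} (std {train.std():.3f}), "
        f"validation={sign * test.mean():.3f} (std {test.std():.3f})"
    )
```

A training score that stays near perfect while the validation score is clearly lower and varies across folds would be consistent with overfitting. An RMSE much larger than the MAE, as in my results above, would suggest that a few batches with large errors dominate the squared-error metrics.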