
I am trying to predict real-estate prices (in millions).

The mean price for the dataset is 4 million.

My dataset has no negative values, but some predicted values are negative, e.g. -10 million.

XGBoost also predicts negative values. My results are:

XGBoost: RMSE 1.24, R$^2$ 0.81

Linear regression: RMSE 1.54, R$^2$ 0.74

What am I doing wrong? I tried using $\log(\text{price})$ as the target, but the RMSE got larger. What solutions exist for this kind of problem?
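When comparing a log-target model with a raw-target model, both RMSEs need to be on the same scale: an RMSE computed on $\log(\text{price})$ is in log units, not millions. Below is a minimal sketch, assuming NumPy, plain least-squares linear regression, and strictly positive prices. The data, feature count and split are hypothetical stand-ins for the real dataset. It fits one model on price and one on $\log(\text{price})$, back-transforms the second with `exp` (so its predictions can never be negative), and scores both in millions.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-squared error, in the same units as y (here: millions)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def fit_linear(X, y):
    """Ordinary least squares with an intercept column; returns coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


# Hypothetical synthetic data: 3 made-up features and strictly positive
# prices in millions with a mean of roughly 4 (assumption, not the real set).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
price = np.exp(1.2 + X @ np.array([0.4, -0.3, 0.2]) + rng.normal(scale=0.3, size=500))

X_train, X_test = X[:400], X[400:]
y_train, y_test = price[:400], price[400:]

# Model A: regress price directly (nothing stops predictions going below 0).
coef_raw = fit_linear(X_train, y_train)
pred_raw = predict_linear(coef_raw, X_test)

# Model B: regress log(price), then back-transform with exp (always > 0).
coef_log = fit_linear(X_train, np.log(y_train))
pred_log = np.exp(predict_linear(coef_log, X_test))

# Score both on the ORIGINAL price scale so the numbers are comparable.
print("raw-target RMSE (millions):", rmse(y_test, pred_raw))
print("log-target RMSE (millions):", rmse(y_test, pred_log))
print("smallest log-model prediction:", pred_log.min())
```

One caveat on this design: `exp` of a log-scale prediction tends to estimate the median price rather than the mean, so the back-transformed model can still show a somewhat higher RMSE even when it behaves better (no negative prices).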

Djakarta_zero

1 Answer


This can happen with regression, especially if the training data is too small and/or the test data differs in important ways from the training data. It can be caused by bias or by overfitting, but overfitting is more likely in your case. The solution is therefore either to improve the training data or to simplify the model, for example by removing some features.
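A quick way to check for overfitting is to compare the RMSE on the training set with the RMSE on held-out data: a test RMSE much larger than the training RMSE is the usual warning sign. The sketch below assumes NumPy and ordinary least squares; the data are hypothetical (only 2 of 40 features actually matter, and there are only 60 training rows). It then shows the "simplify the model" remedy by refitting on fewer features.

```python
import numpy as np


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def add_bias(M):
    """Prepend an intercept column of ones."""
    return np.column_stack([np.ones(len(M)), M])


def fit_predict(X_train, y_train, X_eval):
    """OLS with intercept: fit on (X_train, y_train), predict for X_eval."""
    coef, *_ = np.linalg.lstsq(add_bias(X_train), y_train, rcond=None)
    return add_bias(X_eval) @ coef


# Hypothetical data: only the first 2 of 40 features carry signal, and the
# training set is small (60 rows), a setting where overfitting is likely.
rng = np.random.default_rng(1)
n_train, n_test, n_feat = 60, 500, 40
X = rng.normal(size=(n_train + n_test, n_feat))
y = 4.0 + 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=1.0, size=n_train + n_test)
X_tr, X_te, y_tr, y_te = X[:n_train], X[n_train:], y[:n_train], y[n_train:]

results = {}
for name, cols in [("all 40 features", slice(None)), ("first 2 features", slice(0, 2))]:
    train_rmse = rmse(y_tr, fit_predict(X_tr[:, cols], y_tr, X_tr[:, cols]))
    test_rmse = rmse(y_te, fit_predict(X_tr[:, cols], y_tr, X_te[:, cols]))
    results[name] = (train_rmse, test_rmse)
    # A large gap between train and test RMSE points to overfitting.
    print(f"{name}: train RMSE={train_rmse:.2f}, test RMSE={test_rmse:.2f}")
```

With real data the relevant features are not known in advance, so in practice the reduced feature set would be chosen by validation (or the model simplified through regularization or, for XGBoost, shallower trees) rather than by hand as in this toy example.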

Erwan