
What is the best practice for the following scenario? Before training a linear regression model, suppose the training data contains a subset of observations whose target is already known in advance and so does not need to be predicted. What should be done with these observations?

For example, suppose the goal is to estimate the fare for a taxi ride that hasn't taken place yet, and some rides have a predetermined flat fare (such as an airport ride). I would expect that fitting and scoring the model on those rides hurts its accuracy, since a flat fare does not depend on the features the way a metered fare does. Should these flat-fare observations:

  1. be removed before training?
  2. be removed before training and then brought back in, with their known flat fare used as the prediction, prior to model evaluation?
  3. be left in for training, with their predicted values then overwritten by the correct flat fare before the model is evaluated?

Part B: If option #2 is the answer, how is it done in practice? I suspect it is more difficult if the data has been scaled.
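To make option #2 concrete, here is a minimal sketch of what I have in mind, not a definitive implementation. The toy `rides` data, the `FLAT_FARE` value, and the helper names are all hypothetical, and it uses a single feature and plain Python to stay dependency-free. The assumptions are that only the feature is scaled, that the scaling statistics come from the metered rides alone, and that the target stays in fare units. If the target were scaled too, the model's predictions would have to be inverse-transformed before being merged with the flat fares.

```python
from statistics import mean, pstdev

# Hypothetical toy data: (distance_km, is_flat_fare, fare)
rides = [
    (2.0, False, 7.5),
    (5.0, False, 14.0),
    (8.0, False, 22.5),
    (12.0, False, 31.5),
    (25.0, True, 52.0),   # airport ride with a predetermined flat fare
    (27.0, True, 52.0),
]
FLAT_FARE = 52.0  # assumed known before the ride takes place


def fit_scaled_ols(xs, ys):
    """Fit y = a + b * z by least squares, where z is x standardized
    with the mean and standard deviation of xs (the training rows only)."""
    mu, sigma = mean(xs), pstdev(xs)
    zs = [(x - mu) / sigma for x in xs]
    y_bar = mean(ys)
    b = sum(z * (y - y_bar) for z, y in zip(zs, ys)) / sum(z * z for z in zs)
    a = y_bar  # the standardized feature has mean zero, so the intercept is y_bar
    # The returned predictor applies the same scaling to any new x.
    return lambda x: a + b * (x - mu) / sigma


def mae(preds, actuals):
    """Mean absolute error."""
    return mean(abs(p - y) for p, y in zip(preds, actuals))


# 1. Train (scaling included) on the metered rides only.
metered = [(x, y) for x, flat, y in rides if not flat]
predict = fit_scaled_ols([x for x, _ in metered], [y for _, y in metered])

# 2. Bring the flat-fare rides back: known fare where it applies, model otherwise.
final_preds = [FLAT_FARE if flat else predict(x) for x, flat, _ in rides]

# 3. Evaluate on everything, and separately on the rides the model actually predicts.
mae_all = mae(final_preds, [y for _, _, y in rides])
mae_metered = mae([predict(x) for x, _ in metered], [y for _, y in metered])
print(f"MAE over all rides:     {mae_all:.3f}")
print(f"MAE over metered rides: {mae_metered:.3f}")
```

In this sketch the flat-fare rows contribute zero error by construction, so the all-rides metric looks better than the metered-only one. That is why both are printed. With scikit-learn, I assume the same idea would be a `Pipeline` of `StandardScaler` and `LinearRegression` fitted on the filtered rows.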

Thanks in advance.

ssou
