You didn’t tell us about the use case or business domain for your problem. For example, if you were modeling battery energy consumption in noise-canceling headphones, root mean squared error would be a natural loss function for your model; since dissipated power goes as the square of the current (P = I²R), a squared-error term falls straight out of the power equations.
Figure out what matters to the business. Write it down. Then pick a loss function that steers the model in the direction the business cares about.
All you told us about the problem you’re solving is that it involves a “non-linear” dataset. It’s not obvious that RMSE is a natural measure for your problem; you didn’t describe nonlinear equations of motion or otherwise characterize the situation you’re examining.
Often it can be convenient to precondition inputs with a nonlinear transform.
For example, if you were looking at impact velocity or crater size in some trebuchet observations, SQRT might be a natural fit.
If you’re looking at market cap of firms in a given vertical, or home prices, or salaries, distributions will skew toward large figures, and a LOG transform may prove useful for taming that long tail.
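In code, this preconditioning is a one-liner each way. A minimal sketch (the numbers are made up, and the `model`/`X` in the comment stand for whatever estimator and features you’re using):

```python
import numpy as np

# Hypothetical skewed target, e.g. home prices or market caps in dollars.
y = np.array([180_000, 220_000, 310_000, 450_000, 2_500_000], dtype=float)

# LOG transform tames the long right tail; log1p is safe at zero.
y_log = np.log1p(y)

# Fit whatever model you like on y_log, then invert predictions:
# dollars = np.expm1(model.predict(X))

# SQRT is the analogous move for quantities like crater size or
# impact velocity that grow sub-linearly with the driving variable.
y_sqrt = np.sqrt(y)
```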
In the problems I’ve worked on, interpreting root mean square error has never been an issue. Usually it corresponds to heat load dissipated by a resistor, or size of a power supply in a control system. If error magnitude suggests we could exceed rated load of the component, then we look for another model solution, or buy a bigger component.
Clearly we can rank order models by RMSE. But if that’s not interpretable enough for your use case, and there’s not some obvious feature in the input space that suggests a ratio against the error output, then maybe RMSE isn’t appropriate to the problem. Adopting a metric “because everyone else is using it” doesn’t sound like a principled approach.
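To make the “heat load” reading concrete, here’s a toy version of that check; the residuals and the 5 W rating are invented for illustration:

```python
import numpy as np

# Hypothetical residuals, in watts, from a power-supply sizing model.
errors_watts = np.array([0.8, -1.2, 0.5, 2.1, -0.3])

rmse_watts = np.sqrt(np.mean(errors_watts ** 2))

RATED_LOAD_WATTS = 5.0  # made-up component rating
if rmse_watts > RATED_LOAD_WATTS:
    print("error load exceeds rating: rework the model or buy a bigger part")
else:
    print(f"RMSE of {rmse_watts:.2f} W fits within the {RATED_LOAD_WATTS} W rating")
```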
"predict housing prices .... What would be an appropriate loss function?"
Well, now it's pretty obvious what matters to the business. For a large firm it's just profit, or capital at risk. So MAE, possibly with asymmetric weighting so a "loss" surprise weighs more heavily than a windfall "profit" surprise.
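One standard way to get that asymmetry is the pinball (quantile) loss, which is just MAE with a different slope on each side of zero; which side you up-weight depends on whether an over- or under-prediction is the costly surprise for your position. A minimal sketch:

```python
import numpy as np

def asymmetric_mae(y_true, y_pred, tau=0.8):
    """Pinball (quantile) loss: MAE with different slopes on each side.

    With tau > 0.5, under-predictions (y_pred below y_true) cost more;
    flip tau below 0.5 to punish over-predictions instead. The 0.8 is
    illustrative, not a recommendation.
    """
    resid = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.mean(np.where(resid >= 0, tau * resid, (tau - 1.0) * resid))

# tau = 0.5 recovers plain MAE (up to a factor of 2).
```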
Sounds like a pretty linear measure. Predicting the error bars around an estimate might be more valuable than the actual estimate; in the face of large uncertainty, choose not to transact.
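If you go the error-bar route, quantile regression gives it to you directly: fit models for, say, the 10th and 90th percentiles, and skip deals whose interval is too wide. A sketch on synthetic data (the quantiles and the cutoff are placeholders):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))              # stand-in feature
y = 50_000 * X[:, 0] + rng.normal(0, 40_000, 500)  # fake noisy prices

# Two quantile models bracket the point estimate.
lo = GradientBoostingRegressor(loss="quantile", alpha=0.1).fit(X, y)
hi = GradientBoostingRegressor(loss="quantile", alpha=0.9).fit(X, y)

interval = hi.predict(X) - lo.predict(X)

# Decline to transact when the 80% interval is too wide (made-up cutoff).
MAX_UNCERTAINTY = 120_000
tradeable = interval <= MAX_UNCERTAINTY
```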
For a small firm, existential risk ("we can't make payroll next month") may be the more interesting measure. It's very non-linear: either we're in business next month or we aren't. So training a model to identify low-variance predicted transactions could be the focus. High recall might not matter if the market is large enough to be choosy, as long as we have fairly high precision on the deals we choose to participate in.
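Operationally that's a filter plus a precision check. A toy sketch, where `pred_std` stands in for whatever per-prediction uncertainty your model emits (everything here is faked):

```python
import numpy as np

rng = np.random.default_rng(1)
n_deals = 1_000
profitable = rng.random(n_deals) < 0.4          # fake ground truth
pred_std = rng.uniform(5_000, 80_000, n_deals)  # fake per-deal uncertainty

# Be choosy: only participate in low-variance predicted deals.
selected = pred_std < 15_000

# Precision on the deals we actually take is what keeps us solvent;
# recall (all the deals we pass on) can stay low in a big market.
precision = profitable[selected].mean()
```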
A loss function like RMSE can be helpful here, in the sense that it discourages large errors more than MAE would.
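The difference shows up in two lines: two error vectors with identical MAE, where RMSE doubles for the one containing the blow-up:

```python
import numpy as np

def mae(e):  return np.mean(np.abs(e))
def rmse(e): return np.sqrt(np.mean(np.square(e)))

steady = np.array([10.0, 10.0, 10.0, 10.0])  # four steady $10k misses
blowup = np.array([0.0, 0.0, 0.0, 40.0])     # same MAE, one big miss

print(mae(steady), mae(blowup))    # 10.0 10.0  -- MAE can't tell them apart
print(rmse(steady), rmse(blowup))  # 10.0 20.0  -- RMSE flags the blow-up
```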
I can't offer a principled theory for why we should square such dollar errors instead of, say, cubing them. If we're going for "time value of money", then maybe EXP plays nicely with compound interest and with the opportunity cost of alternative investments we didn't make.
I have worked with models of house prices and of owners' propensity to sell. I can tell you that nailing it within one standard deviation of ± $5k, in the U.S. market, is essentially impossible. There are a lot of things happening in the market, not all of them observable by a model.