I have a database containing numerical data about products.
I use different models to predict the value of one feature (e.g., the battery capacity of a laptop) from other features such as size, CPU core count, etc.
The models, among them simple linear regression, predict a point value for each item in a test set. In parallel, I am running an empirical study in which human participants read product reviews with the feature of interest masked out. Their task is to give a lower and an upper bound for the value they expect, given the review.
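To make the setup concrete, the data would look roughly like this (column names and numbers are purely illustrative placeholders, not my actual data):

```python
import pandas as pd

# Hypothetical layout: one row per (product, participant) pair.
# "y_true" is the masked feature value, "y_pred" the model's point prediction,
# "lower"/"upper" are the participant's interval bounds for that product.
data = pd.DataFrame({
    "product_id":  [1, 1, 2, 2],
    "participant": ["A", "B", "A", "B"],
    "y_true":      [56.0, 56.0, 41.0, 41.0],   # e.g. battery capacity in Wh
    "y_pred":      [52.3, 52.3, 45.1, 45.1],   # same model prediction per product
    "lower":       [45.0, 50.0, 30.0, 38.0],
    "upper":       [60.0, 70.0, 42.0, 50.0],
})
```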
My question is how best to compare the participants' interval estimates (lower and upper bounds) with the point value predicted by the regression model. What would be a statistically sound evaluation? I thought about the mean absolute distance and Cohen's d, but I am unsure, since I could not find any publication that does something comparable.
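For illustration, continuing with the `data` frame sketched above, this is one possible way I could operationalise the two candidates: the mean absolute distance of the model prediction to each participant interval (zero when the prediction falls inside the bounds), and Cohen's d between the model's absolute errors and the absolute errors of the interval midpoints. I am not at all sure this is the right comparison, which is exactly what I am asking about.

```python
import numpy as np

def interval_distance(pred, lower, upper):
    """Absolute distance of a point prediction to an interval
    (0 if the prediction falls inside the interval)."""
    return np.maximum(0.0, np.maximum(lower - pred, pred - upper))

def cohens_d(x, y):
    """Cohen's d with a pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

# Candidate 1: mean absolute distance of the model prediction
# to the participants' intervals.
dist = interval_distance(data["y_pred"], data["lower"], data["upper"])
print("mean absolute distance to interval:", dist.mean())

# Candidate 2 (one possible reading): Cohen's d between the model's
# absolute errors and the absolute errors of the interval midpoints.
model_err = np.abs(data["y_pred"] - data["y_true"])
human_err = np.abs((data["lower"] + data["upper"]) / 2 - data["y_true"])
print("Cohen's d:", cohens_d(model_err, human_err))
```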