Cost function for Ordinal Regression using neural networks

Question

What is the best cost function to train a neural network to perform ordinal regression, i.e. to predict a result whose value exists on an arbitrary scale where only the relative ordering between different values is significant (e.g. to predict which product size a customer will order: 'small' (coded as 0), 'medium' (coded as 1), 'large' (coded as 2) or 'extra-large' (coded as 3))? I'm trying to figure out if there are better alternatives to quadratic loss (modeling the problem as a 'vanilla' regression) or cross-entropy loss (modeling the problem as classification).

score 10 · Accepted Answer · edited Jun 13 '24 at 02:06

Another approach was suggested in this paper for face age estimation: Ordinal Regression with Multiple Output CNN for Age Estimation.

The authors use a set of binary classifiers, each predicting whether a data point's label is larger than a given threshold, with one classifier per threshold. In your case the network would have three binary outputs corresponding to:

larger than 0
larger than 1
larger than 2.

For example, for 'large' (2) the ground truth would be [1 1 0]. The final cost function is a weighted sum of the individual cross-entropy cost functions for each binary classifier.
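
A minimal sketch of that target encoding and cost in plain Python, to make the idea concrete. The names `encode_ordinal`, `ordinal_loss` and `decode_ordinal` are made up for illustration, the uniform default weights are an assumption (the paper chooses its own), and counting the outputs above 0.5 is just one simple way to turn the binary outputs back into a label:

```python
import math

def encode_ordinal(label, num_classes):
    """Encode an ordinal label as num_classes - 1 binary targets.

    Target k answers the question "is the label larger than k?".
    """
    return [1 if label > k else 0 for k in range(num_classes - 1)]

def ordinal_loss(probs, label, num_classes, weights=None, eps=1e-12):
    """Weighted sum of binary cross-entropies, one per threshold.

    probs[k] is the network's sigmoid output for "label > k".
    Uniform weights are an assumption made here for illustration.
    """
    targets = encode_ordinal(label, num_classes)
    if weights is None:
        weights = [1.0] * len(targets)
    total = 0.0
    for p, t, w in zip(probs, targets, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -w * (t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total

def decode_ordinal(probs, threshold=0.5):
    """Assumed decoding rule: count the thresholds the output exceeds."""
    return sum(1 for p in probs if p > threshold)

# 'large' (2) out of four sizes gives the targets from the example above.
print(encode_ordinal(2, 4))
print(decode_ordinal([0.9, 0.8, 0.1]))
```

In a real network you would apply the same per-output binary cross-entropy to three sigmoid units using your framework's built-in loss; the sketch only spells out the arithmetic.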

This has the advantage of inherently penalizing larger errors more heavily, because more of the individual cross-entropy terms will be violated. Simply doing categorical classification of the ordered outcomes doesn't inherently have this feature.
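
A small check of that claim, as a sketch under stated assumptions: four classes, unweighted terms, and a hypothetical network whose binary outputs are each 0.9 or 0.1 confident (`CONF` and `threshold_probs` are invented for this illustration). With the true label fixed at 'small' (0), the cost should grow as the predicted size moves further away:

```python
import math

NUM_CLASSES = 4
CONF = 0.9  # hypothetical confidence of each binary output

def threshold_probs(predicted):
    """Outputs of a network that confidently predicts `predicted`."""
    return [CONF if predicted > k else 1.0 - CONF
            for k in range(NUM_CLASSES - 1)]

def ordinal_loss(probs, label):
    """Unweighted sum of binary cross-entropies over the thresholds."""
    total = 0.0
    for k, p in enumerate(probs):
        t = 1 if label > k else 0  # target for "label > k"
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total

true_label = 0  # 'small'
losses = [ordinal_loss(threshold_probs(pred), true_label)
          for pred in range(NUM_CLASSES)]
for pred, loss in enumerate(losses):
    print(f"predicted {pred}, true {true_label}: loss = {loss:.3f}")
```

Each additional step away from the true label flips one more binary target, so one more term becomes large. Categorical cross-entropy, by comparison, depends only on the probability assigned to the true class, so it does not distinguish between a near miss and a far miss that leave that probability the same.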
