12

What is the best cost function to train a neural network to perform ordinal regression, i.e. to predict a result whose value lies on an arbitrary scale where only the relative ordering of the values is significant? For example, predicting which product size a customer will order: 'small' (coded as 0), 'medium' (coded as 1), 'large' (coded as 2) or 'extra-large' (coded as 3). I'm trying to figure out whether there are better alternatives to quadratic loss (modeling the problem as a 'vanilla' regression) or cross-entropy loss (modeling the problem as classification).

Royi
xboard

1 Answer

10

Another approach was suggested for face age estimation in the paper "Ordinal Regression with Multiple Output CNN for Age Estimation".

The authors use several binary classifiers, each predicting whether a data point is larger than a given threshold, with one classifier per threshold. In your case, the network would have three binary outputs corresponding to

  • larger than 0
  • larger than 1
  • larger than 2.

For example, for 'large' (2) the ground-truth target would be [1 1 0]. The final cost function is a weighted sum of the individual cross-entropy cost functions, one for each binary classifier.
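The target encoding and the summed loss described above can be sketched in plain Python. This is only an illustration, not the paper's implementation: the function names `encode_ordinal`, `ordinal_loss` and `decode_ordinal` are made up here, the weights default to 1, and the decoding rule (count the thresholds whose output exceeds 0.5) is an assumption about how to turn the outputs back into a label.

```python
import math

def encode_ordinal(label, num_classes):
    """Binary targets [label > 0, label > 1, ..., label > num_classes - 2]."""
    return [1 if label > k else 0 for k in range(num_classes - 1)]

def ordinal_loss(probs, label, num_classes, weights=None):
    """Weighted sum of binary cross-entropies, one term per threshold.

    probs[k] is the network's sigmoid output for "label > k".
    """
    targets = encode_ordinal(label, num_classes)
    if weights is None:
        weights = [1.0] * len(targets)  # assumption: equal weights
    eps = 1e-12  # keeps log() away from 0
    total = 0.0
    for p, t, w in zip(probs, targets, weights):
        p = min(max(p, eps), 1.0 - eps)
        total += -w * (t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total

def decode_ordinal(probs, threshold=0.5):
    """Assumed decoding rule: predicted label = number of thresholds passed."""
    return sum(1 for p in probs if p > threshold)

# 'large' (2) out of four sizes
print(encode_ordinal(2, 4))
print(ordinal_loss([0.9, 0.8, 0.2], label=2, num_classes=4))
print(decode_ordinal([0.9, 0.8, 0.2]))
```

In a real network the three probabilities would come from three sigmoid output units, and the same loss would be computed with the framework's binary cross-entropy.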

This has the advantage of inherently weighting larger errors more heavily, because a prediction that is further from the true value violates more of the individual cross-entropy terms. Plain categorical classification of the ordered outcomes doesn't inherently have this property.
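A small numeric check can make this concrete. The sketch below assumes equal weights and a hypothetical network that is 90% confident on every threshold output. It fixes the true label at 'small' (0) and evaluates the summed loss for confident predictions of each of the four sizes; each additional step away from the truth flips one more threshold target, so the loss grows with the distance.

```python
import math

def binary_ce(p, t):
    """Binary cross-entropy for one probability p and one 0/1 target t."""
    return -(t * math.log(p) + (1 - t) * math.log(1.0 - p))

def threshold_loss(probs, label):
    """Unweighted sum of per-threshold cross-entropies."""
    targets = [1 if label > k else 0 for k in range(len(probs))]
    return sum(binary_ce(p, t) for p, t in zip(probs, targets))

def confident_outputs(pred_label, num_thresholds=3, confidence=0.9):
    """Hypothetical outputs of a network that confidently predicts pred_label."""
    return [confidence if pred_label > k else 1.0 - confidence
            for k in range(num_thresholds)]

true_label = 0  # 'small'
losses = [threshold_loss(confident_outputs(pred), true_label)
          for pred in range(4)]

for pred, loss in enumerate(losses):
    print(f"predicted {pred}, true {true_label}: loss = {loss:.3f}")
```

With a softmax classifier and categorical cross-entropy, by contrast, the loss depends only on the probability assigned to the true class, so it does not matter which wrong class receives the remaining mass.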

Royi
Chrigi