I am training a CNN to regress four targets from a given image. Each image contains a point of interest whose position is defined by two angles, phi and theta (analogous to x and y on a normal Cartesian axis). The targets for my model are sin(phi), cos(phi), sin(theta), and cos(theta). I use mean squared error (L2) as the loss function for both phi and theta.
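To make the target layout concrete, here is a minimal sketch of the encoding, assuming the angles are in radians. The function names `encode_targets` and `decode_angles` are hypothetical and only for illustration; they are not from my actual training code.

```python
import math

def encode_targets(phi, theta):
    """Return the four regression targets for one image (angles in radians)."""
    return [math.sin(phi), math.cos(phi), math.sin(theta), math.cos(theta)]

def decode_angles(targets):
    """Recover (phi, theta) from [sin(phi), cos(phi), sin(theta), cos(theta)].

    atan2 returns angles in (-pi, pi] and does not require the predicted
    sin/cos pair to lie exactly on the unit circle.
    """
    s_phi, c_phi, s_theta, c_theta = targets
    return math.atan2(s_phi, c_phi), math.atan2(s_theta, c_theta)
```

Decoding with `atan2` is one common way to turn the four network outputs back into angles at inference time.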
The issue I have is that not every image has equal inherent value: some images have a higher probability of being encountered, whereas others are much less likely. These probability scores range from 1 down to 1e-100.
My question is: how would one incorporate these probability scores as weights in the loss function so as to make the model better?
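For reference, the most direct form I can think of is a per-sample weighted mean squared error, normalized by the sum of the weights. The sketch below is plain Python rather than framework code, and `weighted_mse` is a hypothetical name; I am assuming the probability score is used directly as the weight, which may not be the right choice.

```python
def weighted_mse(predictions, targets, weights):
    """Weighted mean of per-sample MSE values.

    predictions, targets: one list of outputs per image
                          (here [sin(phi), cos(phi), sin(theta), cos(theta)]).
    weights: one non-negative weight per image (assumption: the probability score).
    """
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not all be zero")
    loss = 0.0
    for pred, target, w in zip(predictions, targets, weights):
        # Plain MSE over the outputs of a single image.
        per_sample = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
        loss += w * per_sample
    # Normalizing by the weight sum keeps the loss scale independent of the weights.
    return loss / total_weight
```

My concern with this form is the range of the scores: a weight of 1e-100 next to a weight of 1 contributes essentially nothing to the loss, and in 32-bit floats it underflows to zero. Some rescaling of the scores (for example working with their logarithm) might be needed, but I am not sure what is appropriate.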