I am using resnet50 (torchvision.models, pretrained=False) with an input of shape [15, 224, 224] per datapoint, consisting of 14 heatmaps and one level set map. The goal is to predict a cutoff value for each datapoint; this value is then used for heatmap suppression.
The dataset has 12K samples with the following histogram of the target cutoff values (mean: 0.01455, std.: 0.02109):

The resnet model, with its input layer (2D convolution) and output layer (Linear transformation) modified to accommodate the task, does not seem to be learning. I am using MSELoss and the Adam optimizer with a learning rate of 1e-06.
What is causing the model not to learn? Is it the distribution of the dataset, the narrow range of the target values, or is the model not adequate for this task?