They don't represent true probabilities, because you'd still have to calibrate your model.
Let's imagine you're trying to classify cats and dogs in a given set of images (a binary classification problem, with 0 for cats and 1 for dogs). Now say you feed a batch of 10 images to your model. If, for all 10 samples, the model assigns a probability of 0.6 to the positive class, then for a well-calibrated model you'd expect to find roughly 4 cat images and 6 dog images in that batch. After all, that's what a probability means.
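To make that intuition concrete, here's a minimal sketch; the labels are made up to match the expected outcome, not taken from any real dataset:

```python
import numpy as np

# Hypothetical batch: the model assigns p(dog) = 0.6 to all 10 images.
predicted_probs = np.full(10, 0.6)

# Hypothetical ground truth: 6 dogs (1) and 4 cats (0), which is what a
# well-calibrated model would lead you to expect on average.
labels = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])

# For a calibrated model, the mean predicted probability should match
# the observed fraction of positives among those samples.
print("mean predicted probability:", predicted_probs.mean())  # 0.6
print("observed dog rate:", labels.mean())                    # 0.6
```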
Using a sigmoid won't fix your calibration problem either, since your model still won't output a proper probability distribution. You'd need an extra data split (if you can afford one) and calibration curves to evaluate how well calibrated your model is. Scikit-learn has a pretty insightful read on the matter.
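As a rough sketch of what that evaluation might look like with scikit-learn's `calibration_curve` (the dataset and model below are placeholders standing in for the cat/dog classifier, not anything from the original setup):

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy binary dataset standing in for the cat/dog problem (0 = cat, 1 = dog).
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Hold out a split purely for evaluating calibration.
X_train, X_calib, y_train, y_calib = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = model.predict_proba(X_calib)[:, 1]  # predicted P(dog)

# Bin the predictions and compare the mean predicted probability in each bin
# with the fraction of actual positives in that bin.
frac_positives, mean_predicted = calibration_curve(y_calib, probs, n_bins=10)

for pred, true in zip(mean_predicted, frac_positives):
    print(f"predicted ~{pred:.2f} -> observed positive rate {true:.2f}")
# For a well-calibrated model the two columns track each other closely.
```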
Some other references as well; these are specific to neural networks: