
I am aware that YOLO (v1-v5) is a family of real-time object detection models with reasonably good prediction performance, and that UNet and its variants are efficient semantic segmentation models that are also fast and predict well.

I cannot find any resources comparing the inference speed of these two approaches. It seems to me that semantic segmentation (classifying every pixel in an image) is clearly a harder problem than object detection (drawing bounding boxes around objects in the image).
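In the absence of a published head-to-head comparison, a rough timing sketch like the one below is what I have in mind, assuming PyTorch with a YOLOv5s loaded via torch.hub and a UNet from the third-party segmentation_models_pytorch package as stand-ins; the specific model choices and input size are my own assumptions, not a canonical benchmark:

```python
# Rough wall-clock comparison of forward-pass latency for a YOLOv5 detector
# and a UNet segmenter at the same input resolution. Requires torch plus the
# third-party packages pulled in by ultralytics/yolov5 (via torch.hub) and
# segmentation_models_pytorch; weights are downloaded on first use.
import time
import torch
import segmentation_models_pytorch as smp

def time_forward(model, x, warmup=5, runs=50):
    """Average forward-pass time in milliseconds."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):      # warm up caches / kernel selection
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs * 1000.0

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1, 3, 640, 640, device=device)   # same input size for both

# YOLOv5 small variant (detection) via torch.hub.
yolo = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True).to(device)

# UNet with a ResNet-34 encoder (semantic segmentation), binary output here.
unet = smp.Unet(encoder_name="resnet34", in_channels=3, classes=2).to(device)

print(f"YOLOv5s : {time_forward(yolo, x):.1f} ms / image")
print(f"UNet-r34: {time_forward(unet, x):.1f} ms / image")
```

Of course the numbers depend heavily on backbone, input resolution, and hardware, which is partly why I am asking whether a more principled comparison or explanation exists.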

Does anyone have good resources for this comparison? Or a good explanation of why one is computationally more demanding than the other?

JStrahl

0 Answers