
I'm processing scans of printed pages. The books have drawings, photographs, print (which includes characters but also things like lines and boxes around them), and background (the white page). My goal is to extract each individual drawing from the page. I can ignore the print and photographs.

An existing computer vision algorithm is able to identify axis-aligned bounding boxes for each drawing. These are very accurate, but never include the entire drawing; the boxes usually only cover a small portion of each drawing. My goal is to post-process with a second algorithm to expand the bounding box to include the entire drawing.

What algorithm can do this?

Details, and my work so far, are below:

  1. The drawings have white background, no different than the page.
  2. I assume that each drawing has a white border of background around it. However, this border is not necessarily axis-aligned or even rectangular - it usually is, but not always.
  3. The drawings can have printed characters inside them.
  4. The drawings can have white background portions inside them.

Criteria for my algorithm are, in order:

  • Simplicity of implementation
  • Getting the whole picture
  • Not getting anything besides each picture
  • Performance (of minor concern)

I'd like the algorithm to start with the bounding boxes and return irregular borders.


My draft so far:

for bbox in drawing_bboxes:
    for edge in bbox:
        pixels = convert edge into a set of pixels
        # This is easy, since the edges are axis-aligned
        for pixel in pixels:
            # Keep moving outward until the pixel lands on background
            while pixel is not a white background color:
                move pixel away from center of bbox by +1
        boundary += pixels

This will return an irregular border, but it will be disconnected and too steep. I need to add a step that says: "If a pixel is pushed further out than its neighbors, so that it no longer touches them, push those neighbors out as well, to keep the boundary continuous."
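To make the continuity step concrete, here is a minimal sketch for a single edge (the top one). It assumes the page has already been turned into a grid `ink` that is truthy wherever a pixel is not background; `top_edge_offsets` and `push_out` are hypothetical names of my own. `push_out` raises each offset just enough that adjacent offsets differ by at most 1, and never lowers one. I have not checked whether pixels pushed out this way can land back on ink, so treat this as a starting point rather than a finished solution.

```python
def top_edge_offsets(ink, top, left, right):
    """For each column of the top edge, count how far the pixel must
    move up (away from the bbox center) before it reaches background."""
    offsets = []
    for x in range(left, right + 1):
        y = top
        # Stop at the page border even if the pixel there is still ink.
        while y > 0 and ink[y][x]:
            y -= 1
        offsets.append(top - y)
    return offsets


def push_out(offsets):
    """Raise offsets so that neighbors differ by at most 1.

    Two sweeps are enough: each sweep lets a far-out pixel drag the
    neighbors on one side of it outward, one step less per pixel.
    """
    out = list(offsets)
    for i in range(1, len(out)):            # left-to-right sweep
        out[i] = max(out[i], out[i - 1] - 1)
    for i in range(len(out) - 2, -1, -1):   # right-to-left sweep
        out[i] = max(out[i], out[i + 1] - 1)
    return out


# Tiny example: a vertical stroke crosses the top edge in column 1.
page = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [1, 1, 1],
]
raw = top_edge_offsets(page, top=2, left=0, right=2)
smooth = push_out(raw)
```

The other three edges would work the same way with the direction of movement changed, and the corners where two edges meet would still need to be joined up.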

How can this start be completed to a good algorithm? Or, is there a better algorithm?

Ideally, I'd like to find two boundaries: the smallest one that encloses the original bounding box and consists entirely of background pixels, and the largest one that consists entirely of background pixels and does not intersect any other bounding box. Each boundary must be continuous, but can be of any shape.
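One candidate I am considering for the smallest boundary is a different technique from my draft: plain region growing (flood fill) from the ink inside the bounding box, rather than pushing edges outward. The sketch below again assumes a grid `ink` that is truthy for non-background pixels; `grow_drawing` and its `reach` parameter are hypothetical names. `reach` is a guess at how to bridge white gaps inside a drawing, and setting it too high would presumably pull in nearby print.

```python
from collections import deque


def grow_drawing(ink, bbox, reach=1):
    """Collect the ink pixels connected to the ink inside bbox.

    ink   : list of rows, truthy where the pixel is not background
    bbox  : (top, left, bottom, right), inclusive
    reach : ink pixels at most this far apart (Chebyshev distance)
            count as connected; reach=1 is ordinary 8-connectivity
    """
    height, width = len(ink), len(ink[0])
    top, left, bottom, right = bbox
    # Seed the search with every ink pixel inside the bounding box.
    seeds = [(y, x)
             for y in range(top, bottom + 1)
             for x in range(left, right + 1)
             if ink[y][x]]
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        y, x = queue.popleft()
        # Look at every pixel within `reach` steps, clipped to the page.
        for ny in range(max(0, y - reach), min(height, y + reach + 1)):
            for nx in range(max(0, x - reach), min(width, x + reach + 1)):
                if ink[ny][nx] and (ny, nx) not in seen:
                    seen.add((ny, nx))
                    queue.append((ny, nx))
    return seen


# A small ring-shaped "drawing" plus an unrelated stroke in column 6.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 0, 1],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
drawing = grow_drawing(page, bbox=(1, 1, 1, 2))
```

The background pixels immediately surrounding the returned set would then form the boundary. This does not address the largest boundary, and it would also absorb any printed characters that touch the drawing.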

SRobertJames