
Problem Statement:

I am working on developing a method (or borrowing/modifying/combining existing ones) that, given a golden image (a reference or base image with all expected objects present), can identify the missing objects and draw a bounding box in the area where each one is expected, even when the two images are not exactly the same size (there are subtle differences in the field of view). Note that, as in the example given below, I do have a priori knowledge about the objects, if that changes anything. Although it seems a trivial task, it turns out to be a difficult one when the images differ slightly in size or field of view, despite being quite similar and easily distinguishable by a human.

[Disclaimer] This post intends to share all the variations I have developed so far (for those who are interested); the last approach shown in fact achieves a fairly desirable result, yet I am still looking for further improvements or suggestions.

Experimental Approaches:

Initially I thought of solving the problem with standard object detection using one of the commonly used TensorFlow transfer-learning models. But I immediately realized I would not be able to identify the missing objects that way. All such a model gives me is a list of expected objects: if I get lucky and my object detector works very well, I can cross-check the detected objects against that list and highlight the missing ones in red. Yet I would still not know where the missing objects are expected to be.
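For illustration, that cross-checking idea amounts to little more than a set difference. A minimal sketch, where detect_objects is a hypothetical stand-in for whatever TensorFlow detector is used and EXPECTED_OBJECTS is the a-priori list of objects (note that it gives no location information):

# Hypothetical sketch of the cross-checking idea described above.
# `detect_objects` stands in for an arbitrary object detector that
# returns the class labels it found in the image.
EXPECTED_OBJECTS = {"red square", "orange circle", "green square", "purple triangle"}

def find_missing_labels(image, detect_objects):
    detected = set(detect_objects(image))    # labels the detector reported
    missing = EXPECTED_OBJECTS - detected    # expected but not detected
    return missing                           # note: no location information!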

Afterwards I came across other methods offered by the community over the last decade; however, each of them has its own drawbacks, at least for my problem at hand.

To make the scenario more concrete, let's take the following images as an example. On the left is the base image, whereas on the right is the one with missing objects (in this case the red square on top, the orange circle on the bottom left, and the green square just below the middle line are missing):

[Image: base image (left) and image with missing objects (right)]

1. Element-wise or pixel-wise absolute difference:

The simplest of all is the element-wise (pixel-wise) absolute difference, abs(image_base - current_image), a pixel-by-pixel comparison. I was optimistic that it might be enough, and in fact it does a decent job, as long as the image to be compared has exactly the same size and is captured with the same field of view. Slight changes cause huge differences (absolutely expected, but not desirable):

import os
import cv2
import numpy as np
from image_tools.sizes import resize_and_crop

path_to_test = r"path\to\images"

image1 = "base.jpg"
image2 = "base_missing.jpg"

def findMissingObj(image1_base, image2_to_be_compared):
    # load the base image
    imageA = cv2.imread(os.path.join(path_to_test, image1_base))
    # expected size (width, height) taken from image1_base
    size = (imageA.shape[1], imageA.shape[0])

    # resize and crop image2_to_be_compared to match image1_base
    imageB = np.array(resize_and_crop(os.path.join(path_to_test, image2_to_be_compared), size, "middle"))
    imageB = np.array(imageB[..., ::-1])  # RGB (PIL) -> BGR (OpenCV)

    # convert the images to grayscale
    grayA = cv2.cvtColor(imageA, cv2.COLOR_BGR2GRAY)
    grayB = cv2.cvtColor(imageB, cv2.COLOR_BGR2GRAY)

    # compute the pixel-wise absolute difference
    # (cv2.absdiff avoids the uint8 wrap-around that plain subtraction causes)
    difference = cv2.absdiff(grayA, grayB)

    name = 'absDiff_' + image2_to_be_compared.split('.')[0] + '_VS_' + image1_base.split('.')[0] + '.jpg'
    cv2.imwrite(os.path.join(path_to_test, name), difference)

findMissingObj(image1, image2)

[Image: absolute-difference results, identical field of view (left) vs. slightly cropped (right)]

The left image is when current_image is exactly the same as image_base but certain objects are missing, and it returns a very nice result. The right one is when current_image is slightly cropped at the sides. Obviously both images must have the same dimensions, otherwise this does not work at all. Here I experimented with various ways of resizing and padding current_image to match the dimensions of image_base (I am using resize_and_crop from the image_tools Python package for that), and then computed the pixel-wise absolute difference. This is obviously not desirable.

2. Scale-invariant feature transform:

The scale-invariant feature transform (SIFT) was also suggested in one of the posts; it performs feature matching based on points of interest and is already implemented in OpenCV:

import os
import cv2

path_to_test = r"path\to\images"

image1 = "base.jpg"
image2 = "base_missing.jpg"

def findMissingObj(image1_base, image2_to_be_compared):
    # read images
    img1 = cv2.imread(os.path.join(path_to_test, image1_base))
    img2 = cv2.imread(os.path.join(path_to_test, image2_to_be_compared))

    img1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
    img2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)

    # SIFT keypoints and descriptors
    sift = cv2.SIFT_create()
    keypoints_1, descriptors_1 = sift.detectAndCompute(img1, None)
    keypoints_2, descriptors_2 = sift.detectAndCompute(img2, None)

    # brute-force feature matching
    bf = cv2.BFMatcher(cv2.NORM_L1, crossCheck=True)
    matches = bf.match(descriptors_1, descriptors_2)
    matches = sorted(matches, key=lambda x: x.distance)

    # draw the 50 best matches side by side
    img3 = cv2.drawMatches(img1, keypoints_1, img2, keypoints_2, matches[:50], img2, flags=2)

    # write output image
    name = 'SIFT_' + image2_to_be_compared.split('.')[0] + '_VS_' + image1_base.split('.')[0] + '.jpg'
    cv2.imwrite(os.path.join(path_to_test, name), img3)

findMissingObj(image1, image2)

[Image: SIFT keypoint matches, identical field of view (top) vs. slightly cropped (bottom)]

Results are self-explanatory. The top one is when current_image is exactly the same as image_base, while in the bottom one current_image is slightly cropped at the sides. To be honest, I am not sure how either would help figure out the missing objects! This is more like template matching, where templates of the objects in various forms or orientations exist and one wants to match them; SIFT indeed helps locate the local features in an image, commonly known as keypoints. Good examples are this tutorial, this answer, or this blog post.
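That said, one place where these keypoint matches could still be useful here is alignment: the matched keypoints can be used to estimate a geometric transform and warp current_image onto image_base before any differencing. A minimal sketch, assuming the variables (img1, img2, keypoints_1, keypoints_2, matches) from inside the function above:

import numpy as np

# Sketch: reuse the SIFT matches to align img2 to img1 before differencing.
src_pts = np.float32([keypoints_2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
dst_pts = np.float32([keypoints_1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)

# RANSAC discards outlier matches while fitting the homography
H, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)

# warp img2 into the coordinate frame of img1; the result could then be fed
# into the absolute-difference or SSIM comparisons from the other sections
aligned = cv2.warpPerspective(img2, H, (img1.shape[1], img1.shape[0]))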

3. Structural Similarity Index (SSIM):

Then there is a method called the Structural Similarity Index (SSIM), available in scikit-image, that could seemingly do the job, as shown in the PyImageSearch tutorial as well:

import os
import cv2
import imutils
import numpy as np
from skimage.measure import compare_ssim
# on newer scikit-image: from skimage.metrics import structural_similarity as compare_ssim
from image_tools.sizes import resize_and_crop

path_to_test = r"path\to\images"

image1 = "base.jpg"
image2 = "base_missing.jpg"

def findMissingObj(image1_base, image2_to_be_compared):
    # load the base image
    imageA = cv2.imread(os.path.join(path_to_test, image1_base))
    # expected size (width, height)
    size = (imageA.shape[1], imageA.shape[0])

    # resize and crop image2_to_be_compared to match image1_base
    imageB = np.array(resize_and_crop(os.path.join(path_to_test, image2_to_be_compared), size, "middle"))
    imageB = np.array(imageB[..., ::-1])  # RGB (PIL) -> BGR (OpenCV)

    # convert the images to grayscale
    grayA = cv2.cvtColor(imageA, cv2.COLOR_BGR2GRAY)
    grayB = cv2.cvtColor(imageB, cv2.COLOR_BGR2GRAY)

    # compute the Structural Similarity Index (SSIM) between the two
    # images, ensuring that the difference image is returned
    (score, diff) = compare_ssim(grayA, grayB, full=True)
    diff = (diff * 255).astype("uint8")
    print("SSIM: {}".format(score))

    # threshold the difference image, followed by finding contours to
    # obtain the regions of the two input images that differ
    thresh = cv2.threshold(diff, 0, 255,
        cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
    cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL,
        cv2.CHAIN_APPROX_SIMPLE)
    cnts = imutils.grab_contours(cnts)

    # loop over the contours
    for c in cnts:
        # compute the bounding box of the contour and draw it on both
        # input images to mark where they differ (closed contours only)
        if cv2.contourArea(c) > cv2.arcLength(c, True):
            (x, y, w, h) = cv2.boundingRect(c)
            cv2.rectangle(imageA, (x, y), (x + w, y + h), (0, 0, 255), 2)
            cv2.rectangle(imageB, (x, y), (x + w, y + h), (0, 0, 255), 4)

    # write output image
    name = 'SSIM_' + image2_to_be_compared.split('.')[0] + '_VS_' + image1_base.split('.')[0] + '.jpg'
    cv2.imwrite(os.path.join(path_to_test, name), imageB)

findMissingObj(image1, image2)

[Image: SSIM results, identical field of view (left) vs. slightly cropped (right)]

As before, the left image is when current_image is exactly the same as image_base but certain objects are missing, and it returns a very nice result. The right one, however, is when current_image is slightly cropped at the sides. Unfortunately, this algorithm fails to cope with those subtle differences as well and returns a lot of nonsensical bounding boxes.

4. Structural Similarity Index (SSIM) with findTransformECC:

As you have seen, one major problem is that all the algorithms fail to align current_image to image_base when it is tilted or cropped (slightly different dimensions)! After days of searching, I found the findTransformECC algorithm, again in OpenCV, which finds the geometric transform (warp) between two images in terms of the ECC criterion and aligns them as far as possible; see Image Alignment (ECC) in OpenCV for an extensive tutorial. Here I perform findTransformECC first, followed by the SSIM algorithm, and only plot closed contours (otherwise it can be quite noisy too). Code:

import os
import cv2
import imutils
import numpy as np
from skimage.measure import compare_ssim
# on newer scikit-image: from skimage.metrics import structural_similarity as compare_ssim
from image_tools.sizes import resize_and_crop

path_to_test = r"path\to\images"

image1 = "base.jpg"
image2 = "base_missing.jpg"

def findMissingObj(image1_base, image2_to_be_compared):
    # load the base image
    imageA = cv2.imread(os.path.join(path_to_test, image1_base))
    # expected size (width, height)
    size = (imageA.shape[1], imageA.shape[0])

    # resize and crop image2_to_be_compared to match image1_base
    imageB = np.array(resize_and_crop(os.path.join(path_to_test, image2_to_be_compared), size, "middle"))
    imageB = np.array(imageB[..., ::-1])  # RGB (PIL) -> BGR (OpenCV)

    # convert the images to grayscale
    grayA = cv2.cvtColor(imageA, cv2.COLOR_BGR2GRAY)
    grayB = cv2.cvtColor(imageB, cv2.COLOR_BGR2GRAY)

    warp_mode = cv2.MOTION_AFFINE
    warp_matrix = np.eye(2, 3, dtype=np.float32)

    # specify the number of iterations
    number_of_iterations = 100

    # specify the threshold of the increment in the correlation
    # coefficient between two iterations
    termination_eps = 1e-7

    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT,
            number_of_iterations, termination_eps)

    # run the ECC algorithm; the result is stored in warp_matrix
    (cc, warp_matrix) = cv2.findTransformECC(grayA, grayB, warp_matrix,
                                            warp_mode, criteria, None, 1)

    # get the target size from the desired image
    target_shape = grayA.shape

    # warp grayB onto the coordinate frame of grayA
    aligned_fit_and_resized_grayB = cv2.warpAffine(
                            grayB,
                            warp_matrix,
                            (target_shape[1], target_shape[0]),
                            flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP,
                            borderMode=cv2.BORDER_CONSTANT,
                            borderValue=0)

    print('aligned_fit_and_resized_grayB', aligned_fit_and_resized_grayB.shape)

    # compute the Structural Similarity Index (SSIM) between the two
    # images, ensuring that the difference image is returned
    (score, diff) = compare_ssim(grayA, aligned_fit_and_resized_grayB, full=True)
    diff = (diff * 255).astype("uint8")
    print("SSIM: {}".format(score))

    # threshold the difference image, followed by finding contours to
    # obtain the regions of the two input images that differ
    thresh = cv2.threshold(diff, 0, 255,
        cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
    cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL,
        cv2.CHAIN_APPROX_SIMPLE)
    cnts = imutils.grab_contours(cnts)

    # loop over the contours
    for c in cnts:
        # compute the bounding box of the contour and draw it on both
        # input images to mark where they differ (closed contours only)
        if cv2.contourArea(c) > cv2.arcLength(c, True):
            (x, y, w, h) = cv2.boundingRect(c)
            cv2.rectangle(imageA, (x, y), (x + w, y + h), (0, 0, 255), 2)
            cv2.rectangle(imageB, (x, y), (x + w, y + h), (0, 0, 255), 4)

    # write output image
    name = 'alignSSIM_' + image2_to_be_compared.split('.')[0] + '_VS_' + image1_base.split('.')[0] + '.jpg'
    cv2.imwrite(os.path.join(path_to_test, name), imageB)

findMissingObj(image1, image2)

[Image: findTransformECC + SSIM results, identical field of view (left) vs. slightly cropped (right)]

As before, the left image is when current_image is exactly the same as image_base and only certain objects are missing, and the right one is when current_image is slightly cropped at the sides. The results are quite impressive, much better than I expected. Still, if I combine the slight cropping at the sides with a little rotation, I get:

[Image: findTransformECC + SSIM result with cropping and rotation combined]

Not only is it noisy, it also draws many bounding boxes that are simply wrong.

QUESTION: I wonder whether this is the right approach; somehow it feels obsolete. Wouldn't deep-learning-based approaches be applicable to such a problem?

Happy Finding and Thanks for your contribution.

TwinPenguins

2 Answers


This is a typical scenario for a multi-object tracking problem.

The general idea behind a multi-object tracking algorithm is the following:

  1. Find all objects in both images and crop them into patches (small images) using their bounding boxes;
  2. Define a similarity measure between any two crops (usually by some feature extractor) and calculate their pair-wise similarity scores;
  3. Optimize the total similarity score by finding the maximum matching between the two sets of patches.

Then with the matching result, you can tell, for each object in the base image, whether it is present in the query image and, if so, where it is.

Simple solution with code

0. Original images

import cv2
import numpy as np

base = cv2.imread(PATH_TO_BASE_IMAGE)
query = cv2.imread(PATH_TO_QUERY_IMAGE)

[Image: base image and query image]

1. Object detection by contour

The objects in this example are very easy to detect: solid, colorful geometric shapes. findContours from OpenCV is a very primitive way of detecting objects, but it is good enough here.

def detect_objects_by_contour(raw):
    img = raw.max(axis=2)  # keep the most intense channel
    _, thresh = cv2.threshold(img, 50, 255, cv2.THRESH_BINARY)  # binarization
    contours, _ = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)  # find contours
    # calculate bounding boxes as (top, left, bottom, right)
    bboxes = []
    for points in contours:
        ctr = np.vstack(points)
        l, t, r, b = ctr[:, 1].min(), ctr[:, 0].min(), ctr[:, 1].max(), ctr[:, 0].max()
        bboxes.append((l, t, r, b))
    return bboxes

Here is the result visualization for each step: [Image: object detection by contour, step by step]

2. Feature extraction from crops

With each object in its own bounding box, you want to find descriptive features that capture its characteristics for later use in matching. The higher the quality of the features, the easier the matching problem will be. Here I chose the free and fast feature extractor from OpenCV, ORB.

def extract_feature_from_bbox(img, bboxes):
    ORB = cv2.ORB_create(edgeThreshold=0, fastThreshold=10)
    crop_features = []
    for b in bboxes:
        # add small buffer for edge detection to work
        crop = img[b[0] - 1:b[2] + 2, b[1] - 1:b[3] + 2]
        crop_features.append(ORB.detectAndCompute(crop, None))
    return crop_features

Note that the thresholds for the ORB extractor are chosen to overfit this particular problem. In practice, you want to evaluate different choices to find the best one.

3. Matching

The simplest algorithm for bipartite matching is the Hungarian algorithm. Here I used the implementation from scipy.

from scipy.spatial.distance import cdist
from scipy.optimize import linear_sum_assignment

def find_best_match(patch1, patch2):
    # the distance between two crops is defined as the mean of the
    # pair-wise distances of their key points in feature space
    dist_matrix = np.array([[cdist(d1, d2).mean() for _, d1 in patch1] for _, d2 in patch2])
    match_idxes = linear_sum_assignment(dist_matrix)
    return match_idxes

And here is the matching result: [Image: matched objects between base and query]

As you can see, all but one of the objects are matched well between the images; the exception is the teal square from the base image, which is matched with the purple triangle. This is likely because ORB features are not good enough (or are too good) at representing simple geometries like these. You can improve the result with other features, such as simply the color of the object.
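As a rough illustration of that color idea (a sketch, not part of the original code; it assumes the (top, left, bottom, right) bounding boxes from step 1 and the scipy imports from step 3):

def mean_color_features(img, bboxes):
    # one (B, G, R) mean-color vector per crop; crude, but discriminative
    # for solid, uniformly colored shapes like these
    return np.array([img[b[0]:b[2] + 1, b[1]:b[3] + 1].reshape(-1, 3).mean(axis=0)
                     for b in bboxes])

def find_best_match_by_color(base_img, base_bboxes, query_img, query_bboxes):
    # same Hungarian matching as above, with color distance instead of ORB distance
    dist_matrix = cdist(mean_color_features(query_img, query_bboxes),
                        mean_color_features(base_img, base_bboxes))
    return linear_sum_assignment(dist_matrix)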

With the matching result, you can easily tell which objects are missed:

missing_idx = set(range(len(base_crops))) - set(match_idxes[1])
print(f"{len(missing_idx)} objects are missing in query image. Their positions are {[find_box_center(bboxes[i]) for i in missing_idx]}")

>>> 3 objects are missing in query image. Their positions are [(45, 281), (209, 112), (173, 44)]
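(find_box_center is defined in the linked complete code; a hypothetical equivalent for the (top, left, bottom, right) boxes used above would simply return the box center:)

def find_box_center(bbox):
    # hypothetical helper: center of a (top, left, bottom, right) box
    t, l, b, r = bbox
    return ((t + b) // 2, (l + r) // 2)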

Complete code with visualization can be found here.

ComeOnGetMe

Your concrete example shows objects that are all on the same background and non-overlapping. They are also unique in terms of shape and color. Assuming this is true for the actual problem, here is an option for using machine learning. If your actual problem requires location to differentiate objects, then this will not work; in that case, I would recommend changing your concrete example to reflect a more realistic one.

To use any sort of learning algorithm, you need to turn this into a learning problem. One option is to turn it into a classification problem. There may be better learning formulations for this problem, but for experimentation it provides some direction to explore. This would require you to create a labelled training set with labels for the objects in each image. This can be represented as a multi-label problem. For example, one training image might be labelled {'Dark Green Rectangle': 1, 'Red Square': 1, 'Light Green Triangle': 2, ...} and another that is missing the Red Square {'Dark Green Rectangle': 1, 'Red Square': 0, 'Light Green Triangle': 2, ...}.

Then you can train a deep-learning NN on the images. The NN's inference would provide predictions for the presence of each object.
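A minimal sketch of what such a multi-label setup could look like in Keras (an illustrative architecture only; N_OBJECTS and the input size are assumptions, and labels are treated here as simple present/absent flags rather than counts):

import tensorflow as tf

N_OBJECTS = 10  # assumed number of distinct object types

# small CNN with one sigmoid output per object type; each output is an
# independent "is this object present?" prediction (multi-label, not multi-class)
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(224, 224, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(N_OBJECTS, activation="sigmoid"),
])

# binary cross-entropy treats each label independently
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["binary_accuracy"])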

For the labels, you could generalize them a bit, for example 'object 1', 'object 2', 'object 3', ... as long as the training data and future image data are consistently interpreted.

This might be a case where you can use a simulation to create realistic training data (as is done to train AI for self-driving vehicles). Since you have individual images of each of the objects, you could automate the creation of training data (a set of images and labels). For example, create images with a random selection of objects at various scales, rotations, and locations while avoiding overlap.
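A sketch of how such synthetic training data could be generated with OpenCV, assuming each object is available as a small cropped image (all names and sizes below are placeholders; overlap checking and rotation are omitted for brevity):

import random
import cv2
import numpy as np

def make_training_image(object_crops, canvas_size=(400, 400)):
    # object_crops: dict mapping label -> small BGR image of that object
    canvas = np.full((canvas_size[0], canvas_size[1], 3), 255, dtype=np.uint8)  # white background
    labels = {name: 0 for name in object_crops}
    n = random.randint(1, len(object_crops))
    for name, crop in random.sample(list(object_crops.items()), k=n):
        scale = random.uniform(0.7, 1.3)
        crop = cv2.resize(crop, None, fx=scale, fy=scale)
        h, w = crop.shape[:2]
        y = random.randint(0, canvas_size[0] - h)
        x = random.randint(0, canvas_size[1] - w)
        canvas[y:y + h, x:x + w] = crop  # naive paste; no overlap check
        labels[name] += 1
    return canvas, labels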

filipmu