
I think I have understood the DBSCAN algorithm for 2D data points. Consider the example in the scikit-learn documentation, which generates a set of data points:

from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(
    n_samples=750, centers=centers, cluster_std=0.4, random_state=0
)

X = StandardScaler().fit_transform(X)

The (x, y) data points are:

print(X)
[[ 0.49426097  1.45106697]
 [-1.42808099 -0.83706377]
 [ 0.33855918  1.03875871]
 ...
 [-0.05713876 -0.90926105]
 [-1.16939407  0.03959692]
 [ 0.26322951 -0.92649949]]

The DBSCAN algorithm takes two parameters: the neighborhood radius ε (eps) and the minimum number of samples (min_samples). It works like this: the algorithm starts from an arbitrary sample, counts how many samples fall within its neighborhood radius ε, and starts a cluster if that neighborhood contains at least min_samples points.
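As a sanity check of my understanding, here is my own toy sketch of the core-point test (not scikit-learn's implementation): a point can seed a cluster only if its ε-neighborhood contains at least min_samples points, itself included.

import numpy as np

def core_point_mask(X, eps=0.3, min_samples=10):
    # pairwise Euclidean distances between all samples
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # a sample is a "core point" if at least min_samples points
    # (counting the sample itself) lie within distance eps of it
    return (dists <= eps).sum(axis=1) >= min_samples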

So for the example above:

from sklearn.cluster import DBSCAN

db = DBSCAN(eps=0.3, min_samples=10).fit(X)
labels = db.labels_

import numpy as np

unique_labels = np.unique(labels)

print(unique_labels)  # array([-1, 0, 1, 2], dtype=int64)

It has found 3 clusters and some noise points.
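The cluster count excludes the noise label -1, as in the scikit-learn example:

# noise points get the label -1, so drop it before counting clusters
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = list(labels).count(-1)
print(n_clusters, n_noise)  # 3 clusters plus the noise points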

Now, if I want to apply this to cluster groups in images, how can I do that?

I found this example: https://www.youtube.com/watch?v=wyk_vkL2os8. I tried to reproduce it using an example image that I found here on Stack Overflow.

from sklearn.cluster import DBSCAN
from matplotlib.image import imread
import numpy as np
import matplotlib.pyplot as plt

# load the image and convert it into an array of pixels using imread
image = imread('pZKmf.png')

print(image.shape)  # (217, 386, 4)

plt.imshow(image)

The original image:

[original image]

# flatten the image to create a 2D array of pixels
# scikit-learn expects (n_samples, n_features); here that is (n_pixels, 4 channels)
X = image.reshape(-1, 4)

print(X.shape)  # (83762, 4)
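If I check what the reshape actually does, each row of X holds the 4 channel values of one pixel (my own quick check, with an arbitrary pixel position):

height, width, channels = image.shape  # (217, 386, 4)

# with a C-order reshape, the pixel at (row, col) becomes
# row number row * width + col of X
row, col = 100, 200  # arbitrary pixel position, just for the check
assert np.array_equal(X[row * width + col], image[row, col])
print(X[row * width + col])  # the four RGBA values of that single pixel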

# apply the DBSCAN algorithm

dbscan = DBSCAN(eps=0.01, min_samples=500)

# fit() returns the fitted estimator, so read the cluster labels from labels_
labels = dbscan.fit(X).labels_

print(labels.shape)  # (83762,)

unique_labels = np.unique(labels)
print(unique_labels)  # array([-1, 0, 1, 2], dtype=int64): 3 clusters plus noise

segmented_img = labels.reshape(image.shape[:2])

print(segmented_img.shape) #(217, 386)

plt.imshow(segmented_img)

Here is the segmented image:

[segmented image]

In this example, it considers pixel intensities as features and we have 4 images which are the 4 samples.

I cannot figure out what the samples to cluster are... I would consider the pixels as the samples to cluster. How can I understand how clustering works in this case?

EDIT: Thanks for the answers. Using a neural network for image embedding is interesting and I will certainly try this idea. However, I was asking for something more basic: how does clustering an image work? What are the samples? What are the features? If DBSCAN considers a radius for the size of a cluster, in which space do we have to consider it? Beyond DBSCAN, I am also a bit confused about what clustering an image even means.

2 Answers


One important step in clustering images is how the image is encoded. One useful way to encode an image is with vector embeddings, where the numerical values carry semantic meaning. The closer two images are in the embedding space, the more meaning they share, so the clustering captures the meaning of the images. There are many models available for creating image embeddings; Hugging Face is one place to find them.
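As a rough sketch of that pipeline (assuming the sentence-transformers package with the clip-ViT-B-32 checkpoint, which can embed images; the folder name, eps and min_samples are placeholders to tune):

from pathlib import Path

from PIL import Image
from sentence_transformers import SentenceTransformer
from sklearn.cluster import DBSCAN

# load a CLIP checkpoint that sentence-transformers can use to embed images
model = SentenceTransformer("clip-ViT-B-32")
images = [Image.open(p) for p in Path("images").glob("*.png")]

# one embedding vector per image: shape (n_images, embedding_dim)
embeddings = model.encode(images)

# here each *image* is a sample and each embedding dimension is a feature;
# eps and min_samples have to be tuned for your data
labels = DBSCAN(eps=0.5, min_samples=3, metric="cosine").fit_predict(embeddings)
print(labels)  # -1 marks images that did not fall into any cluster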

Brian Spiering

[1] https://emad-ezzeldin4.medium.com/discovering-different-environments-in-animal-camera-traps-f157df07f9c8?source=friends_link&sk=9da82baa946c1d11cc8ceae304d3dc2c

[2] https://emad-ezzeldin4.medium.com/debugging-computer-vision-image-classification-why-is-your-model-failing-in-production-11976e5311f2?source=friends_link&sk=5c49481d65179689093b4b4fb9b8e231


[4] https://www.tensorflow.org/tutorials/images/classification

This [1] is an actual example of images clustered well using vector embeddings from the VGG16 image classifier as features (as opposed to the pixel intensities in your post). It was clustered with KNN, though, not segmented as in your example. DBSCAN is probably powering engines like LIME (local interpretable model-agnostic explanations) to find the highest density of high-value features within the image [2]. To simplify, we can think of it as a convolution window scanning the image for the important features used by the classifier. The same principles probably apply in image segmentation.

Edit

What are the samples in an image?

Every pixel in the image is an observation. If an image's resolution is 4 x 4, then it has 4*4 = 16 pixels/observations. Converted into a 4x4 matrix, each index in the matrix represents a pixel position. However, the most representative way to state your total number of observations is (number of images * length * width), where the full matrix size is (number of images, length, width, RGBsize). See the shape sketch below.
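As a rough numpy sketch of that bookkeeping (toy numbers, just to show the shapes):

import numpy as np

# toy batch: 10 images of 4 x 4 pixels with 3 RGB channels
batch = np.zeros((10, 4, 4, 3))   # (number of images, length, width, RGBsize)

# one row per pixel/observation, one column per channel
observations = batch.reshape(-1, 3)
print(observations.shape)         # (160, 3) -> 10 * 4 * 4 observations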

What are the features in an image?

"This is a batch of 32 images of shape 180x180x3 (the last dimension refers to color channels RGB)." [4] Any image can be converted to a numpy array when loaded. The neural network would not be used for the machine learning itself; it is just that raw RGB is not a very smart feature representation, and a better approach is to use a neural network to generate smart features as an input-processing step. The neural network takes in that RGB numpy array and produces a numpy array with much better feature representations. For example, if that neural network was originally trained to classify animals, it will produce very high values in its output for the pixels representing an animal's most dominant characteristics. The clustering algorithm can then use that output to group the image with images of similar animals.
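A rough sketch of that input-processing step (assuming Keras with the pretrained VGG16 weights; the file names are placeholders):

import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image as keras_image

# pretrained VGG16 used as a feature extractor: drop the classifier head
# and average-pool the last convolutional maps into one vector per image
extractor = VGG16(weights="imagenet", include_top=False, pooling="avg")

def embed(path):
    img = keras_image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(keras_image.img_to_array(img), axis=0))
    return extractor.predict(x)[0]   # one "smart feature" vector of length 512

# one vector per image, ready to feed into any clustering algorithm
features = np.stack([embed(p) for p in ["img1.jpg", "img2.jpg"]])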

What is clustering in an image?

Well, simply grouping similar images or objects together. The output of the neural network embedding is a vector, say [1 1.5 4 9 . . . 7], where every element indicates the presence of some shape or object in the image. In a cats-and-dogs problem, for instance, [dog_nose dog_tail cat_nostrils cat_ears . . . dog_eyes], each position holds a value for the presence of a complex shape in the image (regardless of its X-Y location in the image). Hence a nose value of, say, 0.8 at matrix index 0 indicates a unique shape, probably closer to a cat than a dog. A KNN or whatever clustering algorithm will simply put similar vectors (one vector per image) together. Also note that it is magnitude based, so a value of 1 (normalized) at the nose position means that this image almost certainly contains a dog_nose (an identified complex geometrical pattern) according to one of the layers in the neural network.

Now each image becomes a feature vector, where each positional index is a complex shape that was identified (dog_nose, cat_tail) and the value indicates the confidence that this complex shape is present in the image. Cat image 1 vector: [cat_nose=1, dog_tail=0.2, cat_ear=0.9], and so on.

So if you had 100 images, you would get 100 vectors like this.
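Clustering those vectors is then the easy part; a sketch with KMeans and random vectors standing in for the embeddings:

import numpy as np
from sklearn.cluster import KMeans

# 100 images, each reduced to a 512-dimensional embedding vector
vectors = np.random.rand(100, 512)

# group similar images together; 3 clusters is just an example value
clusters = KMeans(n_clusters=3, n_init=10).fit_predict(vectors)
print(clusters.shape)  # (100,) -- one cluster id per image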