Apart from the elbow rule and the silhouette coefficient, are there any better methods from recent years for picking the correct number of clusters?
1 Answer
There is a metric called the Davies-Bouldin index.
Just like the silhouette score, it can be used when the ground-truth cluster labels are not known.
From Scikit-learn documentation:
This index signifies the average similarity between clusters, where the similarity is a measure that compares the distance between clusters with the size of the clusters themselves.
Zero is the lowest possible score. Values closer to zero indicate a better partition.
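For reference, the definition behind that description can be written as follows. The symbols are my own notation, not quoted from the documentation: $k$ is the number of clusters, $s_i$ is the average distance from each point of cluster $i$ to its centroid, and $d_{ij}$ is the distance between the centroids of clusters $i$ and $j$.

$$
R_{ij} = \frac{s_i + s_j}{d_{ij}}, \qquad
DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} R_{ij}
$$

Compact clusters (small $s_i$) that are far apart (large $d_{ij}$) give small $R_{ij}$, which is why values closer to zero indicate a better partition.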
Advantages: The computation of Davies-Bouldin is simpler than that of Silhouette scores. The index is computed only from quantities and features inherent to the dataset.
Drawbacks: The Davies-Bouldin index is generally higher for convex clusters than other concepts of clusters, such as density based clusters like those obtained from DBSCAN. The usage of centroid distance limits the distance metric to Euclidean space.
To find a suitable number of clusters, vary the number of clusters, calculate the index for each value, and prefer the value that gives the lowest index.
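A minimal sketch of that procedure, assuming scikit-learn is installed. The synthetic `make_blobs` data, the use of `KMeans` as the clustering algorithm, and the candidate range of 2 to 10 clusters are my own choices for illustration; substitute your own data and algorithm.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

# Illustrative data only: 500 points drawn around 4 centers (an assumption).
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.6, random_state=0)

# Compute the Davies-Bouldin index for each candidate number of clusters.
# The index needs at least 2 clusters, so the range starts at 2.
scores = {}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = davies_bouldin_score(X, labels)

# Lower is better, so pick the k with the smallest index.
best_k = min(scores, key=scores.get)

for k, score in scores.items():
    print(f"k={k}: Davies-Bouldin index = {score:.3f}")
print("Suggested number of clusters:", best_k)
```

Treat the minimum as a suggestion rather than a definitive answer: as noted in the drawbacks, the index favours convex, centroid-shaped clusters, so it is worth comparing it with the silhouette score or a visual check.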