MinHash: Difference from random sampling

Asked Sep 16 '24 at 21:21

Active Sep 16 '24 at 21:21


MinHash is used to approximate the similarity of two sets, specifically their Jaccard similarity.

As I understand it, the main benefit is that the intersection does not have to be computed explicitly: the probability that the minimum hash values of the two sets match is exactly their Jaccard similarity.
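
Written out, the property I am referring to is the following. The notation is my own assumption for this post: h is a random hash function and h_min(S) is the smallest hash value over the elements of S.

```latex
\Pr\bigl[\,h_{\min}(A) = h_{\min}(B)\,\bigr]
  \;=\; \frac{|A \cap B|}{|A \cup B|}
  \;=\; J(A, B)
```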

But how is taking the element with the minimum hash value (where the hash function behaves randomly) different from selecting an element at random directly? Couldn't the Jaccard similarity be approximated by drawing a random element from each set, k times, and counting the matches?
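
To make the comparison concrete, here is a minimal Python sketch of the two procedures being compared. Everything in it is an assumption made for illustration: the function names are hypothetical, the example sets are made up, and a random permutation shared by both sets stands in for one random hash function.

```python
import random


def jaccard(a, b):
    """Exact Jaccard similarity, computed from the explicit intersection and union."""
    return len(a & b) / len(a | b)


def minhash_estimate(a, b, k, seed=0):
    """Fraction of k trials in which both sets have the same minimum element
    under one random ordering that is shared by both sets."""
    rng = random.Random(seed)
    universe = sorted(a | b)
    matches = 0
    for _ in range(k):
        # Assumption: a shared random permutation plays the role of a hash function.
        order = rng.sample(universe, len(universe))
        rank = {x: r for r, x in enumerate(order)}
        if min(a, key=rank.__getitem__) == min(b, key=rank.__getitem__):
            matches += 1
    return matches / k


def independent_sampling_estimate(a, b, k, seed=0):
    """Fraction of k trials in which two independent uniform picks,
    one from each set, happen to be the same element."""
    rng = random.Random(seed)
    items_a, items_b = sorted(a), sorted(b)
    matches = sum(rng.choice(items_a) == rng.choice(items_b) for _ in range(k))
    return matches / k


# Made-up example sets: 30 shared elements out of 90 distinct ones.
A = set(range(0, 60))
B = set(range(30, 90))

print("exact Jaccard:       ", jaccard(A, B))
print("shared-min estimate: ", minhash_estimate(A, B, 2000))
print("independent sampling:", independent_sampling_estimate(A, B, 2000))
```

In this sketch the two picks in the shared-ordering version are coupled: both sets are ranked by the same permutation, so the minima agree whenever the overall minimum of the union lies in the intersection. With independent uniform picks, a match occurs with probability |A ∩ B| / (|A| · |B|), which for these example sets is 30 / 3600 rather than 30 / 90.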


Jannik


0 Answers