
I was reading about hashing in CLRS, where the authors say:

Let $\mathscr{H}$ be a finite collection of hash functions that map a given universe $U$ of keys into the range $\{0,1,...,m-1\}$. Such a collection is said to be universal if for each pair of distinct keys $k,l\in U$, the number of hash functions $h\in\mathscr{H}$ for which $h(k)= h(l)$ is at most $|\mathscr{H}|/m$.

So basically, a finite collection of hash functions is said to be universal if, for any two distinct keys $k$ and $l$, the number of hash functions $h$ for which $h(k)=h(l)$ is at most $\frac{\text{number of hash functions}}{\text{size of hash table}}=\frac{|\mathscr{H}|}{m}$.
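To make the counting condition concrete, here is a minimal brute-force sketch. It assumes the family $h_{a,b}(k) = ((ak+b) \bmod p) \bmod m$, a standard universal family that the quoted passage itself does not mention, and the values `P = 17`, `M = 6` are arbitrary choices for illustration. For every pair of distinct keys it counts the functions that collide and compares the worst case with $|\mathscr{H}|/m$.

```python
from itertools import combinations

def make_family(p):
    """All (a, b) pairs defining h_{a,b}(k) = ((a*k + b) mod p) mod m, with a != 0."""
    return [(a, b) for a in range(1, p) for b in range(p)]

def h(a, b, k, p, m):
    """Evaluate the hash function h_{a,b} on key k."""
    return ((a * k + b) % p) % m

def max_colliding_functions(p, m):
    """Return (worst-case number of colliding functions over all key pairs, |H|)."""
    family = make_family(p)
    worst = 0
    # The universe of keys is {0, ..., p-1}; check every pair of distinct keys.
    for k, l in combinations(range(p), 2):
        count = sum(1 for a, b in family if h(a, b, k, p, m) == h(a, b, l, p, m))
        worst = max(worst, count)
    return worst, len(family)

P, M = 17, 6  # illustrative values only: p prime, m = table size
worst, size = max_colliding_functions(P, M)
# The family is universal if worst <= |H| / m.
print(worst, size, worst <= size / M)
```

With these values the family has 272 functions, so the bound is $272/6 \approx 45.3$, and no pair of keys collides under more than 32 of them.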

However, the authors then say:

In other words, with a hash function randomly chosen from $\mathscr{H}$, the chance of a collision between distinct keys $k$ and $l$ is no more than the chance $1/m$ of a collision if $h(k)$ and $h(l)$ were randomly and independently chosen from the set $\{0,1,...,m-1\}$.

I didn't get the last part: "$h(k)$ and $h(l)$ were randomly and independently chosen from the set $\{0,1,...,m-1\}$". How does the bound $|\mathscr{H}|/m$ on the number of colliding hash functions correspond to [$h(k)$ and $h(l)$ being randomly and independently chosen from the set $\{0,1,...,m-1\}$]?

RajS

2 Answers


Let's plug real numbers into the passage:

$|\mathscr{H}|$ = 10;

$m$ (size of hash table) = 100

In the first statement:

$\frac{|\mathscr{H}|}{m}$ = $\frac{10}{100}$ = 0.1 = the maximum number of hash functions that may cause a collision for a given pair of distinct keys. (A count of functions cannot really be fractional; treat it as a decimal for the sake of argument.)

The important thing to note here is that the chance of selecting a collision-causing hash function is therefore at most: $\frac{0.1}{|\mathscr{H}|}$ = $\frac{0.1}{10}$ = 1%.

In the second statement:

The argument being made is that the chance of collision "is no more than...":

$\frac{1}{m}$ = $\frac{1}{100}$ = 1% = the chance that $h(k)$ and $h(l)$ coincide when each is picked randomly and independently from the set {0, 1, ..., 99}. Equivalently, once one value is fixed, it is the chance that the other hits that specific value out of 100.

Conclusion:

So, from our example, a collection of 10 hash functions that map a given universe of keys into the range {0, 1, ..., 99} is said to be a universal collection if, when a hash function is chosen from it at random, the chance of a collision between any two distinct keys is no more than 1%.
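The same arithmetic can be written for general $|\mathscr{H}|$ and $m$, which sidesteps the fractional count. This is only a sketch, and it assumes that "randomly chosen" in the passage means $h$ is drawn uniformly from $\mathscr{H}$:

$$\Pr_{h\in\mathscr{H}}\bigl[h(k)=h(l)\bigr] = \frac{\bigl|\{h\in\mathscr{H} : h(k)=h(l)\}\bigr|}{|\mathscr{H}|} \le \frac{|\mathscr{H}|/m}{|\mathscr{H}|} = \frac{1}{m}$$

The first equality is just "favourable functions over all functions", and the inequality is the counting condition from the definition.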

kolaworld

In the second paragraph:

"In other words, with a hash function randomly chosen from $\mathscr{H}$, the chance of a collision between distinct keys $k$ and $l$ is no more than the chance $1/m$ of a collision if $h(k)$ and $h(l)$ were randomly and independently chosen from the set $\{0,1,...,m-1\}$."

Here, "the chance of a collision between distinct keys" is simply specified to be $\le 1/m$.

The author further says that this $1/m$ is also the probability of the following event:

If two values (here $h(k)$ and $h(l)$) are picked randomly and independently from 0 to $m-1$, what is the chance that they are the same? This is:

(probability that both equal a certain value $i$) * (number of possible values of $i$)

= $((1/m)*(1/m)) * m$

= $1/m$

So the author is giving a meaning to the value $1/m$.
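The calculation above can be checked by exact enumeration. The sketch below is illustrative only: the function name `independent_collision_probability` and the choice `m = 100` are assumptions, not part of the text. It lists all $m^2$ equally likely pairs of independently chosen values and counts the pairs whose two entries are equal.

```python
from fractions import Fraction
from itertools import product

def independent_collision_probability(m):
    """Exact probability that two values drawn independently and uniformly
    from {0, ..., m-1} are equal."""
    outcomes = list(product(range(m), repeat=2))  # all m*m equally likely pairs
    equal = sum(1 for x, y in outcomes if x == y)  # exactly m of them match
    return Fraction(equal, len(outcomes))

prob = independent_collision_probability(100)
print(prob)  # 1/100
```

There are $m$ matching pairs out of $m^2$, which is the same $m \cdot \frac{1}{m^2} = \frac{1}{m}$ as in the product above.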

Nitin Verma