0

I was going throught the text Introduction to Algorithm by Cormen et. al. where I came across the two alternative definitions of Universal Class of Hash Function.

The versions are as follows:

Definition 1: Let $\mathscr{H}$ be a finite collection of hash functions that map a given universe $U$ of keys into the range ${0,1,...,m-1}$. Such a collection is said to be universal if for each pair of distinct keys $k,l\in U$, the number of hash functions $h\in\mathscr{H}$ for which $h(k)= h(l)$ is at most $|\mathscr{H}|/m$.

Now in the above definition the authors are using "for each pair of distinct keys $k,l\in U$" to mean that probably we do not leave out any possible key pair for which collision occurs.

Definition 2: With a hash function randomly chosen from $\mathscr{H}$, the chance of a collision between distinct keys $k$ and $l$ is no more than the chance $1/m$ which is the chance of a collision if $h(k)$ and $h(l)$ were randomly and independently chosen from the set $\{0,1,...,m-1\}$.

Through-out the rest of the book the author uses the second definition. I felt like finding the equivalence of the two definitions.


Definition 1 $\implies$ Definition 2


Figure 1

Definition $1$ says that for a pair of distinct keys says $k,l$ for the set $A=\{h \in \mathscr{H} : h(k)=h(l)\}$ ,

$$|A|\leqslant|\mathscr{H}|/m \tag 1$$

Now if we chose a hash function $h\in \mathscr{H}$ at random then for the said pair of distinct keys $k,l$ we have,

$$ Pr\{h(k)=h(l)\} = \frac{\text{no. of functions in A}}{\text{no. of functions in $\mathscr{H}$}}=\frac{|A|}{|\mathscr{H}|} \leqslant\frac{1}{m} \quad\text{using (1)}$$


I could not quite find a way to show that: $$\text{Definition 2 $\implies$ Definition 1}$$

Or the way I proved the first implication actually proves the equivalent...

Previously I made an attempt but it was wrong as it was pointed out and so that wrong portion has been edited out.

I had found two similar questions 1 and 2 but it did not quite seem to answer my query.

Abhishek Ghosh
  • 1,184
  • 9
  • 24

1 Answers1

0

Both definitions are equivalent, and I think you have almost specified the reason behind it.

The below equivalence work in both ways.

From Definition 2:

"the chance of a collision between distinct keys $k$ and $l$" $\le$ $1/m$

$\Leftrightarrow$

{note that the "chance" here is about the process of picking $h$ from the class $\mathscr{H}$}

(Number of functions in the class where $k$ and $l$ collide) $/$ (Total functions in the class) $\le$ $1/m$

$|A|/|\mathscr{H}| \le 1/m$

$\Leftrightarrow$

$|A| \le |\mathscr{H}|/m$

(which is the wording used in Definition 1)

Nitin Verma
  • 317
  • 1
  • 10