How to attack a universal hash function based on finite-field arithmetic?

Question

Following the paper "Recursive n-gram hashing is pairwise independent, at best", I want to use the algorithm described in chapters 6 and 7 (pages 7-10). The hash works as follows:

1. Define a random function $h_1$ that maps elements of a set $B$ to elements of a set $I$, where $|B| = 2^8$ and $|I| = 2^{32}$.

   I.e. the elements of $B$ are all single-byte values, and the elements of $I$ are 32-bit numbers:

        h1 = array of 32-bit numbers with indices in <0; 255>
        FillArrayWithRandomValues(h1);

2. Interpret the values of $h_1$ as polynomials in $GF(2)[x] / p(x)$, where $\deg(p(x)) = n = 32$ and $p(x)$ is an irreducible polynomial (also randomly chosen, just like $h_1$). So the degree of the polynomials in $I$ is at most 31, and $GF(2)[x] / p(x)$ is a field.

3. The hash function $h$ is then defined as:
   $$h(a_1, a_2, \ldots, a_n) = h_1(a_1) \cdot x^{n-1} + h_1(a_2) \cdot x^{n-2} + \ldots + h_1(a_n) \cdot x^0$$
   where $a_i$ is the $i$-th input byte (an element of $B$) to hash. Basically, we take the polynomial $h_1(a_i)$ selected by $a_i$ and multiply it by $x^{n-i}$.

   All arithmetic is done in $GF(2)[x] / p(x)$. Since I am reducing everything modulo a polynomial of degree 32 (a 33-bit number), the output of $h$ is a polynomial of degree at most 31 over $GF(2)$ (a 32-bit value), which the paper proves to be uniformly distributed and pairwise independent.

   This type of hash function is also called a universal hash function. Because of the pairwise independence, $h$ is a strongly universal hash function. By the standard counting formula $\frac{1}{32}\sum_{d \mid 32} \mu(d)\, 2^{32/d} = (2^{32} - 2^{16})/32$, there are 134 215 680 irreducible polynomials of degree 32 over $GF(2)$, so that's about 27 bits of entropy in addition to the 8 192 bits of entropy coming from $h_1$, so the hash family is pretty large.

4. I play the following game with an attacker. He can submit up to $2^{64}$ queries ($2^{64}$ strings $a_1, a_2, \ldots, a_n$ of his choice), and I will reply $true$ to each query if the last $k$ bits of $h(a_1, a_2, \ldots, a_n)$ are all 0 (the coefficients of the last $k$ terms of the result polynomial are 0); otherwise I reply $false$. Both $h_1$ and $p(x)$ are randomly chosen by me and kept secret, but everything else is known. I haven't decided on the value of $k$ yet, but it will be an integer in the range from 12 to 18 (inclusive).
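
To make the setup concrete, here is a minimal Python sketch of the hash and of the true/false query game described above. It is only an illustration under stated assumptions: the names `mul_x`, `make_h1`, `h` and `oracle` are made up for this example, "last $k$ bits" is taken to mean the $k$ low-order coefficients, and the modulus is a fixed stand-in ($x^{32} + x^7 + x^3 + x^2 + 1$, commonly tabulated as irreducible) rather than a secret, randomly chosen $p(x)$.

```python
import random

DEGREE = 32
# Assumption for illustration only: x^32 + x^7 + x^3 + x^2 + 1 stands in for
# the secret p(x); in the question p(x) is a random irreducible polynomial.
P = (1 << DEGREE) | 0x8D


def mul_x(v, p=P):
    """Multiply v by x in GF(2)[x] / p(x)."""
    v <<= 1
    if v >> DEGREE:  # the x^32 term appeared: reduce by XORing p(x)
        v ^= p
    return v


def make_h1(rng):
    """256 random 32-bit values, one per possible input byte."""
    return [rng.getrandbits(DEGREE) for _ in range(256)]


def h(data, h1, p=P):
    """Horner evaluation of h1[a_1]*x^(n-1) + ... + h1[a_n]*x^0 mod p(x)."""
    acc = 0
    for byte in data:
        acc = mul_x(acc, p) ^ h1[byte]
    return acc


def oracle(data, h1, k, p=P):
    """True iff the k low-order coefficients of the hash are all 0."""
    return (h(data, h1, p) & ((1 << k) - 1)) == 0


rng = random.Random(0)  # fixed seed so the sketch is repeatable
h1 = make_h1(rng)
print(oracle(bytes(range(32)), h1, k=12))
```

The Horner loop computes the same sum as the formula in step 3, because multiplying the accumulator by $x$ once per byte leaves $h_1(a_i)$ multiplied by $x^{n-i}$ at the end.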

Given the game described in step 4:

What can an attacker learn about $h_1$ and $p(x)$?
After he has made all his queries, is he, e.g., able to predict whether I will answer $true$ or $false$ for some strings that he did not submit in step 4?
What would change if I used $0 < n < 32$, such as $n = 16$ or $n = 4$?
What would change if $p(x)$ was revealed to the attacker before he makes any queries?
What would change if he could make up to $2^{128}$ queries?

First progress

So there is a simple way to find some information about $h_1$. Let's say we have 2 input strings:

$a_1$ $a_2$ ... $a_{31}$ $a'_{32}$
$a_1$ $a_2$ ... $a_{31}$ $a''_{32}$

I.e. they differ only in the last byte. If the last $k$ bits of the hashes of both strings are 0, then we know that the last $k$ bits of $h_1(a'_{32})$ and $h_1(a''_{32})$ are the same (even though we don't know what those bits are), because the last byte enters the hash as a plain XOR with $h_1(a_{32}) \cdot x^0$, without any reduction. It's easy to see this when the hash function is written in code (for input strings of length 2):

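    // Assumed here: hash and irreduciblePoly are unsigned 32-bit values, and
    // irreduciblePoly holds the low 32 bits of p(x) (the x^32 term is implicit)
    var hash = h1[a1];

    // Galois multiplication by x and subsequent reduction modulo p(x)
    hash = (hash << 1) ^ ((hash >> 31) * irreduciblePoly);

    // Adding h1[a2] * x^0 is a plain XOR; no reduction is needed
    hash = hash ^ h1[a2];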

Using this method, I can assign the members of $h_1$ to groups based on equality of their last $k$ bits (so there will be at most $2^k$ groups, and I will know exactly which members, or more precisely which indices into $h_1$, belong to which group).
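
The grouping idea can be sketched in Python as follows. This is a toy illustration, not an efficient attack: the helper names `make_oracle` and `group_last_bytes` are hypothetical, the modulus is a fixed stand-in for the secret $p(x)$, "last $k$ bits" is read as the low-order bits, and $k = 4$ is used so that $true$ answers are frequent (for $k$ between 12 and 18 a random query presumably answers $true$ with probability about $2^{-k}$, so far more queries would be needed).

```python
import random

P = (1 << 32) | 0x8D  # illustrative modulus (assumption), not the secret p(x)


def h(data, h1):
    """h1[a_1]*x^(n-1) + ... + h1[a_n]*x^0 mod p(x), via Horner's rule."""
    acc = 0
    for byte in data:
        acc <<= 1
        if acc >> 32:
            acc ^= P
        acc ^= h1[byte]
    return acc


def make_oracle(h1, k):
    """The defender's side: answers true iff the low k bits of the hash are 0."""
    mask = (1 << k) - 1
    return lambda data: (h(data, h1) & mask) == 0


def group_last_bytes(oracle, prefixes):
    """Attacker's side: for each prefix, collect the last bytes answered true.

    All bytes in one such set must share the low k bits of their h1 value.
    """
    groups = []
    for prefix in prefixes:
        hits = frozenset(b for b in range(256) if oracle(prefix + bytes([b])))
        if len(hits) > 1 and hits not in groups:
            groups.append(hits)
    return groups


rng = random.Random(2015)
h1 = [rng.getrandbits(32) for _ in range(256)]  # secret table
k = 4
oracle = make_oracle(h1, k)
prefixes = [bytes(rng.getrandbits(8) for _ in range(31)) for _ in range(200)]
groups = group_last_bytes(oracle, prefixes)
print(len(groups), "groups of h1 indices with equal low", k, "bits")
```

The attacker only ever calls `oracle`; the secret `h1` is used at the end of a test run merely to confirm that every group really shares its low $k$ bits.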

More progress

Let there be 2 input strings:

$a_1$, $a_2$, ... $a'_i$ ... $a_{32}$
$a_1$, $a_2$, ... $a''_i$ ... $a_{32}$

The strings differ only in the $i$-th byte. If we find out that the last $k$ bits of the hashes of both strings are equal to $0$, then we know that the last $k$ bits of $h_1(a'_i) \cdot x^{n-i}\ \textrm{mod}\ p(x)$ and $h_1(a''_i) \cdot x^{n-i}\ \textrm{mod}\ p(x)$ must be the same (based on the $(a + I) + (b + I) = (a + b) + I$ rule, i.e. reduction modulo $p(x)$ is linear, so the contributions of all the other bytes cancel when the two hashes are added).

Then, if we have another set of strings:

$b_1$, $b_2$, ... $b_{i-1}$, $a'_i$, $b_{i + 1}$, ... $b_{32}$
$b_1$, $b_2$, ... $b_{i-1}$, $a''_i$, $b_{i + 1}$, ... $b_{32}$

where $b_t$ is an arbitrary byte, then hashing these two strings will always lead to the same answer: either both strings give $true$ (the last $k$ bits of the hash are $0$), or both strings give $false$.
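
The claim above can be checked numerically with a short Python sketch. It is an illustration under assumptions: the helper names `contribution` and `answers_agree` are invented for this example, the modulus is a fixed stand-in for the secret $p(x)$, "last $k$ bits" means the low-order bits, and $k = 4$, $i = 10$ are arbitrary small choices. The check itself relies only on linearity of the reduction, not on irreducibility.

```python
import random

N = 32                # string length and degree of p(x), as in the question
P = (1 << 32) | 0x8D  # illustrative modulus (assumption), not the secret p(x)


def mul_x(v):
    """Multiply v by x modulo p(x)."""
    v <<= 1
    return v ^ P if v >> 32 else v


def h(data, h1):
    acc = 0
    for byte in data:
        acc = mul_x(acc) ^ h1[byte]
    return acc


def contribution(byte, i, h1):
    """h1[byte] * x^(N-i) mod p(x), for a 1-based position i."""
    v = h1[byte]
    for _ in range(N - i):
        v = mul_x(v)
    return v


rng = random.Random(7)
h1 = [rng.getrandbits(32) for _ in range(256)]
k, i = 4, 10
mask = (1 << k) - 1

# 256 bytes but only 2^k = 16 low-bit patterns, so a colliding pair must exist.
seen = {}
for byte in range(256):
    low = contribution(byte, i, h1) & mask
    if low in seen:
        pair = (seen[low], byte)
        break
    seen[low] = byte


def answers_agree(filler):
    """Place each byte of the pair at position i and compare oracle answers."""
    s1, s2 = bytearray(filler), bytearray(filler)
    s1[i - 1], s2[i - 1] = pair
    return ((h(s1, h1) & mask) == 0) == ((h(s2, h1) & mask) == 0)


fillers = [bytes(rng.getrandbits(8) for _ in range(N)) for _ in range(1000)]
all_agree = all(answers_agree(f) for f in fillers)
print(all_agree)  # True, by linearity of reduction modulo p(x)
```

Here the colliding pair is found by peeking at the secret table for brevity; in the game the attacker would have to discover such a pair through queries that both return $true$.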

References

Quick intro to finite fields
Binary carryless multiplication
Universal hash functions based on finite field arithmetics (code)
Bleichenbacher's Attack on PKCS 1 might provide some ideas on how to attack this hash function, as it shows a technique to decrypt RSA PKCS 1 encrypted messages given an oracle that tests whether the 16 most significant bits of the decryption of $r \cdot C\ \textrm{mod}\ N$ are equal to $2$, for any $r$ of the attacker's choice (from RSA-survey.pdf p. 14). Obviously these problems are only loosely related, but the attack may contain some useful ideas nevertheless.

DISCLAIMER: I am not sure whether this question belongs on the Math, Security, Cryptography or Stack Overflow forums, but I think mathematicians are best equipped to provide an answer. I am a programmer, so my question probably does not use the "standard" math terminology; feel free to edit it to clarify it for others.

Given that $h_1$ takes values in $GF(32) \simeq GF(2)[x] / (p)$, isn't the output of $h$ in $GF(32)[x]$, (instead of $GF(2)$ as you write)? This would mean that it would have $1024$ bits ($32$ coefficients times $32$ bits each). — Alex M., Jul 06 '15 at 08:04
@AlexM. There is no $GF(32)$. All polynomial term coefficients are either $0$ or $1$ for all the math in this question. $h_1$ is a function that maps each byte (256 possible inputs) to some 32-bit number (chosen randomly at inception). Each of those 32-bit numbers (there are 256 of them, not necessarily distinct) is a polynomial in $GF(2)$ (only $2$ coefficients for each term) of variable degree (at most degree 31). The output of $h$ is also just a 32-bit number, because whatever the value of the expression in $3)$ turns out to be, it will be reduced using $p(x)$ of degree 32. — Paya, Jul 06 '15 at 17:23
Just to make sure that we understand each other: you are aware of the fact that $GF(32)$ is just a notation for $GF(2)[x] / (p)$, aren't you? — Alex M., Jul 06 '15 at 17:53
@AlexM. Maybe you mean $GF(2^{32})$ instead of just $GF(32)$? — Paya, Jul 06 '15 at 18:54
Oh yes, where was my mind? I meant $GF(2,32) = GF(2^{32})$, thank you for noting that. Anyway, this correction does not alter the essence of my question. — Alex M., Jul 06 '15 at 19:00
I am sorry if my question is confusing, for reasons stated in disclaimer. Maybe the confusion comes from the expression in $3)$. Even though it looks like it, it doesn't actually define a polynomial where each coefficient is taken from $h_1$. Instead, for each $a_i$, it takes a polynomial from $h_1$ for that $a_i$, and multiplies that polynomial by $x^{n-i}$ (thanks to $GF(2)$, this is effectively a left bitwise-shift). All these values are then added together to form an intermediary polynomial (the sum is done using XOR), and subsequently reduced using $p(x)$, which is the output of the $h$. — Paya, Jul 06 '15 at 19:07
