
Let $H = \{h_r : U \rightarrow [m]\}$. What are the most efficient currently known constructions such that $H$

  • is a universal family and
  • fulfils the homomorphic XOR operation property $\forall h \in H \forall x,y \in U: h(x \oplus y) = h(x) \oplus h(y)$?
Martin Kromm

2 Answers


I believe that the internal GHASH function from GCM would meet those criteria (if you trim off the length word and require universality only for equal-length inputs [1]); it can be defined as:

$$\operatorname{GHASH}_k( M_n, M_{n-1}, \ldots, M_0 ) = \sum_{i=0}^{n} k^i M_i$$

Here $M_n, M_{n-1}, \ldots, M_0$ is the input message divided into 128-bit blocks, $k$ is the universal hash key, and the arithmetic (both the additions and the multiplications) is done over the field $\operatorname{GF}(2^{128})$.

It meets the criteria:

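  • It is universal (for equal-length messages); for random $k$ and any two distinct equal-length messages $M, M'$, we have $\operatorname{GHASH}_k(M) = \operatorname{GHASH}_k(M')$ with probability $\le |M| / 2^{128}$, where $|M|$ is the message length in 128-bit blocks. This holds because the difference of the two hashes is a nonzero polynomial in $k$ of degree less than $|M|$, which has fewer than $|M|$ roots.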

  • It meets your homomorphic requirement; this is because addition in $\operatorname{GF}(2^{128})$ is exclusive-or, and we have $k^i M_i \oplus k^i M'_i = k^i( M_i \oplus M'_i)$

  • It is quite efficient (especially with hardware carry-less multiplication, such as the PCLMULQDQ instruction that accompanies AES-NI on x86); I can't say that it's the most efficient possible...
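The following is a minimal Python sketch of the polynomial hash defined above, included only to make the XOR homomorphism concrete. It makes several assumptions that are not part of the answer: the names `gf_mul` and `poly_hash` are hypothetical, field elements are stored as integers with bit $i$ holding the coefficient of $x^i$ (GCM's reflected bit order is ignored, so the outputs will not match real GHASH), and the multiplication is a plain shift-and-xor loop that is neither fast nor constant time.

```python
import secrets

# Reduction polynomial x^128 + x^7 + x^2 + x + 1, stored with bit i as the
# coefficient of x^i. Assumption: plain bit order, not GCM's reflected order.
POLY = (1 << 128) | 0x87


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^128) by shift-and-xor (not constant time)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^128) is XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if a >> 128:             # degree reached 128: reduce modulo POLY
            a ^= POLY
    return result


def poly_hash(key: int, blocks: list) -> int:
    """Evaluate sum_i key^i * M_i with Horner's rule; blocks = [M_n, ..., M_0]."""
    acc = 0
    for block in blocks:
        acc = gf_mul(acc, key) ^ block
    return acc


if __name__ == "__main__":
    key = secrets.randbits(128)
    x = [secrets.randbits(128) for _ in range(4)]
    y = [secrets.randbits(128) for _ in range(4)]
    xor_blocks = [a ^ b for a, b in zip(x, y)]
    # Homomorphism for equal-length inputs: h(x XOR y) == h(x) XOR h(y)
    print(poly_hash(key, xor_blocks) == poly_hash(key, x) ^ poly_hash(key, y))
```

The check only makes sense for inputs with the same number of blocks, which matches the equal-length restriction in the answer.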


[1]: You cannot get both the homomorphic properties and the universality (across messages of different lengths) to hold simultaneously. The homomorphic property requires that $h_k(0) = 0$ and that $h_k(00) = 0$, hence we have two different messages $0$ and $00$ which hash to the same value with high probability (actually, 1), thus $h_k$ is not a universal hash family.
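To spell out the first step of that argument: applying the homomorphic property with $y = x$, where $x$ is any message of a given length and $0$ denotes the all-zero message of that same length, gives

$$h_k(0) = h_k(x \oplus x) = h_k(x) \oplus h_k(x) = 0.$$

Since this works for every length, every all-zero message hashes to $0$ regardless of $k$.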

Squeamish Ossifrage
poncho

Any polynomial evaluation hash or polynomial division hash, without length padding, has the property you seek:

  • Polynomial evaluation. If $H_r(m) = m(r)$ where $m$ is a polynomial of zero constant term and degree $\ell$ over some field and $r$ is an element of the field, then we have $$H_r(m) = m_1 r^\ell + m_2 r^{\ell-1} + \cdots + m_{\ell-1} r^2 + m_\ell r,$$ so clearly $H_r(m + m') = H_r(m) + H_r(m')$. Standard examples of this form are Poly1305 and GHASH. If the field has characteristic 2, as in GHASH, then $+$ is xor. This obviously generalizes to multivariate polynomials too, e.g. the dot product $H_{r_1,r_2}(m_1 \mathbin\| m_2) = m_1 r_1 + m_2 r_2$ (which naturally attains a lower collision probability).

  • Polynomial division. If $H_f(m) = (m \cdot x^n) \bmod f$ where $m, f \in \operatorname{GF}(p)[x]$, and where $f$ is irreducible and of degree $n$, then clearly

    \begin{align} H_f(m + m') &= \bigl[(m + m') \cdot x^n\bigr] \bmod f \\ &= (m \cdot x^n) \bmod f + (m' \cdot x^n) \bmod f \\ &= H_f(m) + H_f(m'). \end{align}

    Polynomial division hashes are related to CRCs and Rabin fingerprints. When $p = 2$, $+$ is xor.
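As a rough illustration of the polynomial division hash for $p = 2$, here is a small Python sketch. The names `poly_mod` and `division_hash` are hypothetical, polynomials over $\operatorname{GF}(2)$ are encoded as integers with bit $i$ holding the coefficient of $x^i$, and the modulus $x^8 + x^4 + x^3 + x + 1$ is a fixed, well-known irreducible polynomial chosen only to keep the numbers small. In a real universal family, $f$ would be the secret key, drawn at random from the irreducible polynomials of a much larger degree $n$.

```python
def poly_mod(a: int, f: int) -> int:
    """Remainder of a divided by f in GF(2)[x]; bit i is the coefficient of x^i."""
    deg_f = f.bit_length() - 1
    while a.bit_length() - 1 >= deg_f:
        # Cancel the leading term of a by XORing in a shifted copy of f.
        a ^= f << (a.bit_length() - 1 - deg_f)
    return a


def division_hash(f: int, m: int) -> int:
    """Compute (m * x^n) mod f, where n is the degree of f."""
    n = f.bit_length() - 1
    return poly_mod(m << n, f)


if __name__ == "__main__":
    f = 0x11B  # x^8 + x^4 + x^3 + x + 1, assumed here purely for illustration
    m1, m2 = 0b1011001110001111, 0b0110101011110000
    # Addition in GF(2)[x] is XOR, so the hash of the XOR is the XOR of the hashes.
    print(division_hash(f, m1 ^ m2) == division_hash(f, m1) ^ division_hash(f, m2))
```

Because a message is encoded as an integer, leading zero coefficients are invisible to this sketch, which is the same no-length-padding situation the answer assumes.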

Beware that multiplication in fields of characteristic 2 is generally not efficient in software, and that the most efficient software is riddled with timing side channels—unless you can fruitfully organize your computation to simultaneously compute a batch of (say) 64 instances of it in parallel using bitslicing.

Squeamish Ossifrage