Understanding Intel's algorithm for reducing a polynomial modulo an irreducible polynomial

Question

I'm reading this Intel white paper on carry-less multiplication. It describes multiplication of polynomials in $\text{GF}(2^n)$. At a high level, this is performed in two steps: (1) multiplying the polynomials over $\text{GF}(2)$, and (2) reducing the result modulo an irreducible polynomial. We use the "standard" bitstring representation of polynomials, i.e. $x^3+x+1 = [1011]$.

The paper gives an algorithm for calculating the remainder polynomial on page 16 (Algorithm 3). However, I'm having trouble understanding the reduction algorithm on pages 16-17 (Algorithm 4). Essentially, I think we need Algorithm 4 for larger fields, when the operands or partial results no longer fit in 128 bits. They give an example for multiplication of two polynomials in $\text{GF}(2^{128})$.

Where do the "magic constants" 63, 62, and 57 for right shifts, and the "magic constants" 1, 2, and 7 for left shifts come from?

For example, how does one generalize the algorithm for smaller fields, say $\text{GF}(2^{32})$? Would the corresponding shift values then be 15, 14, 9 and 1, 2, 7?

In the final step 4, the algorithm tells you to "XOR $[E_1:E_0]$, $[F_1:F_0]$, and $[G_1:G_0]$ with each other and $[X_3:D]$".

Why do we do this? As far as I can see, the result of this XOR operation is neither stored anywhere nor used anywhere. Is it somehow used for computing $[H_1 : H_0]$?

score 7 · Accepted Answer · answered Apr 09 '14 at 00:03

The Galois field $GF(2^{128})$ has many different "concrete" representations. One popular representation is using polynomials in $GF(2)[x]$ (i.e. with coefficients in $GF(2)$) modulo some irreducible polynomial of degree $128$, say $x^{128}+x^7+x^2+x^1+x^0$. The magic constants $7,2,1,0$ come from this particular irreducible polynomial. You don't see $0$ since there is no need to shift by $0$. Similarly, the magic constants $57,62,63,64$ are the complements of the constants above with respect to $64$. Again, you don't need to shift by $64$ since the registers are $64$ bits wide. If we had used some other irreducible polynomial, the constants would have been different.
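
As a rough illustration of where the constants come from, here is a small Python sketch. The helper names `shift_constants` and `shl128_by_words` are made up for this example and do not appear in the paper. The second function shows why the complements with respect to $64$ turn up at all: shifting a 128-bit value held in two 64-bit words left by $e$ needs the top $e$ bits of the low word, which are obtained by a right shift by $64 - e$.

```python
MASK64 = (1 << 64) - 1

def shift_constants(low_exponents, word_bits=64):
    """Left/right shift amounts for the low-order terms of the modulus.

    The exponent 0 is skipped because shifting by 0 is a no-op.
    """
    lefts = [e for e in low_exponents if e != 0]
    rights = [word_bits - e for e in lefts]
    return lefts, rights

def shl128_by_words(hi, lo, e):
    """Shift the 128-bit value [hi:lo] left by e (0 < e < 64) using 64-bit pieces."""
    carry = lo >> (64 - e)                  # bits that cross the word boundary
    new_hi = ((hi << e) & MASK64) | carry
    new_lo = (lo << e) & MASK64
    return new_hi, new_lo

# Low-order terms of x^128 + x^7 + x^2 + x^1 + x^0
lefts, rights = shift_constants([7, 2, 1, 0])
print(lefts)   # [7, 2, 1]
print(rights)  # [57, 62, 63]
```

Passing a different list of exponents (for another irreducible polynomial) or a different `word_bits` changes the constants accordingly.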

Regarding step 4, $[H_1:H_0]$ results from XORing $[E_1:E_0],[F_1:F_0],[G_1:G_0],[X_3:D]$. This implements the operation of MODing by $x^{128} + x^7 + x^2 + x^1 + x^0$. The idea is that modulo this polynomial, $x^{128} = x^7+x^2+x^1+x^0$; addition here is XOR. The different summands correspond to these different monomials; see if you can find the correspondence.
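
To make the identity $x^{128} = x^7+x^2+x^1+x^0$ concrete, here is a minimal Python sketch that uses arbitrary-precision integers as bitstrings. It is not Intel's Algorithm 4 (which works on 64-bit words), only a plain version of the same idea; the function names are made up for this example. The folding version replaces the high part times $x^{128}$ by four shifted copies of the high part, and a bit-by-bit long division is included as a cross-check.

```python
POLY_LOW = (1 << 7) | (1 << 2) | (1 << 1) | 1   # x^7 + x^2 + x + 1
MASK128 = (1 << 128) - 1

def reduce_by_folding(c):
    """Reduce c modulo x^128 + x^7 + x^2 + x + 1 using x^128 = x^7 + x^2 + x + 1."""
    while c >> 128:
        hi = c >> 128          # coefficients of x^128 and above
        lo = c & MASK128       # coefficients below x^128
        # hi * x^128 == hi*x^7 + hi*x^2 + hi*x + hi (addition is XOR)
        c = lo ^ (hi << 7) ^ (hi << 2) ^ (hi << 1) ^ hi
    return c

def reduce_by_long_division(c):
    """Reference: cancel the leading term one bit at a time."""
    modulus = (1 << 128) | POLY_LOW
    while c.bit_length() > 128:
        c ^= modulus << (c.bit_length() - 129)
    return c

print(hex(reduce_by_folding(1 << 128)))  # 0x87, i.e. x^7 + x^2 + x + 1
```

The four XORed terms in the loop play the role of the four summands in step 4: one term per monomial of $x^7+x^2+x^1+x^0$.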

score 4 · Answer 2 · answered Apr 10 '14 at 16:46

For completeness, I'll flesh out Yuval's answer a bit more through an example of multiplication of two polynomials $A$ and $B$ in $\text{GF}(2^{16})$. Let $$A = [0001|1100|1110] = x^8 + x^7 + x^6 + x^3 + x^2 + x,$$ and $$B = [0100|0101|0111] = x^{10} + x^6 + x^4 + x^2 + x + 1.$$ The multiplication of $A$ and $B$ over $\text{GF}(2)$ is then $$C = [0000|0000|0000|0111 | 0101|0010|0000|1010].$$
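
The product above can be checked with a short shift-and-XOR sketch in Python. The function name `clmul` is chosen for this example; it is a plain software loop, not the hardware instruction itself.

```python
def clmul(a, b):
    """Carry-less multiplication: multiply two polynomials over GF(2)."""
    result = 0
    while b:
        if b & 1:          # this term of b is present: add (XOR) the shifted a
            result ^= a
        a <<= 1            # multiply a by x
        b >>= 1            # move on to the next coefficient of b
    return result

A = 0b0001_1100_1110   # x^8 + x^7 + x^6 + x^3 + x^2 + x
B = 0b0100_0101_0111   # x^10 + x^6 + x^4 + x^2 + x + 1
C = clmul(A, B)
print(f"{C:032b}")  # 00000000000001110101001000001010
```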

We then compute for $\text{GF}(2^{16})$ the polynomial $$q^+ = x^{32}+x^{21}+x^{19}+x^{18}+x^{16}+x^{10}+x^6+x^4+1.$$ We now want to multiply $q^+$ by the high bits of $C$, and keep the 32 highest bits of this computation for later processing. For this purpose, we use Algorithm 4 (Steps 1-2). Let us denote $C$ by $[X_3:X_2:X_1:X_0]$, where each $X_i$ is 8 bits long. We then right shift $X_3$ by 28, 26, 22, 16, 14, 13, and finally by 11; these amounts are 32 minus the exponents 4, 6, 10, 16, 18, 19, and 21 of $q^+$. So we have altogether 7 "temporary variables", and we XOR them with $X_2$. This results in $111_2$, which is what we wanted.

For the next step, which computes the lower part of the product, notice that the left shifts come from $g^*$, which is defined on page 16.
