Understanding a CRC32 Implementation

Question

I'm currently trying to understand an implementation of CRC32 about which I have a question.

On this page at section 6, there is the following code:

public uint Compute_CRC32_Simple(byte[] bytes)
{
    const uint polynomial = 0x04C11DB7; /* divisor is 32bit */
    uint crc = 0; /* CRC value is 32bit */

    foreach (byte b in bytes)
    {
        crc ^= (uint)(b << 24); /* move byte into MSB of 32bit CRC */

        for (int i = 0; i < 8; i++)
        {
            if ((crc & 0x80000000) != 0) /* test for MSB = bit 31 */
            {
                crc = (uint)((crc << 1) ^ polynomial);
            }
            else
            {
                crc <<= 1;
            }
        }
    }

    return crc;
}

I'm particularly interested in understanding this line: crc ^= (uint)(b << 24); /* move byte into MSB of 32bit CRC */

What is the mathematics that makes this line possible, both the shift of the current byte (converted to an int) by 24 and the subsequent XOR with the current crc? Unfortunately, the author doesn't go into detail about this. What I want to know is why dividing the current byte and then XORing the remainder with the next byte is the same as a manual bit-by-bit division.

Mark Adler · Answer 1 · 2022-04-24T18:40:35.747

Perhaps what you are imagining is something like this at the top of the inner loop, where the bits of b are fed one-by-one into the high bit of the CRC:

        if (b & 0x80)
            crc ^= 0x80000000;
        b <<= 1;

That is equivalent to this:

        crc ^= (uint)(b & 0x80) << 24;
        b <<= 1;

Now you can see where the 24 comes from. Note that each time, both crc and b are shifted up by one. It only makes decisions based on the high bit of crc, and the order that you do exclusive-or's in doesn't matter, so you can simply exclusive-or all eight of the bits of b into the high byte of crc outside of the inner loop, to get the same effect:

    crc ^= (uint)b << 24;
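To check the equivalence for yourself, here is a minimal sketch in C (a translation, since the question's code is C#). The function names `crc32_bit_fed` and `crc32_byte_xor` and the `POLY` macro are made up for this illustration; the polynomial and the zero initial value are taken from the question.

```c
#include <stddef.h>
#include <stdint.h>

#define POLY 0x04C11DB7u

/* Variant 1: feed the bits of each byte one at a time into the high bit of crc. */
uint32_t crc32_bit_fed(const uint8_t *data, size_t len)
{
    uint32_t crc = 0;
    for (size_t n = 0; n < len; n++) {
        uint8_t b = data[n];
        for (int i = 0; i < 8; i++) {
            if (b & 0x80)
                crc ^= 0x80000000u;      /* current top bit of b goes into bit 31 */
            b = (uint8_t)(b << 1);       /* move the next message bit to the top */
            if (crc & 0x80000000u)
                crc = (crc << 1) ^ POLY;
            else
                crc <<= 1;
        }
    }
    return crc;
}

/* Variant 2: XOR all eight bits into the high byte once, outside the inner loop. */
uint32_t crc32_byte_xor(const uint8_t *data, size_t len)
{
    uint32_t crc = 0;
    for (size_t n = 0; n < len; n++) {
        crc ^= (uint32_t)data[n] << 24;
        for (int i = 0; i < 8; i++) {
            if (crc & 0x80000000u)
                crc = (crc << 1) ^ POLY;
            else
                crc <<= 1;
        }
    }
    return crc;
}
```

Both functions should return the same value for any input. For the single byte 0x01, the lone 1 bit reaches bit 31 just before the eighth shift, so the result is the polynomial itself.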

score 1 · Answer 2 · answered Jan 26 '25 at 14:39

I realize this answer comes very late, but I figured I'd write it down for future reference.

What you see there is an optimization that is covered to some extent in this great paper: https://www.zlib.net/crc_v3.txt, in section 10, 'A Slightly Mangled Table-Driven Implementation'.

Mark Adler's answer is obviously correct but perhaps not immediately obvious unless you've been staring at the problem for a while :)

Exposition

As described in that paper, without this optimization you normally have to append w zero bits (where w is the width of the CRC register) to the message in order to flush out, as it were, the last register-ful of message bits. For example, take an implementation without this optimization that reads in one bit at a time, a 2-byte message, and a 2-byte register. After 16 left shifts, every bit in the register is a message bit at some point in its journey through the register, but none of them has completed that journey yet. We need another w left shifts to make all of them travel through to the very end and off the register. Shifting in w zero bits (which is the same as simply performing w left shifts) accomplishes this.

With the byte-by-byte optimization, we don't have this problem because the logic goes like this:

XOR in the next message byte
perform 8 left shifts (one for each bit in the byte)

Therefore every bit is guaranteed to complete its journey through the register.
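The two approaches can be compared directly. The following is a rough sketch in C (the question's code is C#), assuming the question's polynomial and a zero initial register; `crc32_augmented` and `crc32_direct` are names invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define POLY 0x04C11DB7u

/* Augmented form: message bits enter at the right end of the register,
   followed by 32 zero bits that flush the last message bits through. */
uint32_t crc32_augmented(const uint8_t *data, size_t len)
{
    uint32_t reg = 0;
    size_t message_bits = 8 * len;
    size_t total_bits = message_bits + 32;

    for (size_t k = 0; k < total_bits; k++) {
        uint32_t in = 0;                       /* the appended bits are zero */
        if (k < message_bits)
            in = (data[k / 8] >> (7 - k % 8)) & 1u;  /* most significant bit first */

        uint32_t top = reg & 0x80000000u;      /* bit about to be shifted off */
        reg = (reg << 1) | in;
        if (top)
            reg ^= POLY;
    }
    return reg;
}

/* Direct form: XOR each byte into the top of the register, then shift 8 times.
   No appended zero bits are needed. */
uint32_t crc32_direct(const uint8_t *data, size_t len)
{
    uint32_t reg = 0;
    for (size_t n = 0; n < len; n++) {
        reg ^= (uint32_t)data[n] << 24;
        for (int i = 0; i < 8; i++) {
            if (reg & 0x80000000u)
                reg = (reg << 1) ^ POLY;
            else
                reg <<= 1;
        }
    }
    return reg;
}
```

With a zero initial register, both functions should agree on every message. The augmented form performs 32 more shifts in total, which is exactly the flushing step the direct form avoids.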

Equivalence with the bit-by-bit solution

I can't explain this using mod-2 polynomial arithmetic theory, but I'll explain why this works in terms of the logic operations.

I mentioned 'the journey of a bit through the register' a number of times earlier. That's relevant here. It's probably best to think of a bit as a unit of information (a data packet, if you will) that travels through the register: it is shifted in at the right end, shifted left repeatedly until it reaches the leftmost bit position in the register, and ultimately shifted off the register.

In the case of the bit-by-bit implementation, we shift into the register one message bit at a time. What happens when we shift in a message bit involves a few things:

all the bits in the register are shifted left
the rightmost bit position now has value 0
we copy the message bit into the newly-freed rightmost bit position.

Notice this assignment of the message bit to the rightmost register bit is actually the same as XOR-ing the message bit into the register bit (in this case), because the register bit is 0, and by the XOR identity property, a XOR 0 = a.
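As a tiny illustration in C (the helper names `shift_in_by_copy` and `shift_in_by_xor` are made up for this sketch), copying the message bit into the freed position and XOR-ing it in give the same register, because the left shift always leaves a 0 in the rightmost position:

```c
#include <stdint.h>

/* Shift the register left and copy the message bit into the freed position. */
uint32_t shift_in_by_copy(uint32_t reg, uint32_t bit)
{
    return (reg << 1) | (bit & 1u);
}

/* Shift the register left and XOR the message bit into the freed position.
   The freed position is 0 after the shift, and a XOR 0 = a. */
uint32_t shift_in_by_xor(uint32_t reg, uint32_t bit)
{
    return (reg << 1) ^ (bit & 1u);
}
```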

Imagine this bit (let's call it b) travelling from the rightmost to the leftmost position in the register. When it finally gets shifted off the register, it is used to decide whether to XOR the poly into the register. By that time, bit b may have undergone a number of XORs with the corresponding bits of the poly during its journey:

b xor c xor d, ..., xor n

But remember what I said earlier: when we shift a message bit (let's call it x) into the register, we actually XOR this bit into b (b being the rightmost bit in the register). Therefore the above can be rewritten as follows (notice the xor x):

b xor x xor c xor d, ..., xor n

And this is the value (i.e. the information content in that bit if you will) that will be used to make the decision when the bit completes its journey.

In the byte-by-byte implementation we have exactly the same situation, but the xor x is never added. Clearly making a decision based on

b xor x xor c xor d, ..., xor n

vs

b xor c xor d, ..., xor n

may yield different results. Fortunately, this is an easy situation to repair. XOR is both commutative and associative, so

a xor b xor c xor d = c xor a xor b xor d

and it doesn't matter in what order you XOR the bits, as long as you do XOR them all. Therefore we can just do:

b xor c xor d, ..., xor n xor x

and now we get the same bit value as we would have had if we had shifted in a message bit as in the bit-by-bit implementation. This is what the optimized byte-by-byte implementation does: it applies the XOR at a later point, taking advantage of the fact that the order of the XORs doesn't matter.
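One way to convince yourself that the timing of the XOR doesn't matter is to check that a single register step distributes over XOR. The sketch below is in C and assumes the question's polynomial; `crc_step` and `crc_step8` are names chosen for this example, not anything from the paper.

```c
#include <stdint.h>

#define POLY 0x04C11DB7u

/* One register step: shift left, and XOR in the poly if the bit shifted off was 1. */
uint32_t crc_step(uint32_t reg)
{
    return (reg & 0x80000000u) ? ((reg << 1) ^ POLY) : (reg << 1);
}

/* Eight steps, i.e. the work done for one message byte. */
uint32_t crc_step8(uint32_t reg)
{
    for (int i = 0; i < 8; i++)
        reg = crc_step(reg);
    return reg;
}
```

For any two register values r and s, `crc_step(r ^ s)` should equal `crc_step(r) ^ crc_step(s)`, and the same holds for `crc_step8`. So XOR-ing a message bit into the register before a run of steps gives the same result as stepping first and XOR-ing in the stepped contribution afterwards.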

Of course, we don't apply it to a single bit (that's the whole point of the optimization) but to a whole byte. The same principle applies: all 8 bits in the byte are updated as described above for the individual bit.
