DES SBOX Output with Bitslice

Question

I am not understanding how to compute the output bits of a 6-to-4-SBOX with bitslice technique in DES. Matthew Kwan made a brief recap in his paper "Reducing the Gate Count of Bitslice DES" of Biham original paper. He wrote:

Basically, for each S-box, the technique is to take two of the input bits, expand them to all 16 possible functions of two variables, and use the remaining four S-box inputs to select from those 16 functions. However, the details are slightly more complicated

I believe that I understand how to expand 2 variables to 16 functions (from f0 till f15)... But how do I select now with my remaining 4 Input Bits all 4 Outputs?

The paper of Matthew Kwan can be found here: http://fgrieu.free.fr/Mattew%20Kwan%20-%20Reducing%20the%20Gate%20Count%20of%20Bitslice%20DES.pdf

fgrieu · Accepted Answer · 2022-02-22T12:32:53.687

Eli Biham's original algorithm to implement any 6 to 4 bit S-box as described in Matthew Kwan's paper is to

Single out two input bits, say $i_1$ and $i_2$
Build all $2^{(2^2)}=16$ single-bit functions of $i_1$ and $i_2$, say $f_0$ to $f_{15}$
Describe each of the four outputs of the S-box as which of these functions $f_j$ needs to go to that output for each of the $2^4=16$ combinations of the four other input bits $i_3$ $i_4$ $i_5$ $i_6$ of the S-box, and implement that by using four layers of digital multiplexing for each output:
- For each of the $2^3=8$ combinations of $i_4$ $i_5$ $i_6$, we select according to $i_3$ which $f_j$ is needed. E.g. if for a certain output and certain combination of $i_4$ $i_5$ $i_6$ we need to select $f_4$ when $i_3=0$ and $f_7$ when $i_3=1$, then we can do this as $(f_4\operatorname{NAND}\bar{i_3})\operatorname{NAND}(f_7\operatorname{NAND}i_3)$, costing $3$ gates (discounting cost of inverting $i_3$). Thus this stage will cost $4\times8\times3=96$ gates total (but see optimization 1 below).
- For each of the $2^2=4$ combinations of $i_5$ $i_6$, we select according to $i_4$ which of two functions of the earlier stage is needed.
- For each of the $2$ values of $i_6$, we select according to $i_5$ which of two functions of the earlier stage is needed.
- We select according to $i_6$ which of the two functions of the earlier stage is needed.

The above does the multiplexing with $4\times(8+4+2+1)\times3=180$ $\operatorname{NAND}$ gates (plus $4$ inverters for $i_3$ $i_4$ $i_5$ $i_6$ if these needs to be accounted for).

Many optimizations are possible, including:

Using $\operatorname{XOR}$ which allows a multiplexing with two gates/instructions instead of three, e.g. we compute $((f_4\operatorname{XOR}f_7)\operatorname{AND}i_3)\operatorname{XOR} f_4$, noting that $f_4\operatorname{XOR}f_7$ comes for free since this is still a function of $i_1$ and $i_2$, thus an $f_j$, likely $f_3$ for some natural numbering; same for later multiplexing stages, by adjusting what earlier stages compute. This optimization is very effective in software. It's in Biham's implementation and in Kwan's account.
Computing $8$ rather than $16$ functions $f_j$, by adjusting polarity in the multiplexing.
On some occasions, reusing a function (beyond the $f_j$) across multiple S-box outputs.
On some occasions, not needing all functions $f_j$, because one happens not being used.
On some occasions, being able to remove a multiplexing stage, because the multiplexing input has no influence on the desired output.
On some occasions, being able to simplify a multiplexer because one of it's data input is constant.
Reordering things that can (the inputs $i_j$, the data inputs of multiplexers, the order of multiplexing bits $i_3$ $i_4$ $i_5$ $i_6$ for each output) to maximize occurrences of 3/4/5/6.

DES SBOX Output with Bitslice

1 Answers1