I am currently implementing the Serpent block cipher in C++, following the specification. Note that I am implementing the cipher in bitslice mode. To follow along, you will need the full submission package of Serpent, which contains the specification of the algorithm and the source code in C and Java.
Let's start with the key schedule algorithm on pages 6 and 7 of the paper. There you will see $\{k_0, k_1, k_2, k_3\} = S_3(w_0, w_1, w_2, w_3)$, followed by similar equalities for the remaining $k_i$. Here $S_i$ is an S-Box for $i = 0, \ldots, 7$. Notice that these S-Boxes take four 32-bit integers as input and return four 32-bit integers. This confuses me, because on page 3 the paper says that each S-Box takes a 4-bit integer as input and returns a 4-bit integer.
If someone could help me understand these S-Boxes, it would be much appreciated. I need to know how to build them from the tables given on page 21 (A.5).
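To make the question concrete, here is a minimal sketch of how I currently read the relationship between the two descriptions: the 4-bit table is applied once per bit position, with one bit taken from each of the four 32-bit words. The names `kS0`, `Words`, `sbox_bitslice_reference` and `example_all_zero` are my own, not from the package, and I am assuming that the first word supplies the least significant bit of each nibble. It is deliberately slow and only meant as a readable reference, not as the way the package does it.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// S0 as listed in the comment in serpentsboxes.h (input nibble -> output nibble).
constexpr std::array<std::uint8_t, 16> kS0 = {
    3, 8, 15, 1, 10, 6, 5, 11, 14, 13, 4, 2, 7, 0, 9, 12};

using Words = std::array<std::uint32_t, 4>;

// Hypothetical helper: applies a 4-bit table to the 32 nibbles formed by
// taking bit j of each of the four words, for j = 0..31.
// Assumption: in[0] supplies the least significant bit of each nibble.
inline Words sbox_bitslice_reference(const std::array<std::uint8_t, 16>& table,
                                     const Words& in) {
    Words out{};  // all four words start at zero
    for (int bit = 0; bit < 32; ++bit) {
        // Gather one bit from each word into a 4-bit value.
        unsigned nibble = 0;
        for (int k = 0; k < 4; ++k)
            nibble |= ((in[k] >> bit) & 1u) << k;

        const unsigned result = table[nibble];

        // Scatter the 4 result bits back to the same bit position.
        for (int k = 0; k < 4; ++k)
            out[k] |= static_cast<std::uint32_t>((result >> k) & 1u) << bit;
    }
    return out;
}

// Usage sketch: an all-zero input puts nibble 0 in every position,
// and S0[0] = 3 = binary 0011, so words 0 and 1 become all ones.
inline void example_all_zero() {
    const Words out = sbox_bitslice_reference(kS0, Words{{0, 0, 0, 0}});
    assert(out[0] == 0xFFFFFFFFu && out[1] == 0xFFFFFFFFu);
    assert(out[2] == 0u && out[3] == 0u);
}
```

If this reading is right, "four 32-bit integers in, four 32-bit integers out" is just 32 independent applications of the 4-bit S-Box.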
Also, in the serpentsboxes.h file included in the package, I saw the following code:
/* S0: 3 8 15 1 10 6 5 11 14 13 4 2 7 0 9 12 */
/* depth = 5,7,4,2, Total gates=18 */
#define RND00(a,b,c,d,w,x,y,z) \
{ register unsigned long t02, t03, t05, t06, t07, t08, t09, t11, t12, t13, t14, t15, t17, t01;\
t01 = b ^ c ; \
t02 = a | d ; \
t03 = a ^ b ; \
z = t02 ^ t01; \
t05 = c | z ; \
t06 = a ^ d ; \
t07 = b | c ; \
t08 = d & t05; \
t09 = t03 & t07; \
y = t09 ^ t08; \
t11 = t09 & y ; \
t12 = c ^ d ; \
t13 = t07 ^ t11; \
t14 = b & t06; \
t15 = t06 ^ t13; \
w = ~ t15; \
t17 = w ^ t14; \
x = t12 ^ t17; }
Here, w, x, y, z are the outputs and a, b, c, d are the input integers. If I understood correctly, the RND00 macro is equivalent to $S_0$. If so, how did they arrive at that code?
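As a sanity check on that reading, the sketch below restates the same gate sequence as a C++ function and compares it with the table from the `S0` comment. The function names `s0_gates` and `check_s0_gates` are mine. The check feeds truth-table constants so that bit position j of (a, b, c, d) spells the nibble j, again assuming `a` is the least significant input bit and `w` the least significant output bit. It does not explain how the gate sequence was found, only that it computes the same mapping.

```cpp
#include <cassert>
#include <cstdint>

// Same operations as the RND00 macro, written as a function on 32-bit words.
// Inputs are taken by value, so outputs may safely alias the caller's inputs.
inline void s0_gates(std::uint32_t a, std::uint32_t b, std::uint32_t c, std::uint32_t d,
                     std::uint32_t& w, std::uint32_t& x, std::uint32_t& y, std::uint32_t& z) {
    const std::uint32_t t01 = b ^ c;
    const std::uint32_t t02 = a | d;
    const std::uint32_t t03 = a ^ b;
    z = t02 ^ t01;
    const std::uint32_t t05 = c | z;
    const std::uint32_t t06 = a ^ d;
    const std::uint32_t t07 = b | c;
    const std::uint32_t t08 = d & t05;
    const std::uint32_t t09 = t03 & t07;
    y = t09 ^ t08;
    const std::uint32_t t11 = t09 & y;
    const std::uint32_t t12 = c ^ d;
    const std::uint32_t t13 = t07 ^ t11;
    const std::uint32_t t14 = b & t06;
    const std::uint32_t t15 = t06 ^ t13;
    w = ~t15;
    const std::uint32_t t17 = w ^ t14;
    x = t12 ^ t17;
}

// Evaluates all 16 possible nibbles in one call and compares with the table.
// Bit j of the constants below encodes bit 0, 1, 2, 3 of the value j.
inline void check_s0_gates() {
    static const std::uint8_t table[16] = {
        3, 8, 15, 1, 10, 6, 5, 11, 14, 13, 4, 2, 7, 0, 9, 12};
    std::uint32_t w, x, y, z;
    s0_gates(0xAAAAu, 0xCCCCu, 0xF0F0u, 0xFF00u, w, x, y, z);
    for (int j = 0; j < 16; ++j) {
        const unsigned out = ((w >> j) & 1u) | (((x >> j) & 1u) << 1) |
                             (((y >> j) & 1u) << 2) | (((z >> j) & 1u) << 3);
        assert(out == table[j]);
    }
}
```

Because every operation is bitwise, each bit position is processed independently of the others, which is why one call can evaluate many S-Box inputs at once.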
Defining 16 functions like this makes the code longer and hard to understand. Is there another way to code these functions with clearer instructions, closer to what is explained in the paper?
Another question: what is the difference between bitslice mode and non-bitslice mode in terms of performance and usefulness? Why would one prefer one over the other?
As you can see, I really don't want to copy the code. My goal is to understand every step of the cipher and write my own readable code (optimized, in C++11) that is easy to understand and follow.