How to implement arbitrary s-box in a side-channel-free way in C?

Question

This could be a question for CodeReview.SE, but I thought it might require enough non-trivial cryptographic knowledge to make it on-topic here.

The C language is chosen because it's a common language for implementing cryptographic algorithms. Also, since we're choosing C, the primary platforms under consideration are PCs, smart devices (cellphones, tablets, and TVs), and servers.

Arbitrary s-boxes may be required when designing products for sale in jurisdictions that mandate local cryptography standards, such as SM4, Camellia, and SEED in China, Japan, and South Korea respectively.

Here's my attempt at reducing side-channel leakage when implementing an arbitrary s-box. To the best of my knowledge, it's now constant-time, but:

Q: how should other side-channel attacks, such as fault attacks and electromagnetic probes placed in close proximity, be prevented?

```c
#include <stdint.h>

extern const uint8_t sbox_table[256];

uint8_t sbox(uint8_t x)
{
    int i;
    uint8_t ret = 0;
    uint16_t mask = 0;

    for (i = 0; i < 256; i++)
    {
        mask = i ^ x;                      /* zero only when i == x */
        mask = (uint16_t)(mask - 1) >> 8;  /* 0xFF when i == x, else 0; the cast keeps the shift operand unsigned */
        ret |= sbox_table[i] & mask;
    }

    return ret;
}

uint8_t invsbox(uint8_t x)
{
    int i;
    uint8_t ret = 0;
    uint16_t mask = 0;

    for (i = 0; i < 256; i++)
    {
        mask = sbox_table[i] ^ x;          /* zero only when sbox_table[i] == x */
        mask = (uint16_t)(mask - 1) >> 8;  /* 0xFF when equal, else 0 */
        ret |= i & mask;
    }

    return ret;
}
```

score 3 · Answer 1 · answered Dec 01 '22 at 20:52

This question has already been answered in the comments.

First of all, in my experience there is no such thing as a side-channel-free implementation, because that is the very nature of side-channel attacks: they exploit information generated by an implementation that was previously considered irrelevant.

Turning to your code, which is at least resistant to cache-timing attacks, the comment from SAI Peregrinus applies:

Note that C does not guarantee the above code will be constant time: execution time is not considered an observable effect by the C standard, so NO code can be guaranteed to be constant time if written in C. It almost certainly is in practice, but compilers don't make any attempt to enforce this (or any other side-channel resistance).

As poncho mentioned, in your code the memory accesses are independent of the value being looked up, x. This doesn't by itself guarantee constant-time execution, since sbox_table is stored in memory and cache misses will probably still occur. The bitslicing technique used in BearSSL, mentioned by kelalaka, does offer constant-time execution: it is just a software implementation of a Boolean circuit that computes the s-box. Even in the plain form of bitslicing used in BearSSL, I suspect it will be faster than your technique; check this post to see why. Also, if implemented appropriately using xmm registers, it can perform multiple s-box lookups simultaneously for extra speedup, according to this paper. In terms of security, neither technique leaks any information about x through its execution time.
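To make the "software implementation of a Boolean circuit" idea concrete, here is a minimal sketch. It is not BearSSL's code: BearSSL uses a compact circuit built for the AES s-box specifically, whereas this generic version evaluates any 8-bit table as a tree of 2:1 multiplexers (255 per output bit), which is correct but much larger than an optimized circuit. The names `to_bitsliced`, `from_bitsliced` and `bitsliced_sbox32` are hypothetical. Each `uint32_t` word holds one bit position of 32 independent inputs, so all 32 lookups run through the same sequence of AND/OR/NOT operations whatever the input values are.

```c
#include <stdint.h>

/* Pack 32 bytes into bitsliced form: bit k of s[j] is bit j of x[k]. */
static void to_bitsliced(const uint8_t x[32], uint32_t s[8])
{
    for (int j = 0; j < 8; j++) {
        s[j] = 0;
        for (int k = 0; k < 32; k++)
            s[j] |= (uint32_t)((x[k] >> j) & 1) << k;
    }
}

/* Inverse of to_bitsliced. */
static void from_bitsliced(const uint32_t s[8], uint8_t x[32])
{
    for (int k = 0; k < 32; k++) {
        x[k] = 0;
        for (int j = 0; j < 8; j++)
            x[k] |= (uint8_t)(((s[j] >> k) & 1) << j);
    }
}

/* Evaluate an arbitrary 8-bit s-box on 32 bitsliced inputs at once.
   For each output bit, the 256-entry truth table (public data) is
   broadcast to all lanes, then folded by a multiplexer tree that
   selects on input bits 0..7. The secret inputs only ever pass
   through whole-word AND/OR/NOT: no secret branch or secret index. */
static void bitsliced_sbox32(const uint8_t table[256],
                             const uint32_t in[8], uint32_t out[8])
{
    uint32_t buf[256];

    for (int b = 0; b < 8; b++) {
        /* all-ones word if bit b of table[i] is set, else zero */
        for (int i = 0; i < 256; i++)
            buf[i] = 0u - (uint32_t)((table[i] >> b) & 1);

        int n = 256;
        for (int j = 0; j < 8; j++) {
            uint32_t sel = in[j];
            n /= 2;
            /* 2:1 mux per lane: take the odd entry where input bit j is 1 */
            for (int i = 0; i < n; i++)
                buf[i] = (buf[2 * i] & ~sel) | (buf[2 * i + 1] & sel);
        }
        out[b] = buf[0];
    }
}
```

Usage: pack 32 input bytes with `to_bitsliced`, call `bitsliced_sbox32` with the s-box table, and unpack the result with `from_bitsliced`. In a real cipher the state would usually stay in bitsliced form across rounds, so the packing cost is paid only once per block batch.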

Now, speaking more generally about side-channel attacks: this short paper, in my opinion, provides a good brief overview of common side-channel attacks and their mitigations. There are many more side-channel attacks than fault and electromagnetic attacks, and I don't think all of them can be mitigated in software alone.

A few last words on your question: I don't think one can claim, based on the software alone, that your implementation is fully secure against EM attacks, since that really depends on whether the hardware implements optimization techniques that reveal information, or any mitigation techniques. As for fault attacks, I don't see any blatant flaws (e.g. a branch that depends on a single bit). But again, fault attacks cover a very broad range of techniques, and I think it is impossible to claim that an implementation mitigates a large part of them.

score 2 · Answer 2 · answered Dec 01 '22 at 17:28

I've seen a C compiler for an 8-bit platform translate x++ for a uint16_t x into (1) incrementing the low byte of x, (2) a branch that skips the next instruction if the result of the increment is non-zero, and then (3) incrementing the high byte of x (unless the branch was taken).

If your "(mask - 1)" is compiled in the same glorious way by that compiler, your loop will still take constant time in total, but the round where i == x will take longer than the other rounds on that platform.
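The borrow from the low byte into the high byte in (mask - 1) happens only when i == x, which is exactly what makes that round different. As a sketch, an equality mask can be computed with byte-sized values only, folding the difference bits down instead of relying on a multi-byte borrow. The names `ct_eq_mask8` and `sbox_lookup8` are hypothetical, and C still gives no guarantee that a compiler won't emit a branch for this either, so the generated assembly would need to be checked on each target.

```c
#include <stdint.h>

/* Returns 0xFF if a == b, 0x00 otherwise, using only byte-sized values:
   fold the difference bits down to bit 0, then turn 0/1 into 0xFF/0x00. */
static uint8_t ct_eq_mask8(uint8_t a, uint8_t b)
{
    uint8_t d = (uint8_t)(a ^ b);
    d |= (uint8_t)(d >> 4);
    d |= (uint8_t)(d >> 2);
    d |= (uint8_t)(d >> 1);
    d &= 1;                   /* 1 if a != b, 0 if a == b */
    return (uint8_t)(d - 1);  /* 0 - 1 wraps to 0xFF; 1 - 1 = 0x00 */
}

/* The question's sbox() loop rewritten around that helper. */
static uint8_t sbox_lookup8(const uint8_t table[256], uint8_t x)
{
    uint8_t ret = 0;
    for (unsigned i = 0; i < 256; i++)
        ret |= (uint8_t)(table[i] & ct_eq_mask8((uint8_t)i, x));
    return ret;
}
```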

So the way to go is bit-slicing, as already suggested by kelalaka in the comments.

DannyNiu · Answer 3 · 2024-08-08T09:38:19.847

It's been quite a while since I asked this. I ran a benchmark comparing the performance of:

- substituting 4 bytes individually, and
- substituting the 4 bytes of a 32-bit word simultaneously.

This word-based technique outperformed the byte-based one, and it supports arbitrary s-boxes (and slightly more). I'll present the word-based source code below.

```c
#include <stdint.h>

#define BYTE2WORD(x) ((x) | ((x)<<8) | ((x)<<16) | ((x)<<24))

uint32_t wsbox(uint32_t w, uint8_t const wsbox_table[256])
{
    uint32_t i;
    uint32_t ret = 0;
    uint32_t mask = 0;

    for (i = 0; i < 256; i++)
    {
        // initialize bit pattern for mask: a byte lane is zero iff that byte of w equals i.
        mask = BYTE2WORD(i) ^ w;

        // contract 8 bits into 1: bit 0 of each lane becomes the OR of that lane's bits.
        mask |= mask >> 4;
        mask |= mask >> 2;
        mask |= mask >> 1;
        mask &= 0x01010101;

        // finalize mask: 0xFF in lanes whose byte equals i, 0x00 elsewhere.
        mask *= 255;
        mask = ~mask;

        // add in masked substitute byte.
        ret |= BYTE2WORD((uint32_t)wsbox_table[i]) & mask;
    }

    return ret;
}
```

Of course, when memory isn't scarce, the entries in wsbox_table can be stored as 32-bit words to save the time spent on BYTE2WORD.
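A minimal sketch of that variant, assuming a hypothetical table `wtable` whose entry i holds the substitute byte replicated into all four byte lanes (built once by the hypothetical helper `build_wtable`). It also broadcasts the loop counter by multiplying with 0x01010101, which gives the same result as BYTE2WORD for values below 256.

```c
#include <stdint.h>

/* Build the pre-broadcast table once: wtable[i] has sbox[i] in every byte. */
static void build_wtable(const uint8_t sbox[256], uint32_t wtable[256])
{
    for (uint32_t i = 0; i < 256; i++)
        wtable[i] = 0x01010101u * sbox[i];
}

/* Same masking as wsbox(), but the substitute word is read directly. */
static uint32_t wsbox32(uint32_t w, const uint32_t wtable[256])
{
    uint32_t ret = 0;
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t mask = (0x01010101u * i) ^ w;  /* zero lanes where the byte == i */
        mask |= mask >> 4;
        mask |= mask >> 2;
        mask |= mask >> 1;
        mask &= 0x01010101u;                     /* 1 in lanes that differ */
        mask = ~(mask * 255u);                   /* 0xFF in lanes that match */
        ret |= wtable[i] & mask;
    }
    return ret;
}
```

The trade-off is a 1 KiB table instead of 256 bytes, in exchange for removing three shifts and three ORs from every loop iteration.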

In my actual implementation, wsbox_table is an array of pairs; the left-hand elements and the right-hand elements of the pairs each form a permutation. It's designed this way so that arbitrary computation on the input and output of the s-box can be supported (as may be needed in algorithms such as Camellia). For details, see: https://github.com/dannyniu/MySuiteA/blob/4a3bffbd5cca477ab8456d5b8fe6aee41501ab9c/src/0-datum/sbox.c.h#L269
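The linked source is the reference for how this is actually done. As a rough, hypothetical illustration of one way a table of pairs could be used: if both columns are permutations, a single constant-time scan can look up in either direction, and applying a bijection to a column while building the table (here a 1-bit left rotation of the output, chosen only as an example) yields a transformed s-box without new lookup code. The names `sbox_pair`, `build_pairs`, `pair_lookup` and `rotl1` are made up for this sketch and are not taken from MySuiteA.

```c
#include <stdint.h>

/* Hypothetical layout: col[0] is an input, col[1] its substitute.
   Both columns are permutations of 0..255. */
typedef struct { uint8_t col[2]; } sbox_pair;

/* Example bijection on bytes: rotate left by one bit. */
static uint8_t rotl1(uint8_t v) { return (uint8_t)((v << 1) | (v >> 7)); }

/* Build pairs (i, post(s[i])); post must be a bijection (or a null
   pointer for no transform) so the right-hand column stays a permutation. */
static void build_pairs(const uint8_t s[256], sbox_pair p[256],
                        uint8_t (*post)(uint8_t))
{
    for (int i = 0; i < 256; i++) {
        p[i].col[0] = (uint8_t)i;
        p[i].col[1] = post ? post(s[i]) : s[i];
    }
}

/* Scan all 256 pairs and return col[1 - dir] of the pair whose col[dir]
   equals v: dir = 0 is the forward s-box, dir = 1 the inverse.
   dir is public; the secret v only enters the mask arithmetic. */
static uint8_t pair_lookup(const sbox_pair p[256], uint8_t v, int dir)
{
    uint8_t ret = 0;
    for (int k = 0; k < 256; k++) {
        uint32_t d = (uint32_t)(p[k].col[dir] ^ v);
        uint8_t mask = (uint8_t)((d - 1) >> 8);  /* 0xFF iff d == 0 */
        ret |= (uint8_t)(p[k].col[1 - dir] & mask);
    }
    return ret;
}
```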
