
I am trying to understand the split mask countermeasure, a masking method for protecting against side-channel attacks. I will first describe the principle and then try to apply it to AES.

I found this paper, which analyses the method. First, I will quote its presentation of the split mask countermeasure, given in the second section:

Let $S$ be an S-box with input $x$ and output $S(x)$ implemented as a lookup table. The split mask implementation of $S$ consists of a masked table $S'$ and a mask table $M$. These tables are defined as follows: \begin{equation} \begin{aligned} S'(x \oplus n) &= S(x) \oplus r_x, \\ M(x \oplus n) &= r_x \oplus m \end{aligned} \tag{1} \end{equation}

This means that the input of the S-box is masked with $n$, and each output value is masked with an individual random value $r_x$. This gives the masked table $S'$. The set of output masks $r_x$ is also stored in the mask table $M$ so that \begin{equation} S'(x \oplus n) \oplus M(x \oplus n) = S(x) \oplus m \tag{2} \end{equation} holds for every input $x$. In other words, $m$ can be viewed as the output mask of $S$ that is split into two shares $r_x$ and $M(x \oplus n)$, the splitting being individual for each table entry.

The split mask countermeasure with a single mask table is claimed to thwart the 1-st order DPA attack. For this, the original description requires that $(2)$ should never be computed directly (i.e. appear as an intermediate value) during an algorithm execution.

The description seems clear to me except for this sentence:

For this, the original description requires that $(2)$ should never be computed directly (i.e. appear as an intermediate value) during an algorithm execution.

So how should it be computed?

Suppose we want to encrypt a data block using AES with the split mask countermeasure. We start by generating $n$, $m$ and the 256 bytes $r_x$ in order to compute $S'$ and $M$.

But then, when we have to apply the $\texttt{SubBytes}$ operation during the algorithm, how do we proceed?
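The table generation step can be sketched in Python as follows. This is only an illustration of definition $(1)$, not a hardened implementation: the helper names (`gf_mul`, `aes_sbox`, `split_mask_tables`) are made up for this example, and the AES S-box is recomputed from its algebraic definition so that the block is self-contained.

```python
import secrets

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p

def aes_sbox():
    """Build the AES S-box: field inversion followed by the affine map."""
    sbox = []
    for x in range(256):
        # x^254 is the multiplicative inverse of x (and maps 0 to 0).
        inv = 1
        for _ in range(254):
            inv = gf_mul(inv, x)
        y = inv
        for shift in range(1, 5):
            # XOR in the byte rotated left by 1, 2, 3 and 4 bits.
            y ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(y ^ 0x63)
    return sbox

def split_mask_tables(sbox, n, m, r):
    """Build S' and M from definition (1) (hypothetical helper)."""
    s_masked = [0] * 256
    mask_tab = [0] * 256
    for x in range(256):
        s_masked[x ^ n] = sbox[x] ^ r[x]   # S'(x ^ n) = S(x) ^ r_x
        mask_tab[x ^ n] = r[x] ^ m         # M(x ^ n)  = r_x ^ m
    return s_masked, mask_tab

SBOX = aes_sbox()
n = secrets.randbelow(256)                        # input mask
m = secrets.randbelow(256)                        # output mask
r = [secrets.randbelow(256) for _ in range(256)]  # one r_x per table entry
S_masked, M_tab = split_mask_tables(SBOX, n, m, r)
```

Relation $(2)$ can be checked offline as a sanity test of the tables, but that check is exactly the value the countermeasure says must not appear during an actual encryption.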

In the same paper mentioned above it is said:

Other details of concrete implementations can be found in [10, 11, 12, 13]. In these papers the countermeasure is proposed for an optimized AES implementation with $8 \times 32$-bit lookup tables that are used to compute the S-box and the diffusion simultaneously.

but this concerns an optimized $32$-bit implementation, and I would simply like to implement the countermeasure for a traditional $8$-bit version, so it does not help me that much.

Raoul722

2 Answers


Given some intermediate data $x$ held as two shares $x = x_1 \oplus x_2$, take a fresh random $r$ and compute the new shares $x_1' = ((x_1\oplus r)\oplus x_2)\oplus(n\oplus r)$ [the parentheses indicate the order of evaluation] and $x_2' = n$. You can then use $x_1'$ ($=x\oplus n$) as the input to both tables.
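A minimal sketch of this remasking step, assuming byte-sized shares. The function name `remask_to_n` is invented for the example, and Python of course gives no guarantee about what actually leaks on a real device; the point is only the order of the XORs.

```python
import secrets

def remask_to_n(x1, x2, n):
    """Turn shares (x1, x2) of x into shares (x ^ n, n) without exposing x."""
    r = secrets.randbelow(256)  # fresh random byte for this call
    t = x1 ^ r                  # x1 blinded by r
    t ^= x2                     # equals x ^ r, still masked by r
    u = n ^ r                   # n blinded by the same r
    return t ^ u, n             # (x ^ r) ^ (n ^ r) = x ^ n

# Example: share a byte x, then move it to the table input mask n.
x = 0x3A
x2 = secrets.randbelow(256)
x1 = x ^ x2
n = secrets.randbelow(256)
x1_new, x2_new = remask_to_n(x1, x2, n)
```

With this order of evaluation, every intermediate value is masked by $r$, by $n$, or by the original share, and $x$ itself is never formed.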

The answer to "So how should it be computed?" is: not at all. It is not needed. I think this is just a warning that having $(2)$ as an intermediate value would be a problem, because $m$ is kept constant for many plaintexts. Strangely enough, the inventors of the split mask countermeasure did not realize that keeping $n$ constant for many plaintexts poses the same risk. A very brief look at the paper you linked to gave me the impression that it exposes this weakness in later chapters.

whoever

I did not pay enough attention when reading the paper. Its Figure 2 illustrates the operation, which works as follows.

After $S'$ and $M$ have been computed, the first-round $\texttt{AddRoundKey}$ step stays the same, except that the round key is additionally XORed with $n$. So if the data block is $x$, after the first $\texttt{AddRoundKey}$ we get $x \oplus k \oplus N$ (where $N = n \,\|\, n \,\|\, \dots \,\|\, n$ repeats $n$ to fill a $128$-bit block). Then we perform the rest of the round as usual:

  • The $\texttt{SubBytes}$ using $S'$ returns $S(x) \oplus r_x$
  • then the $\texttt{ShiftRows}$ operation is performed and, since it is a linear operation, we get $\texttt{SR}(S(x)) \oplus \texttt{SR}(r_x)$
  • the same holds for the $\texttt{MixColumns}$ operation, so we finally get $\tilde{S} = \texttt{MC(SR}(S(x))) \oplus \tilde{r_x}$ where $\tilde{r_x} = \texttt{MC(SR}(r_x))$.
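The linearity used in the last two bullets can be checked numerically. The sketch below assumes the usual column-major 16-byte AES state; the function names (`shift_rows`, `mix_columns`, `xor_state`) are chosen for this example only.

```python
import secrets

def xtime(a):
    """Multiply a byte by x (i.e. by 2) in GF(2^8) modulo 0x11B."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a

def mix_column(col):
    """Apply the AES MixColumns matrix to one 4-byte column."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),
    ]

def mix_columns(state):
    """MixColumns on a 16-byte state stored column by column."""
    out = []
    for c in range(4):
        out += mix_column(state[4 * c:4 * c + 4])
    return out

def shift_rows(state):
    """ShiftRows: row r is rotated left by r positions."""
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]

def xor_state(a, b):
    """Bytewise XOR of two states."""
    return [x ^ y for x, y in zip(a, b)]

# Stand-ins for the unmasked S-box outputs and for the per-byte masks r_x.
sub_out = [secrets.randbelow(256) for _ in range(16)]
masks = [secrets.randbelow(256) for _ in range(16)]

masked_path = mix_columns(shift_rows(xor_state(sub_out, masks)))
split_path = xor_state(mix_columns(shift_rows(sub_out)),
                       mix_columns(shift_rows(masks)))
```

Here `masked_path` and `split_path` are equal, which is the statement $\texttt{MC(SR}(S(x) \oplus r_x)) = \texttt{MC(SR}(S(x))) \oplus \texttt{MC(SR}(r_x))$ for the whole state.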

In parallel, we have to perform the same operations on the output of the mask table $M$, and we get $\tilde{M} = \tilde{m} \oplus \tilde{r_x}$ where $\tilde{m} = \texttt{MC(SR}(m))$.

Then the next $\texttt{AddRoundKey}$ is additionally XORed with $\tilde{m} \oplus \tilde{M} \oplus N$ to 'rebalance' the masks for the next round: $\tilde{S} \oplus \tilde{m} \oplus \tilde{M} \oplus k \oplus N = \texttt{MC(SR}(S(x))) \oplus k \oplus N$
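Spelling out the cancellation with the definitions of $\tilde{S}$ and $\tilde{M}$ given above (this is only an expansion of the identity; it says nothing about the order in which an implementation should combine the terms):

\begin{equation} \begin{aligned} \tilde{S} \oplus \tilde{m} \oplus \tilde{M} \oplus k \oplus N &= \left(\texttt{MC(SR}(S(x))) \oplus \tilde{r_x}\right) \oplus \tilde{m} \oplus \left(\tilde{m} \oplus \tilde{r_x}\right) \oplus k \oplus N \\ &= \texttt{MC(SR}(S(x))) \oplus k \oplus N \end{aligned} \end{equation}

The two copies of $\tilde{r_x}$ and the two copies of $\tilde{m}$ cancel, so the state entering the next $\texttt{SubBytes}$ is again masked by $N$ only, which is what the table $S'$ expects.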

Etc...

Finally, the last $\texttt{AddRoundKey}$ (the one in the last round) is not XORed with $n$, but it is still XORed with the values obtained by applying the round transformations to $M$ and $r_x$ (note that these are not $\tilde{r_x}$ and $\tilde{M}$ as defined above, because the $\texttt{MixColumns}$ operation is omitted in the last round).

As whoever pointed out in their answer:

  • there is no need to compute $(2)$, this is just a warning as a precaution
  • the mentioned paper exposes that using the same $n$ for many inputs introduces a weakness.
Raoul722