Actual RSA encryption & decryption of strings?

Question

$K_{pub} = (n, e)$

$K_{pvt} = d$

Then

$E_{K_{pub}}(x) \equiv x^e \mod n$

Practically, when RSA is used to encrypt strings, what is the $x$ here? You cannot take it byte by byte because $\mod n$ will result in values larger than a byte. So what is done?

fgrieu · Answer 1 · 2023-09-24T19:21:09.897

Practically, when RSA is used to encrypt strings, what is the $x$ in $x^e\bmod n$?

That depends on the variant of RSA. Among the most common:

Toy-sized textbook RSA, where the public modulus $n$ is small: it is customary to encrypt letter by letter (or pair of letters, as in the original RSA article's small example) and concatenate the RSA cryptograms. Thus $x$ is the rank of the letter in the encoding used (or $x=x_0\,b+x_1$, where $x_0$ and $x_1$ are the ranks of two letters and $b$ is a public constant greater than the maximum value of $x_i$, e.g. $b=100$ in said article). There is no security for small $n$: a toy hammer won't actually nail. Small $n$ here means anything up to about a hundred decimal digits: such an $n$ can be factored quickly, which allows decryption. See this for records.
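
A minimal Python sketch of the toy scheme, for illustration only. It assumes the toy key $p=47$, $q=59$, $e=17$ (any small key with $n>2626$ works, since the largest pair "ZZ" encodes to $26\cdot100+26$) and a hypothetical letter encoding blank $\to 0$, A $\to 1$, …, Z $\to 26$ with $b=100$. The last function shows why there is no security: trial division recovers $p$ and $q$ instantly.

```python
# Toy textbook RSA on pairs of letters. Insecure by design: n is tiny.
P, Q = 47, 59
N = P * Q                           # 2773, must exceed the largest block 26*100 + 26
E = 17
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)
B = 100                             # public constant b, larger than any letter rank

def rank(ch: str) -> int:
    """Hypothetical encoding: blank -> 0, 'A' -> 1, ..., 'Z' -> 26."""
    return 0 if ch == " " else ord(ch) - ord("A") + 1

def letter(r: int) -> str:
    """Inverse of rank()."""
    return " " if r == 0 else chr(ord("A") + r - 1)

def encrypt(text: str) -> list[int]:
    if len(text) % 2:
        text += " "                 # pad to an even number of letters
    # x = x0 * b + x1 for each pair, then x^e mod n; cryptograms are concatenated
    return [pow(rank(text[i]) * B + rank(text[i + 1]), E, N)
            for i in range(0, len(text), 2)]

def decrypt(blocks: list[int]) -> str:
    pairs = (divmod(pow(c, D, N), B) for c in blocks)   # recover (x0, x1)
    return "".join(letter(x0) + letter(x1) for x0, x1 in pairs)

def factor(n: int) -> tuple[int, int]:
    """What an attacker does: trial division, instant for a tiny n."""
    f = 2
    while n % f:
        f += 1
    return f, n // f

cipher = encrypt("ITS ALL GREEK TO ME")
print(cipher)
print(decrypt(cipher))
print(factor(N))
```

An odd-length text gets a trailing blank, which reappears after decryption.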
Textbook RSA with large $n$: it is customary to transform the string into bytes (e.g. per UTF-8, the modern compatible superset of ASCII), then the bytestring into an integer $x$ (usually per OS2IP). In Python:
```
int.from_bytes(bytes('François wears a !', 'UTF-8'), byteorder='big', signed=False)
```
There is a size limitation to $k-1$ bytes, where $2^{8(k-1)}<n<2^{8k}$, which ensures $0\le x<n$. On decryption, leading zero bytes are ignored/removed (due to the simplistic conversion from string to bytestring). Variations abound (some encoding of the size to allow any bytestring, padding on the right so that $x$ is large even for short strings, endianness…).
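
A sketch of textbook RSA on a UTF-8 string in Python, standard library only. The key is an assumption made for the demo: $n$ is the product of the Mersenne primes $2^{521}-1$ and $2^{607}-1$, whose factors are public, so it must never be used for anything real. The decryption side uses the minimal-length integer-to-bytes conversion, which is where leading zero bytes get dropped.

```python
# Textbook RSA on a UTF-8 string: no padding, NOT secure.
# Demo key: product of the Mersenne primes 2**521 - 1 and 2**607 - 1.
# Their factors are public, so this key is for illustration only.
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))
K = (N.bit_length() + 7) // 8       # 2**(8*(K-1)) < N < 2**(8*K)

def encrypt_string(s: str) -> int:
    data = s.encode("utf-8")
    if len(data) > K - 1:           # size limitation: at most K-1 bytes
        raise ValueError("string too long for this modulus")
    x = int.from_bytes(data, "big") # OS2IP (big-endian), so 0 <= x < N
    return pow(x, E, N)

def decrypt_string(c: int) -> str:
    x = pow(c, D, N)
    # Minimal-length conversion back to bytes: leading zero bytes are dropped
    data = x.to_bytes((x.bit_length() + 7) // 8, "big")
    return data.decode("utf-8")

c = encrypt_string("François says hello")
print(decrypt_string(c))
```

With this 1128-bit modulus, $k=141$, so strings of more than 140 UTF-8 bytes are rejected.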

Caution: textbook RSA is not secure under Chosen-Plaintext Attack (CPA):
- An attacker can trivially verify a guess of the plaintext: just encrypt the guess and compare it with the cryptogram. That attack is devastating for names on the class roll, credit card numbers…
- When short strings encode to small integers $x$, several other attacks apply, including
  - when $x$ happens to be writable as $x=x_a\cdot x_b$ for integers $x_a$ and $x_b$ small enough to be found by enumeration, there's a meet-in-the-middle attack;
  - when $e<\log_2(n)/\log_2(x)$, it holds that $x^e\bmod n\,=\,x^e$, and thus it's trivial to find $x$ by $e$-th root extraction.
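
Two of these weaknesses can be seen in a few lines of Python. This is a sketch under stated assumptions: the secret is a hypothetical 4-digit PIN string, $e=3$, and the modulus is built from two well-known public primes (the NIST P-384 and Ed448 field primes) purely to avoid prime generation. The guess check needs only the public key; the cube-root attack works because the 8-byte $x$ satisfies $x^3<n$.

```python
# Two chosen-plaintext weaknesses of textbook RSA (no padding).
# Demo modulus from two well-known public primes; both are 2 mod 3,
# so e = 3 is a valid exponent. Insecure: the factors are public.
P = 2**384 - 2**128 - 2**96 + 2**32 - 1
Q = 2**448 - 2**224 - 1
N = P * Q
E = 3

def encrypt(s: str) -> int:
    return pow(int.from_bytes(s.encode("utf-8"), "big"), E, N)

def iroot(c: int, e: int) -> int:
    """Largest integer r with r**e <= c (binary search)."""
    lo, hi = 0, 1 << (c.bit_length() // e + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** e <= c:
            lo = mid
        else:
            hi = mid - 1
    return lo

secret = "PIN 4711"                 # hypothetical low-entropy plaintext
c = encrypt(secret)

# 1) Verify guesses: encryption is deterministic, so re-encrypt each candidate.
guessed = next(g for g in (f"PIN {i:04d}" for i in range(10000)) if encrypt(g) == c)

# 2) Small x: x**e < N, so c == x**e exactly and an integer e-th root recovers x.
x = iroot(c, E)
recovered = x.to_bytes((x.bit_length() + 7) // 8, "big").decode("utf-8")

print(guessed, recovered)
```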
RSAES-PKCS1-v1_5: similar to textbook RSA with large $n$, plus random padding and a means to remove it on decryption. $x$ is a combination of the string to encode, 3 constant bytes, and at least 8 random (non-zero) bytes. The string is thus limited to $k-11$ bytes (per §7.2.1 step 1). This method is better, but still has serious defects:
- Implementations of decryption are difficult to protect against side-channel attacks. The first such attack was Daniel Bleichenbacher's *Chosen Ciphertext Attacks Against Protocols Based on the RSA Encryption Standard PKCS #1* (Crypto 1998), and there are many variations.
- Unless we lower the $k-11$ limit, encryption is inherently vulnerable to an attack under CPA costing $2^{63}$ encryptions.
For these reasons, RSAES-PKCS1-v1_5 should not be used in any new design.
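
A sketch of the RSAES-PKCS1-v1_5 padding step, assuming Python's `secrets` module as the random source; the function names are hypothetical. The encoded block is `0x00 || 0x02 || PS || 0x00 || M`, which shows where the 3 constant bytes and the $k-11$ limit come from. The unpadding is deliberately naive and is exactly the kind of code that is hard to protect against Bleichenbacher-style attacks.

```python
import secrets

def pkcs1_v15_pad(message: bytes, k: int) -> bytes:
    """EM = 0x00 || 0x02 || PS || 0x00 || M, with PS >= 8 random non-zero bytes."""
    if len(message) > k - 11:
        raise ValueError("message too long")
    ps = bytes(secrets.randbelow(255) + 1 for _ in range(k - len(message) - 3))
    return b"\x00\x02" + ps + b"\x00" + message

def pkcs1_v15_unpad(em: bytes) -> bytes:
    # Illustration only: early exits and timing differences in code like this
    # can leak whether the padding was valid (a padding oracle).
    if len(em) < 11 or em[0] != 0 or em[1] != 2:
        raise ValueError("decryption error")
    sep = em.find(b"\x00", 2)       # first zero byte after the 0x00 0x02 header
    if sep < 10:                    # not found (-1) or PS shorter than 8 bytes
        raise ValueError("decryption error")
    return em[sep + 1:]

k = 128                             # e.g. a 1024-bit modulus
em = pkcs1_v15_pad(b"hello", k)
x = int.from_bytes(em, "big")       # x < 2**(8*(k-1)) because em[0] == 0
print(pkcs1_v15_unpad(x.to_bytes(k, "big")))
```

Converting back with a fixed length of $k$ bytes (rather than the minimal length) keeps the leading `0x00`, so the header check works.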
RSAES-OAEP: this is a major improvement over the above, using a hash. The string is transformed by the padding process into an integer $x$ that is sort of random with $0\le x<2^{8(k-1)}$, and that's undone in decryption. Secure implementations of decryption are easier than for RSAES-PKCS1-v1_5. Security is theoretically reducible to that of the hash and of the RSA problem (finding a random $x$ given $x^e\bmod n$). The size limitation becomes $k-2h-2$ bytes (per §7.1.1 step 1.b), where $h$ is the size of the hash (e.g. $h=32$ bytes for SHA-256).
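
A compact sketch of the EME-OAEP encoding and decoding steps (RFC 8017 §7.1.1 and §7.1.2), assuming SHA-256 for both the label hash and MGF1, and an empty label by default. Function names are hypothetical, and the decoding is not constant-time, so treat it as a reading aid rather than an implementation to deploy.

```python
import hashlib
import secrets

H_LEN = 32                                   # h: SHA-256 output size in bytes

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def mgf1(seed: bytes, length: int) -> bytes:
    """MGF1 with SHA-256: Hash(seed || counter) for counter = 0, 1, ..."""
    out = b""
    counter = 0
    while len(out) < length:
        out += sha256(seed + counter.to_bytes(4, "big"))
        counter += 1
    return out[:length]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def oaep_encode(message: bytes, k: int, label: bytes = b"") -> bytes:
    if len(message) > k - 2 * H_LEN - 2:     # size limit k - 2h - 2
        raise ValueError("message too long")
    ps = b"\x00" * (k - len(message) - 2 * H_LEN - 2)
    db = sha256(label) + ps + b"\x01" + message          # k - h - 1 bytes
    seed = secrets.token_bytes(H_LEN)
    masked_db = xor(db, mgf1(seed, k - H_LEN - 1))
    masked_seed = xor(seed, mgf1(masked_db, H_LEN))
    return b"\x00" + masked_seed + masked_db             # EM (k bytes); x = OS2IP(EM)

def oaep_decode(em: bytes, k: int, label: bytes = b"") -> bytes:
    # Not constant-time; real implementations must not reveal which check failed.
    if len(em) != k or k < 2 * H_LEN + 2:
        raise ValueError("decryption error")
    masked_seed, masked_db = em[1:1 + H_LEN], em[1 + H_LEN:]
    seed = xor(masked_seed, mgf1(masked_db, H_LEN))
    db = xor(masked_db, mgf1(seed, k - H_LEN - 1))
    rest = db[H_LEN:].lstrip(b"\x00")        # skip the zero padding PS
    if em[0] != 0 or db[:H_LEN] != sha256(label) or rest[:1] != b"\x01":
        raise ValueError("decryption error")
    return rest[1:]

em = oaep_encode(b"hello", 256)              # e.g. a 2048-bit modulus, k = 256
print(oaep_decode(em, 256))
```

Because the seed is random, encoding the same string twice gives different blocks, unlike textbook RSA.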
Hybrid encryption, e.g. RSA-KEM. A random value $x$ with $0\le x<n$ is RSA-encrypted with no padding, a symmetric encryption key is derived from it, and that key is used to encrypt(-and-MAC) the string. Some avenues for implementation mistakes on decryption that still exist in RSAES-OAEP are gone. Security is theoretically reducible to that of the symmetric encryption and the RSA problem, with a simpler proof and/or quantitatively better assurance than for RSAES-OAEP. There is no size limitation. However, the cryptogram is slightly larger, and we need a Key Derivation Function and an authenticated cipher, whereas RSAES-OAEP has what it needs built in.
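
A sketch of the hybrid approach using only Python's standard library. Since the standard library has no authenticated cipher, this swaps in a stand-in: a SHA-256 counter-mode keystream with HMAC-SHA-256 over the ciphertext (encrypt-then-MAC), and a simple counter-mode SHA-256 KDF. These constructions and all names are assumptions for illustration; a real design would use a vetted KDF and AEAD. The demo key (Mersenne primes $2^{521}-1$ and $2^{607}-1$) is public and insecure.

```python
import hashlib
import hmac
import secrets

# Insecure demo key: factors are the public Mersenne primes 2**521-1 and 2**607-1.
P, Q = 2**521 - 1, 2**607 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))
K = (N.bit_length() + 7) // 8

def kdf(secret: bytes, length: int) -> bytes:
    """Stand-in counter-mode KDF: SHA-256(secret || counter), counter = 1, 2, ..."""
    out = b""
    counter = 1
    while len(out) < length:
        out += hashlib.sha256(secret + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def stream_xor(key: bytes, data: bytes) -> bytes:
    """Stand-in stream cipher: XOR with a keystream from a one-time key."""
    return bytes(a ^ b for a, b in zip(data, kdf(key, len(data))))

def derive_keys(x: int) -> tuple[bytes, bytes]:
    keys = kdf(x.to_bytes(K, "big"), 64)
    return keys[:32], keys[32:]              # (encryption key, MAC key)

def hybrid_encrypt(plaintext: bytes) -> tuple[bytes, bytes, bytes]:
    x = secrets.randbelow(N)                 # random 0 <= x < n, no padding
    c0 = pow(x, E, N).to_bytes(K, "big")     # RSA part, always k bytes
    enc_key, mac_key = derive_keys(x)        # fresh keys for every message
    c1 = stream_xor(enc_key, plaintext)      # symmetric part, any length
    tag = hmac.new(mac_key, c0 + c1, hashlib.sha256).digest()
    return c0, c1, tag

def hybrid_decrypt(c0: bytes, c1: bytes, tag: bytes) -> bytes:
    x = pow(int.from_bytes(c0, "big"), D, N)
    enc_key, mac_key = derive_keys(x)
    expected = hmac.new(mac_key, c0 + c1, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):   # check the MAC before decrypting
        raise ValueError("authentication failed")
    return stream_xor(enc_key, c1)

msg = "François says hello ".encode("utf-8") * 100   # far longer than k bytes
c0, c1, tag = hybrid_encrypt(msg)
print(hybrid_decrypt(c0, c1, tag) == msg)
```

The plaintext length is unrestricted; the overhead is the fixed $k$-byte RSA part plus the 32-byte tag.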

score 0 · Answer 2 · edited Oct 07 '21 at 07:34

Composed the answer I was looking for from the different comments in response to the question:

- Input is considered as an array of bytes/octets (8 bits each).
- $k$ is the octet length of the RSA modulus $n$.
- The maximum number of octets that can be encrypted with RSA (using PKCS #1 v1.5 padding) is $k-11$.
- The array of octets after padding is interpreted as a big integer $x$.
- The big integer $x$ is encrypted using the public key: $E_{K_{pub}}(x) \equiv x^e \mod n$.
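
Put together, the steps above look roughly like this Python sketch. It assumes PKCS #1 v1.5 padding (that is where the $k-11$ limit comes from) and a deliberately insecure demo key built from the public Mersenne primes $2^{521}-1$ and $2^{607}-1$; the decryption check is naive and not hardened against padding-oracle attacks.

```python
import secrets

# Insecure demo key built from the public Mersenne primes 2**521-1 and 2**607-1.
P, Q = 2**521 - 1, 2**607 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))
K = (N.bit_length() + 7) // 8                  # k: octet length of n

def encrypt(message: bytes) -> int:
    if len(message) > K - 11:                  # at most k - 11 octets
        raise ValueError("message too long")
    ps = bytes(secrets.randbelow(255) + 1 for _ in range(K - len(message) - 3))
    em = b"\x00\x02" + ps + b"\x00" + message  # PKCS #1 v1.5 padding
    x = int.from_bytes(em, "big")              # OS2IP: padded octets -> big integer x
    return pow(x, E, N)                        # x^e mod n

def decrypt(c: int) -> bytes:
    em = pow(c, D, N).to_bytes(K, "big")       # back to exactly k octets (I2OSP)
    sep = em.find(b"\x00", 2)
    if em[:2] != b"\x00\x02" or sep < 10:      # naive check, not hardened
        raise ValueError("decryption error")
    return em[sep + 1:]

c = encrypt("Actual RSA encryption of a string".encode("utf-8"))
print(decrypt(c).decode("utf-8"))
```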

For more information, look at OS2IP and PKCS #1.
