How secure and convincing as authorship proof is multiplying followed by adding (plaintext×key₁ + key₂)?

Question

Each text is a place-value representation of a number with regard to (w.r.t.) each alphabet that holds all characters in the text, in which case the number of characters in the alphabet serves as base. For instance, “479” is a base-10 place-value representation of 479 w.r.t. the alphabet “0123456789”, “FF” represents 255 w.r.t. “0123456789ABCDEF” (base 16), and the sentence

“Ƕat ɂa ræd broƿn fox to quickly jump high ɂover ðe lazy dog wið streŋþ!”

stands for the number

217383707677406363454372804270595121076147642667278359838633562911513164830118706430561478794198763199747839024329035616578013909254081431782160421

w.r.t. the alphabet

“ !"#$%&'()*+,-./0123456789:;<=>?@ɁFUÞORCGǷHNIJÏPZSTBEMLŊŒDAÆYÐVKQXǶW[]^_`ɂfuþorcgƿhnijïpzstbemlŋœdaæyðvkqxƕw{|}~¡×÷”

(base 116). For each alphabet, there’s a one-to-one correspondence between the natural (i.e. non-negative whole) numbers on the one hand and, on the other, the finite texts written in that alphabet which don’t begin with the character representing 0. Hence, we can multiply and add texts (w.r.t. the alphabet), e.g. so as to encrypt them: multiply the plaintext by a first key (key₁) and add a second key (key₂) to the product to get the ciphertext. To decrypt, take key₂ away from the ciphertext and divide the difference by key₁ to get the plaintext back.
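The correspondence and the encryption step described above can be sketched in Python, whose integers have arbitrary precision. This is only an illustration under stated assumptions: the function names (`text_to_int`, `int_to_text`, `encrypt`, `decrypt`) are made up for this sketch, and the small hexadecimal alphabet stands in for the 116-character one.

```python
def text_to_int(text, alphabet):
    """Read `text` as a place-value numeral whose base is len(alphabet)."""
    base = len(alphabet)
    value_of = {ch: i for i, ch in enumerate(alphabet)}
    n = 0
    for ch in text:
        n = n * base + value_of[ch]
    return n


def int_to_text(n, alphabet):
    """Inverse of text_to_int; the number 0 maps to the empty text."""
    base = len(alphabet)
    chars = []
    while n > 0:
        n, digit = divmod(n, base)
        chars.append(alphabet[digit])
    return "".join(reversed(chars))


def encrypt(plaintext, key1, key2, alphabet):
    """ciphertext = plaintext * key1 + key2, all read w.r.t. `alphabet`."""
    p = text_to_int(plaintext, alphabet)
    k1 = text_to_int(key1, alphabet)
    k2 = text_to_int(key2, alphabet)
    return int_to_text(p * k1 + k2, alphabet)


def decrypt(ciphertext, key1, key2, alphabet):
    """plaintext = (ciphertext - key2) / key1; fails if the division is not exact."""
    c = text_to_int(ciphertext, alphabet)
    k1 = text_to_int(key1, alphabet)
    k2 = text_to_int(key2, alphabet)
    quotient, remainder = divmod(c - k2, k1)
    if remainder != 0:
        raise ValueError("keys do not match this ciphertext")
    return int_to_text(quotient, alphabet)


DECIMAL = "0123456789"
HEX = "0123456789ABCDEF"

print(text_to_int("479", DECIMAL))  # 479
print(text_to_int("FF", HEX))       # 255

secret = encrypt("CAFE", "1D", "7", HEX)
print(decrypt(secret, "1D", "7", HEX) == "CAFE")  # True
```

Note that an empty key₂ corresponds to the number 0 here, which matches the question's use of the empty text as key₂.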

We have to make sure key₁ is longer than any string of zeroes in the plaintext; otherwise, the ciphertext before a tail about as long as key₂ would be made up of the products of sub-texts of the plaintext with key₁ separated by zeroes. An attacker could then guess this would be the case and thus find key₁ and so most of the plaintext by seeking the greatest shared divisor of the sub-texts of the ciphertext. Furthermore, zeroes at the tail of key₁ are useless. Because of the GCD issue, we also have to wield a unique key₁ and a unique key for each text we want to encipher. In addition, we should never encrypt the same text twice if we use the empty text as key₂, as this would again allow the attacker to exploit GCD, this time to find the plaintext directly.
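The two GCD leaks mentioned above (reusing key₁ across texts, and encrypting the same text twice with an empty key₂) can be illustrated with toy numbers. The sketch below works directly on integers rather than texts; the function name `shared_factor` and all the values are chosen purely for illustration.

```python
from math import gcd


def shared_factor(c1, c2):
    """What an attacker learns from two ciphertexts when key2 is empty (0)."""
    return gcd(c1, c2)


# Case 1: the same key1 encrypts two different plaintexts, key2 = 0.
# gcd(p1*key1, p2*key1) = key1 * gcd(p1, p2), so key1 leaks up to a small factor.
key1 = 1009
p1, p2 = 479, 255                              # coprime toy plaintexts
print(shared_factor(p1 * key1, p2 * key1))     # 1009, i.e. key1 itself

# Case 2: the same plaintext is encrypted twice with different keys, key2 = 0.
# gcd(p*k, p*k') = p * gcd(k, k'), so the plaintext leaks up to a small factor.
p = 479
k, k_other = 101, 103                          # distinct primes, hence coprime
print(shared_factor(p * k, p * k_other))       # 479, i.e. the plaintext itself
```

With realistic sizes the GCD is still cheap to compute, since Euclid's algorithm runs in time polynomial in the number of digits.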

fgrieu has pointed out and demonstrated that many random integers can be factored fast. This suggests to me that without a carefully chosen key₁, and with key₂ = “”, an attacker could easily find the plaintext even when we've taken the abovementioned precautions. To meet this problem, we'd have to wield a key₁ with such big prime factors that it would be unwieldy.

Easier in my view is to wield a key₂ which doesn't correspond to 0. At first sight, it seems to me that if we pick a key₂ with, say, 100 symbols and which thus corresponds to a number on the order of a googol, the attacker would have to factorize on the order of a googol possible product texts, which is practically impossible. The multiplication step also seems to make frequency analysis useless. Hence, it seems to me that picking keys of ca. 100 characters or more and taking the above precautions yields a very secure way of encryption.

Question 1: Is this correct?

Here are two scenarios:

Alice has solved the Collatz Conjecture. She wants to publish her proof to prove her achievement. At the same time, she wants to not rob others of the chance to find a solution by themselves. Therefore, she encrypts her proof in the aforesaid way and publishes only the ciphertext while keeping the keys secret. She publishes the keys only fifty years later, whereupon people take key₂ away from her ciphertext and divide the difference by key₁ to get her plaintext proof. Forty years after Alice publishes her encrypted proof, Bob claims to have found one of his own. Can we trust his claim, or is there a reasonable chance that he might’ve deciphered Alice’s proof?
Alice has thought up an invention. To prove her authorship, she encrypts a description of her invention with the aforementioned method and publishes the resulting ciphertext. Then, she applies for a patent. In order for her to get a patent, she must not have publicly disclosed her invention before, so there must be no reasonable chance of anyone deciphering her ciphertext. Would her application be granted?

Question 2: Which of the following factors or combinations thereof, if any, make Alice’s claim of authorship convincing?

(f) that the keys contain her name and claim of authorship
(u) that the keys are much shorter than her proof or invention description, respectively

Bear in mind that Alice has no independent proof of when she made the keys or of the fact that they belong to her other than (f). The plaintext also contains her name and claim to authorship.

I think (f) alone is enough. I also think, but am less sure, that (u) alone suffices, considering how unlikely it is that another pair of keys much shorter than the plaintext would yield a second meaningful plaintext from the ciphertext. Is this correct?

(Note: I’m aware that the numbers involved are huge, but computers can efficiently multiply place-value representations. Indeed, I’ve written a program that can multiply a novel of over a million symbols by a paragraph of over a thousand characters within a minute and a third and divide the resulting ciphertext by the paragraph within six minutes at less than 0.8 GHz CPU clock speed and with just ca. 10 to 11 % CPU use.)

fgrieu · Answer 1 · 2025-02-25T18:40:28.523

A serious issue with the systems in the question is that for a sizable fraction of integers of a given size, up to many thousands of digits, it's feasible to obtain their factorization; and for a larger fraction, it's feasible to find a partial factorization with all except a few large factors.

Thus in the original system, where what's published is plaintext × key₁, there is at least reason to fear that a factorization including all factors of key₁ could be pulled from what's published.

A safeguard against that is to make key₁ a large secret prime (say, 500 decimal digits), and somehow ensure that the plaintext has a secret factor of at least that size (I discuss below how costly that is). If the plaintext were random, breaking it would be an instance of a (conjectured) hard factorization, since we need to pull out two factors of at least 500 digits each.

But it does not solve a wider problem: knowing enough of the plaintext (like most of its left or right) allows factorization and recovery of plaintext and key₁. Adding a random key₂ of e.g. 100 digits does not fix that (and as pointed out in a comment, we must restrict the size of key₂). In order to conclusively fix this, we'd need a process to make the plaintext random-like. That brings us far from the question, which aims to transform the plaintext using only simple arithmetic.
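To make the known-plaintext risk concrete, here is a simplified sketch of one easy case; it is not necessarily the method meant above, which applies more widely. It assumes the attacker knows a left part of the plaintext that is longer than key₁, knows how many digits are missing, and that 0 ≤ key₂ < key₁. Under those assumptions a single integer division of the ciphertext by the zero-padded known prefix yields key₁. The function name and the toy numbers are hypothetical.

```python
def recover_from_known_prefix(c, prefix, unknown_digits, base=10):
    """Recover (key1, key2, plaintext) from c = plaintext*key1 + key2.

    Assumptions of this sketch: 0 <= key2 < key1, and the known prefix is
    long enough that key1 * base**unknown_digits + key2 < prefix * base**unknown_digits.
    """
    # Pad the known left part with zeroes: a lower bound on the plaintext.
    p_low = prefix * base**unknown_digits
    # c / p_low exceeds key1 by less than 1 under the assumptions above.
    key1 = c // p_low
    # With key2 < key1, dividing by key1 strips key2 as the remainder.
    plaintext, key2 = divmod(c, key1)
    return key1, key2, plaintext


# Toy instance: 12-digit plaintext, 4-digit key1, 4-digit key2.
plaintext, key1, key2 = 123456789012, 9973, 1234
c = plaintext * key1 + key2

# The attacker knows the first 8 digits and that 4 digits are missing.
print(recover_from_known_prefix(c, 12345678, 4))
# (9973, 1234, 123456789012)
```

In this easy case key₂ offers no protection at all, which is consistent with the remark that adding a key₂ does not fix the problem.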

In order to ensure that a plaintext with imposed meaningful content on its left has a random secret prime factor of at least 500 digits, the least inefficient method I can think of is to adjust the right of the plaintext to make the plaintext prime. It takes Wolfram Mathematica mere minutes to find primes 10⁵⁰⁰⁰+12123 and 10⁶⁰⁰⁰+9873, hours for 10¹⁰⁰⁰⁰+28579. Difficulty grows like n^3.46 where n is the number of digits, using a practical multiplication algorithm like Toom-Cook 3. Thus at least some hundred thousand digits is feasible, if not practical.
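The idea of adjusting the right of the plaintext until the whole number is prime can be sketched as follows. This is a minimal illustration, not the Mathematica procedure used for the timings above: `is_probable_prime` is a Miller-Rabin test with fixed small bases (so for large inputs it only establishes probable primality), and `pad_to_prime` is a hypothetical helper that tries tails in increasing order.

```python
def is_probable_prime(n, bases=(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)):
    """Miller-Rabin with fixed bases; a probable-prime test for large n."""
    if n < 2:
        return False
    for q in bases:
        if n % q == 0:
            return n == q
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # base a witnesses that n is composite
    return True


def pad_to_prime(prefix, pad_digits, base=10):
    """Keep `prefix` on the left and search the right-hand digits for a prime."""
    start = prefix * base**pad_digits
    for tail in range(base**pad_digits):
        candidate = start + tail
        if is_probable_prime(candidate):
            return candidate
    raise ValueError("no probable prime found with this many padding digits")


# Meaningful content "479" on the left, six adjustable digits on the right.
print(pad_to_prime(479, 6))
```

For plaintexts of thousands of digits the same loop applies, but each primality test then dominates the cost, in line with the growth estimate above.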
