How can I find a non-exhaustive algorithm for the following problem? In the base-$N$ number system, there are exactly $N^{N+1}$ strings of $N+1$ digits (leading zeros allowed). I would like to select $N^2$ of them such that the Hamming distance between any two is the same. For example, for $N=3$, here are $3^2=9$ such numbers: $$0000, 0111, 0222, 1012, 1120, 1201, 2021, 2102, 2210$$ Any two of them are at Hamming distance 3. I am looking for a feasible algorithm that finds $N^2$ numbers with the same pairwise Hamming distance, if they exist. It is enough to handle $N$ up to $10$. The distance can be a predefined value; the program does not need to vary it. As background information: I started with a geometry problem in $N$-dimensional Euclidean space and ended up with this combinatorial question.
2 Answers
One possible approach is to use a SAT solver.
I can suggest one encoding of this problem as a SAT instance. Introduce boolean variables $x_{i,j,d}$, where $x_{i,j,d}$ is intended to be true if the $j$th position of the $i$th number holds the digit $d$, for $i \in [N^2], j \in [N+1], d \in [N]$. Also, introduce boolean variables $y_{i,i',j}$, where $y_{i,i',j}$ is intended to be true if the $i$th and $i'$th numbers differ in position $j$. Then add constraints that encode the requirements for an assignment to count as a solution to your problem:
- Hamming distance of $D$: For each $i\ne i'$, we require that exactly $D$ out of $y_{i,i',1},\dots,y_{i,i',N+1}$ be true (there are $N+1$ positions). See Reduce hitting set to SAT, and cardinality constraints, Encoding 1-out-of-n constraint for SAT solvers, https://stackoverflow.com/q/43081929/781723 for how to encode a $D$-out-of-$(N+1)$ constraint in SAT.
- $x$ and $y$ are consistent with each other: For each $d$, we have $(x_{i,j,d} \land x_{i',j,d}) \implies \neg y_{i,i',j}$. Also, for each $d \ne d'$, we have $(x_{i,j,d} \land x_{i',j,d'}) \implies y_{i,i',j}$. Each can be rewritten as a disjunction of three literals (i.e., a CNF clause), so for each $i,i',j$, we obtain $N^2$ CNF clauses, and we can add all clauses obtained in this way to the SAT instance.
- Each position holds exactly one digit: For each $i,j$, exactly one of $x_{i,j,1},\dots,x_{i,j,N}$ is true. Without this, setting every $x$ to false would satisfy the consistency clauses trivially.
In this way, we can obtain a SAT instance with $N^5 + O(N^4)$ variables and $O(N^7)$ clauses.
You can optionally add a symmetry-breaking constraint, that the $N^2$ numbers appear in lexicographically sorted order. This will require a small number of additional variables and clauses, but might speed up the solver because it doesn't need to explore as many equivalent solutions.
You can then apply an off-the-shelf solver (e.g., Z3). I'm guessing existing solvers won't be good enough to solve this for all $N \le 10$, but you could give it a try and see how they do and how large a value of $N$ they can handle.
For specific values of $D,N$, it might be possible to find better encodings as SAT.
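To make the encoding concrete, here is a minimal Python sketch that generates the clauses as DIMACS-style lists of signed integers, which could then be handed to a SAT solver of your choice (no solver is invoked here). The function names `build_cnf` and `satisfies` are hypothetical. Two things are assumptions of this sketch rather than part of the description above: the cardinality constraint uses a naive binomial encoding instead of the encodings in the linked posts, and the clauses forcing exactly one digit per position are included because the encoding needs them to be sound.

```python
from itertools import combinations, product

def build_cnf(N, D):
    """Return (clauses, ids) for N*N words of length N+1 at pairwise distance D."""
    M, L = N * N, N + 1              # number of words, word length
    ids = {}                         # variable key -> positive integer id

    def var(*key):
        return ids.setdefault(key, len(ids) + 1)

    clauses = []
    # Exactly one digit per position (needed for soundness).
    for i, j in product(range(M), range(L)):
        clauses.append([var("x", i, j, d) for d in range(N)])
        for d, e in combinations(range(N), 2):
            clauses.append([-var("x", i, j, d), -var("x", i, j, e)])
    for i, k in combinations(range(M), 2):
        ys = [var("y", i, k, j) for j in range(L)]
        # Consistency of x and y: equal digits force not-y, unequal digits force y.
        for j in range(L):
            for d, e in product(range(N), repeat=2):
                lit = -ys[j] if d == e else ys[j]
                clauses.append([-var("x", i, j, d), -var("x", i, k_digit(k), j, e) if False else -var("x", k, j, e), lit])
        # Exactly D of the y's are true (naive binomial encoding, an assumption).
        for sub in combinations(ys, D + 1):        # at most D true
            clauses.append([-v for v in sub])
        for sub in combinations(ys, L - D + 1):    # at least D true
            clauses.append(list(sub))
    return clauses, ids

def satisfies(words, N, D):
    """Check that a candidate list of digit strings satisfies every clause."""
    clauses, ids = build_cnf(N, D)
    true = set()
    for i, w in enumerate(words):
        for j, ch in enumerate(w):
            true.add(ids[("x", i, j, int(ch))])
    for i, k in combinations(range(len(words)), 2):
        for j in range(N + 1):
            if words[i][j] != words[k][j]:
                true.add(ids[("y", i, k, j)])
    return all(any((lit > 0) == (abs(lit) in true) for lit in c) for c in clauses)
```

The binomial encoding is only reasonable because each constraint ranges over $N+1 \le 11$ variables; for $D = N$ it produces one "at most" clause and $\binom{N+1}{2}$ "at least" clauses per pair of words.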
I implemented D.W.'s encoding for SAT. It seems that setting the Hamming distance to $N$ is the best in practice.
In case you're interested:
For $N = 4$, here's a solution: ['00333', '01210', '02122', '03001', '11321', '10202', '12013', '13130', '22231', '20020', '21103', '23312', '33223', '31032', '32300', '30111']
For $N = 5$, here's a solution: ['011133', '022420', '003004', '040242', '034311', '143121', '112012', '131200', '104443', '120334', '230023', '202231', '224102', '213340', '241414', '321041', '333432', '314224', '300110', '342303', '432144', '444030', '401322', '410401', '423213'].
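Lists like these can be checked mechanically. The following is a small sketch (the helper names `hamming`, `distance_set` and `is_equidistant` are made up for illustration) that collects all pairwise Hamming distances and tests whether a list has the required shape and a single distance.

```python
from itertools import combinations

def hamming(u, v):
    """Number of positions in which two equal-length strings differ."""
    return sum(a != b for a, b in zip(u, v))

def distance_set(words):
    """Set of Hamming distances occurring between distinct words."""
    return {hamming(u, v) for u, v in combinations(words, 2)}

def is_equidistant(words, N):
    """True if there are N*N distinct words of length N+1 with one common distance."""
    ok_shape = len(set(words)) == N * N and all(len(w) == N + 1 for w in words)
    return ok_shape and len(distance_set(words)) == 1

# The N = 4 list from above; the printed set shows which distances occur.
n4 = ['00333', '01210', '02122', '03001', '11321', '10202', '12013', '13130',
      '22231', '20020', '21103', '23312', '33223', '31032', '32300', '30111']
print(distance_set(n4), is_equidistant(n4, 4))
```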
For $N \geq 6$ I haven't been able to obtain solutions. The case $N = 6$ might be possible with the same encoding and more solver time (the previous solutions are found within a second). Beyond that, I feel like a more efficient encoding will be necessary. If I manage to solve more, I will update this answer.
Do you have a proof that these exist for every $N$?
Update: I think this is always possible for $N$ prime, and easy to construct!
Here's a solution for $N = 7$: ['00000000', '01111111', '02222222', '03333333', '04444444', '05555555', '06666666', '11234560', '12345601', '13456012', '14560123', '15601234', '16012345', '10123456', '22461350', '23502461', '24613502', '25024613', '26135024', '20246135', '21350246', '33625140', '34036251', '35140362', '36251403', '30362514', '31403625', '32514036', '44152630', '45263041', '46304152', '40415263', '41526304', '42630415', '43041526', '55316420', '56420531', '50531642', '51642053', '52053164', '53164205', '54205316', '66543210', '60654321', '61065432', '62106543', '63210654', '64321065', '65432106'].
Notice the pattern? The algorithm is simple: for each $a, b \in \mathbb{F}_N$, construct a string $S(a, b)$ as follows:
- The first position $(i = 0)$ of $S(a, b)$ is always $a$.
- The position $1 \leq i \leq N$ is $(a\cdot i + b) \bmod N$.
I claim that these $N^2$ strings all have Hamming distance $N$ to each other. There are two cases:
- If they use the same $a$, namely $S(a, b)$ and $S(a, c)$ with $b \neq c$, then they match on the first position but all other coordinates are different; indeed, if they shared the $i$-th coordinate, we would have $(a \cdot i + b) \equiv_N (a \cdot i + c)$, and thus $b \equiv_N c$, which is impossible since $b \neq c$. They therefore differ in exactly $N$ positions.
- If they use different values $a_1 \neq a_2$, then they differ on the first position, and they match on exactly one other position. Indeed, the equation is now $(a_1 \cdot i + b) \equiv_N (a_2 \cdot i + c)$, or equivalently, $i(a_1 - a_2) \equiv_N (c-b)$. Because $\mathbb{F}_N$ is a field, this equation has exactly one solution for $i$, namely $(c-b) \cdot (a_1 - a_2)^{-1}$. Since $i$ ranges over $1, \dots, N$, which covers every residue modulo $N$ exactly once (with $i = N$ playing the role of $0$), exactly one position matches, so the distance is $1 + (N - 1) = N$.
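The construction above fits in a few lines of Python. This is a sketch under the assumption that $N = p$ is prime and at most $10$, so that every digit is a single character; the function name `affine_code` is made up for illustration.

```python
def affine_code(p):
    """Return the p*p words S(a, b) for a prime p <= 10.

    Each word starts with the digit a, followed by (a*i + b) mod p for i = 1..p.
    """
    words = []
    for a in range(p):
        for b in range(p):
            digits = [a] + [(a * i + b) % p for i in range(1, p + 1)]
            words.append("".join(str(d) for d in digits))
    return words

# For p = 7 this reproduces the list given above, in the same order.
for word in affine_code(7)[:8]:
    print(word)
```

For a composite $N$ such as $4$, the same formula no longer works, because $a_1 - a_2$ may fail to be invertible modulo $N$.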