Compressing two integers disregarding order

Question

Compare an ordered pair (x,y) with an unordered pair {x, y} (a set). Information-theoretically, the difference is only one bit, since recording whether x or y comes first takes exactly one bit.

So, if we're given a set {x,y} where x,y are two different 32-bit integers, can we pack them into 63 bits (rather than 64)? It should be possible to recover the original 32-bit integers from the 63-bit result, but without being able to recover their order.

D.W. · Accepted Answer · 2016-05-09T19:02:06.380

Yes, one can. If $x<y$, map the set $\{x,y\}$ to the number

$$f(x,y) = y(y-1)/2 + x.$$

It is easy to show that $f$ is a bijection from pairs with $0 \le x < y$ onto the non-negative integers, and so this can be uniquely decoded. Also, when $0 \le x < y < 2^{32}$, we have $0 \le f(x,y) < 2^{63} - 2^{31}$, so this maps the set $\{x,y\}$ to a 63-bit number $f(x,y)$. To decode, you can use binary search on $y$, or take a square root: $\lfloor \sqrt{2 f(x,y)} \rfloor$ is always either $y$ or $y-1$, so one comparison settles which. Once $y$ is known, $x = f(x,y) - y(y-1)/2$.
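The encoding and the square-root decoding can be sketched in Python as follows. This is a minimal illustration, not part of the original answer: the function names `encode_pair` and `decode_pair` are made up here, and the code relies on Python's arbitrary-precision integers, so in a fixed-width language the intermediate product would need care.

```python
from math import isqrt


def encode_pair(x: int, y: int) -> int:
    """Map an unordered pair of distinct non-negative integers to y*(y-1)/2 + x, with x < y."""
    if x == y:
        raise ValueError("the two integers must be distinct")
    if x > y:
        x, y = y, x  # the order is deliberately discarded
    return y * (y - 1) // 2 + x


def decode_pair(n: int) -> tuple[int, int]:
    """Invert encode_pair, returning the pair as (smaller, larger)."""
    y = isqrt(2 * n)            # floor(sqrt(2n)) is either y or y - 1
    if y * (y + 1) // 2 <= n:   # the estimate was one too small
        y += 1
    x = n - y * (y - 1) // 2
    return x, y
```

For example, `encode_pair(5, 3)` and `encode_pair(3, 5)` both give 13, and `decode_pair(13)` gives `(3, 5)`.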

filipos · Answer 2 · 2016-05-10T16:56:51.820

As an addition to D.W.'s answer, note that this is a particular case of the Combinatorial Number System, which compactly maps a strictly decreasing sequence of $k$ non-negative integers $c_k > \cdots > c_1$ to

$$N = \sum_{i=1}^k \binom{c_i}{i}.$$

This number has a simple interpretation. If we order these sequences lexicographically, then $N$ counts the number of smaller sequences.

To decode, just assign $c_k$ the largest value such that $\binom{c_k}{k} \leq N$ and decode $N - \binom{c_k}{k}$ as a $(k-1)$-sequence.
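The greedy decoding rule can be sketched in Python like this. The names `cns_encode` and `cns_decode` are hypothetical, and the linear search for each $c_i$ is chosen for readability only; for large values one would presumably use binary search instead.

```python
from math import comb


def cns_encode(cs):
    """Rank a collection of distinct non-negative integers: sum of C(c_i, i), c_1 < ... < c_k."""
    ordered = sorted(set(cs))
    if len(ordered) != len(cs):
        raise ValueError("elements must be distinct")
    return sum(comb(c, i) for i, c in enumerate(ordered, start=1))


def cns_decode(n, k):
    """Recover the k elements, largest first, using the greedy rule."""
    result = []
    for i in range(k, 0, -1):
        c = i - 1                    # C(i-1, i) = 0 <= n, so this is a valid start
        while comb(c + 1, i) <= n:   # largest c with C(c, i) <= n
            c += 1
        result.append(c)
        n -= comb(c, i)              # the remainder encodes an (i-1)-sequence
    return result
```

With $k = 2$ this agrees with D.W.'s formula: `cns_encode([3, 5])` is $\binom{3}{1} + \binom{5}{2} = 13$, and `cns_decode(13, 2)` returns `[5, 3]`.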

Gilles 'SO- stop being evil' · Answer 3 · 2016-05-10T13:25:31.677

The total number of unordered pairs of numbers drawn from a set of $N$ numbers, with repeats allowed, is $N(N+1)/2$. The total number of unordered pairs of distinct numbers is $N(N-1)/2$. It takes $2 \log_2(N) = \log_2(N^2)$ bits to represent an ordered pair of numbers, and with one bit fewer you can represent the elements of a space of size at most $N^2/2$. The number of unordered not-necessarily-distinct pairs is slightly more than half the number of ordered pairs, so you can't save a bit in the representation; the number of unordered distinct pairs is slightly less than half, so you can save a bit.
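For $N = 2^{32}$ the counts can be checked directly. The snippet below is only a sanity check of the arithmetic above; the helper name `bits_needed` is made up for this illustration.

```python
def bits_needed(count: int) -> int:
    """Smallest number of bits that can label `count` distinct values (count >= 1)."""
    return (count - 1).bit_length()


N = 2**32

ordered = N * N                        # ordered pairs
unordered_any = N * (N + 1) // 2       # unordered pairs, x == y allowed
unordered_distinct = N * (N - 1) // 2  # unordered pairs, x != y

for name, count in [
    ("ordered", ordered),
    ("unordered, repeats allowed", unordered_any),
    ("unordered, distinct", unordered_distinct),
]:
    print(f"{name}: {bits_needed(count)} bits")
```

This reports 64 bits for the first two cases and 63 bits for unordered distinct pairs.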

For a practical scheme that's easy to compute, with $N$ being a power of 2, you can work on the bitwise representation. Take $a = x \oplus y$ where $\oplus$ is the XOR (bitwise exclusive or) operator. The pair $\{x,y\}$ can be recovered from either $(a, x)$ or $(a, y)$. Now we'll look for a trick to save one bit in the second part, and give a symmetric role to $x$ and $y$ so that the order cannot be recovered. Given the cardinality computation above, we know this scheme will not work in the case where $x=y$.

If $x \ne y$ then there is some bit position where they differ. I'll write $x_i$ for the $i$th bit of $x$ (i.e. $x = \sum_i x_i 2^i$), and likewise for $y$. Let $k$ be the smallest bit position where $x$ and $y$ differ, i.e. the smallest $i$ such that $x_i \ne y_i$. Equivalently, $k$ is the smallest $i$ such that $a_i = 1$, so we can recover $k$ from $a$. Let $b$ be either $x$ or $y$ with the $k$th bit erased (i.e. $b = \sum_{i<k} x_i 2^i + \sum_{i>k} x_i 2^{i-1}$ or $b = \sum_{i<k} y_i 2^i + \sum_{i>k} y_i 2^{i-1}$). To make the construction symmetric, pick $x$ if $x_k=0$ and $y_k=1$, and pick $y$ if $x_k=1$ and $y_k=0$, so that the erased bit is always 0. Use $(a,b)$ as the compact representation of the pair. The original pair can be recovered by finding the lowest-order bit that is set in $a$, inserting a 0 bit at this position in $b$ (yielding one of $x$ or $y$), and taking the XOR of that number with $a$ (yielding the other element of the pair).

In this representation, $a$ can be any nonzero number in the range ($N-1$ values), and $b$ can be any number in half the range ($N/2$ values). As a sanity check, this gives $(N-1) \cdot N/2$ representations, which is exactly the number of unordered pairs of distinct numbers.

In pseudocode, with ^, &, |, <<, >>, ~ being C-like bitwise operators (xor, and, or, left-shift, right-shift, complement):

encode(x, y) =
  let a = x ^ y
  let k = lowest_set_bit_position(a)
  let low_mask = (1 << k) - 1
  let z = if x & (1 << k) = 0 then x else y
  return (a, (z & low_mask) | ((z & ~low_mask) >> 1))
decode(a, b) =
  let k = lowest_set_bit_position(a)
  let low_mask = (1 << k) - 1
  let x = (b & low_mask) | ((b & ~low_mask) << 1)
  return (x, a ^ x)
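A runnable Python rendering of the pseudocode might look like the sketch below. Two things here are assumptions rather than part of the answer: the helpers `pack63` and `unpack63`, which concatenate the 32-bit $a$ and the 31-bit $b$ into a single 63-bit integer, and the use of `(a & -a).bit_length() - 1` to find the lowest set bit position.

```python
def lowest_set_bit_position(a: int) -> int:
    """Index of the least significant 1 bit of a (a must be nonzero)."""
    return (a & -a).bit_length() - 1


def encode(x: int, y: int) -> tuple[int, int]:
    a = x ^ y
    if a == 0:
        raise ValueError("the two integers must be distinct")
    k = lowest_set_bit_position(a)
    low_mask = (1 << k) - 1
    z = x if (x & (1 << k)) == 0 else y          # the element whose bit k is 0
    b = (z & low_mask) | ((z & ~low_mask) >> 1)  # erase bit k from z
    return a, b


def decode(a: int, b: int) -> tuple[int, int]:
    k = lowest_set_bit_position(a)
    low_mask = (1 << k) - 1
    x = (b & low_mask) | ((b & ~low_mask) << 1)  # reinsert a 0 at bit k
    return x, a ^ x


def pack63(a: int, b: int) -> int:
    """Hypothetical helper: a in the high 32 bits, b in the low 31 bits."""
    return (a << 31) | b


def unpack63(n: int) -> tuple[int, int]:
    """Hypothetical helper: inverse of pack63."""
    return n >> 31, n & ((1 << 31) - 1)
```

For instance, `encode(5, 3)` and `encode(3, 5)` both return `(6, 3)`, and `decode(6, 3)` returns `(5, 3)`. For 32-bit inputs, `pack63(*encode(x, y))` is always below $2^{63}$.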

Martín-Blas Pérez Pinilla · Answer 4 · 2016-05-10T08:25

A nonconstructive proof: there are $(2^{32}\times 2^{32} - 2^{32})/2 = 2^{31}(2^{32}-1)<2^{63}$ unordered pairs of different 32-bit integers.