
Let $X$ be a finite multisubset of $\mathbb{N}^2$. Let's introduce the following notation:

$A$ is the set of all first elements of pairs from $X$, and $B$ is the set of all second elements of pairs from $X$.

$cnt_1(n) = |\{(n, b)|(n, b) \in X\}|$ (that is, the number of elements of $X$, counted with multiplicity, whose first element is $n$).

$cnt_2(n) = |\{(a, n)|(a, n) \in X\}|$

$sum_1(n) = \sum_{(n,b) \in X}b$

$sum_2(n) = \sum_{(a,n) \in X}a$

$Y=\{(a, cnt_1(a), sum_1(a))|a \in A\}$

$Z=\{(b, cnt_2(b), sum_2(b))|b \in B\}$
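The definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `summarize` and the sample multiset `X` are made up for this sketch and do not come from the question.

```python
from collections import defaultdict

def summarize(X):
    """Return (Y, Z) for a multiset X given as an iterable of (a, b) pairs."""
    cnt1, sum1 = defaultdict(int), defaultdict(int)  # keyed by first element a
    cnt2, sum2 = defaultdict(int), defaultdict(int)  # keyed by second element b
    for a, b in X:
        cnt1[a] += 1
        sum1[a] += b
        cnt2[b] += 1
        sum2[b] += a
    Y = {(a, cnt1[a], sum1[a]) for a in cnt1}
    Z = {(b, cnt2[b], sum2[b]) for b in cnt2}
    return Y, Z

# Hypothetical sample input, chosen only to exercise the definitions.
X = [(1, 2), (1, 3), (2, 3)]
Y, Z = summarize(X)
print(sorted(Y))  # [(1, 2, 5), (2, 1, 3)]
print(sorted(Z))  # [(2, 1, 1), (3, 2, 3)]
```

A list is used for `X` so that repeated pairs are counted with multiplicity, as the multiset setting requires.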

The question: Given only $Y$ and $Z$, is it always possible to unambiguously reconstruct $X$?


Equivalently, phrased in terms of SQL: Given an (unordered) set of 2-tuples (X) of natural numbers:

CREATE TABLE X (
    a int,
    b int
);

the following statistical summaries are derived:

Y:

SELECT a, COUNT(1) AS ca, SUM(b) AS bs FROM X GROUP BY a ORDER BY a;

Z:

SELECT b, COUNT(1) AS cb, SUM(a) AS "as" FROM X GROUP BY b ORDER BY b; -- "as" is quoted because AS is a reserved word

Is it possible to unambiguously re-construct X when knowing only Y and Z?

D.W.
Tobias Hermann

1 Answer

The answer is no. I found it helpful to think of the information in the form of a picture: mark each pair $(a, b)$ as a point on a grid, then next to each column and row write both the count of the entries in that column or row, and the "sum" of the entries (the sum of the 'other' coordinate). Your example may be pictured as follows:

[Image: the example drawn as dots on a grid, with the count and "sum" written beside each row and column]

I'm writing the count of the dots in each row/column as the number closest to that row/column, and the "sum" as the number further away. The original data is the set of dots; the new data are $Y$ and $Z$ (understood to be positional rather than just sets of points: for example, we remember that in the $Y$ data we have $(2, 3)$ in column 1, $(1, 3)$ in column 2, and so on).

Here are two configurations of points with equal $Y$ and $Z$ data:

[Image: two different configurations of dots with identical row and column counts and "sums"]
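For readers who cannot see the picture, one such pair can also be checked mechanically. The two sets `X1` and `X2` below are an example constructed for this sketch (not necessarily the pair drawn above), and `summarize` is a hypothetical helper that computes $Y$ and $Z$ as sorted lists.

```python
from collections import Counter

def summarize(X):
    """Compute Y and Z (as sorted lists of triples) from a list of (a, b) pairs."""
    cnt1, sum1, cnt2, sum2 = Counter(), Counter(), Counter(), Counter()
    for a, b in X:
        cnt1[a] += 1
        sum1[a] += b
        cnt2[b] += 1
        sum2[b] += a
    Y = sorted((a, cnt1[a], sum1[a]) for a in cnt1)
    Z = sorted((b, cnt2[b], sum2[b]) for b in cnt2)
    return Y, Z

# Assumed example: every column and every row holds two points whose other
# coordinates are either {1, 4} or {2, 3}; both choices sum to 5.
X1 = [(a, b) for a in (1, 4) for b in (1, 4)] + [(a, b) for a in (2, 3) for b in (2, 3)]
X2 = [(a, b) for a in (1, 4) for b in (2, 3)] + [(a, b) for a in (2, 3) for b in (1, 4)]

print(set(X1) == set(X2))              # False: the point sets differ
print(summarize(X1) == summarize(X2))  # True: Y and Z agree
print(summarize(X1)[0])                # [(1, 2, 5), (2, 2, 5), (3, 2, 5), (4, 2, 5)]
```

In this example the two sets are even disjoint, yet every row and column has count 2 and sum 5 in both.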

Hence the operation is not reversible. Per-column ambiguity arises when there are multiple integer partitions of the column's sum with the prescribed number of parts: e.g., if a column counts to 2 and sums to 5, its possible entries are 1 + 4 or 2 + 3.
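The candidates for a single column can be enumerated with a short recursive generator. This is a minimal sketch that assumes entries are positive integers and that `parts` is at least 1; the name `partitions` is made up here, and repeated parts are allowed because $X$ is a multiset.

```python
def partitions(total, parts, smallest=1):
    """Yield non-decreasing tuples of `parts` positive integers summing to `total`."""
    if parts == 1:
        if total >= smallest:
            yield (total,)
        return
    # The first (smallest) part can be at most total // parts.
    for first in range(smallest, total // parts + 1):
        for rest in partitions(total - first, parts - 1, first):
            yield (first,) + rest

print(list(partitions(5, 2)))  # [(1, 4), (2, 3)]
print(list(partitions(3, 1)))  # [(3,)]
```

A column is determined by its own count and sum only when this list has a single element; otherwise the rows have to break the tie, and the answer above shows that they cannot always do so.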

Joppy