39

By inspection I notice that

  • Shifting a set does not change the standard deviation, but it does change the mean. For example, {1,3,4} has the same standard deviation as {11,13,14}.

  • Sets with the same (or reversed) sequence of adjacent differences have the same standard deviation. For example, {1,3,4}, {0,2,3} and {0,1,3} all have the same standard deviation, but their means are different.

My conjecture: There are no two distinct sets with the same length, mean and standard deviation.

Question

Is it possible to have two different sets of real numbers, of equal size, that have the same mean and standard deviation?

Jam
  • 10,632
  • 3
  • 29
  • 45
Display Name
  • 2,713
  • 24
    The famous example would be Anscombe's quartet – shadow Jun 07 '19 at 05:58
  • 12
    Intuition: For a set with, say, $3$ elements, you have $3$ degrees of freedom. Requiring the mean to have a specific value removes one degree of freedom, and requiring the standard deviation to have a specific value removes another degree of freedom. This means that you ought to have a degree of freedom left, which is to say a continuum of possible triples. – Arthur Jun 07 '19 at 14:43
  • Anscombe's is arguably not an exact example here, since the y statistics are approximate and the x values are identical. Maybe one could argue that a slight tweak of the y's could make them perfectly exact though. However, it is definitely a relevant case to be aware of in this context. – jdods Jun 13 '19 at 14:16
  • The Datasaurus Dozen is a beautiful illustration showing that many different data sets can share many statistical indexes at once. – colt_browning Jun 07 '19 at 11:32

9 Answers

69

Yes. Two sets of numbers of the same size have the same mean and the same SD iff their sums and the sums of their squares match.

The set $\{1,2,3\}$ has sum $6$ and sum of squares $14$. The set $\{x,y,z\}$ has the same mean and SD iff $$\begin{cases}x+y+z=6\\x^2+y^2+z^2=14\end{cases}$$ This is the intersection of a sphere and a plane that cuts through it, which is a circle and therefore contains infinitely many points.
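To make the circle concrete, here is a minimal Python sketch that walks around it. The function name `triple_on_circle` and the particular basis vectors are my own choices for illustration; the centre $(2,2,2)$ and radius $\sqrt{2}$ follow from the two equations above.

```python
import math

def triple_on_circle(theta):
    """Return (x, y, z) with x + y + z = 6 and x^2 + y^2 + z^2 = 14."""
    centre = 2.0            # the circle is centred at (2, 2, 2)
    radius = math.sqrt(2)   # sqrt(14 - 3 * 2^2)
    # Orthonormal basis of the plane x + y + z = 0 (an arbitrary choice).
    u = (1 / math.sqrt(2), -1 / math.sqrt(2), 0.0)
    v = (1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6))
    c, s = math.cos(theta), math.sin(theta)
    return tuple(centre + radius * (c * ui + s * vi) for ui, vi in zip(u, v))

for theta in (0.0, 0.5, 1.0):
    x, y, z = triple_on_circle(theta)
    # Print the triple, its sum, and its sum of squares.
    print((x, y, z), x + y + z, x * x + y * y + z * z)
```

With this basis, `theta = 0` gives $(3,1,2)$ up to rounding, i.e. the original set; every other angle that is not a multiple of $60^\circ$ should give a different triple with the same two sums.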

ajotatxe
  • 66,849
68

$-2,-1,3$ and $-3,1,2$ both have a mean of $0$ and a standard deviation of $\sqrt\frac{14}{3}$.
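
For anyone checking the arithmetic (this takes the standard deviation to be the population one, dividing by $n=3$, which is what $\sqrt{14/3}$ corresponds to):

$$\begin{aligned} (-2)+(-1)+3 &= 0 = (-3)+1+2,\\ (-2)^2+(-1)^2+3^2 &= 14 = (-3)^2+1^2+2^2, \end{aligned}$$

so each triple has mean $0$ and variance $14/3$.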

auscrypt
  • 8,246
55

The example by auscrypt settles the question, but maybe it's worth mentioning why this should be obvious by considering degrees of freedom.

Mean and standard deviation are two quantities. A collection of $m$ real numbers has $m$ degrees of freedom. Specifying the mean and standard deviation removes two degrees of freedom, leaving $m-2$. So as long as $m > 2$, there should still be lots of room to have different sets with the same mean and standard deviation.

EDIT: This was intended as a heuristic, rather than a proof, but rigorous arguments can be made. For example, suppose $A$ and $B$ are two $m$-tuples with the same mean, such that $\sigma(A) > \sigma(B)$ (where $\sigma$ denotes standard deviation), $C$ and $D$ two $m$-tuples with the same mean such that $\sigma(C) < \sigma(D)$. Then for any $t$, $t A + (1-t) C$ and $t B + (1-t) D$ have the same mean, and (by the Intermediate Value Theorem) there exists $t \in [0,1]$ such that they have the same standard deviation. If $A-B$ and $C-D$ are linearly independent these will not be the same.
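
As a rough numerical illustration of the Intermediate Value Theorem argument, the sketch below bisects for $t$. The tuples `A`, `B`, `C`, `D` and the helper names `blend` and `find_t` are made up for this example and are not part of the argument itself; the standard deviation used is the population one (`statistics.pstdev`).

```python
from statistics import fmean, pstdev

def blend(t, p, q):
    """Componentwise t * p + (1 - t) * q."""
    return [t * a + (1 - t) * b for a, b in zip(p, q)]

def find_t(A, B, C, D, tol=1e-12):
    """Bisect for t in [0, 1] where the two blends have equal SD."""
    def gap(t):
        return pstdev(blend(t, A, C)) - pstdev(blend(t, B, D))
    lo, hi = 0.0, 1.0      # assumes gap(0) < 0 < gap(1)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Illustrative tuples: mean(A) = mean(B) = 1 and mean(C) = mean(D) = 2,
# with sd(A) > sd(B) and sd(C) < sd(D).
A, B = [0, 0, 3], [0, 1, 2]
C, D = [1, 2, 3], [0, 2, 4]
t = find_t(A, B, C, D)
X, Y = blend(t, A, C), blend(t, B, D)
print(t, X, Y)
print(fmean(X), fmean(Y), pstdev(X), pstdev(Y))
```

For these particular tuples the equation for $t$ can also be solved by hand and gives $t = 0.6$, so the bisection should land close to that value.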

Robert Israel
  • 470,583
  • 5
    I don't think without any further argument your 'obvious' explanation holds since the cardinalities of $\mathbb{R}^n$ and $\mathbb{R}^k$ are equal for any finite $n, k$ (here $m$ and $2$). – orlp Jun 07 '19 at 06:08
  • 3
    @orlp No operator used in the computation of standard deviations maps ℝ² to ℝ uniquely. – wizzwizz4 Jun 07 '19 at 06:14
  • 1
    @orlp: The mean is a line, not a Hilbert curve. – Kevin Jun 07 '19 at 06:49
  • This answer looks nice but is probably wrong for the reason mentioned by orlp. 1 real number and 1000 real numbers hold the exact same quantity of information, don't they? 1 real number and 1000 real numbers have the exact same degrees of freedom, don't they? – Eric Duminil Jun 07 '19 at 08:35
  • 8
    @EricDuminil: No, degrees of freedom are about dimensionality, not cardinality. Thinking that 1 real number and 1000 real numbers have the same degrees of freedom because $\mathbb{R}$ and $\mathbb{R}^{1000}$ have the same cardinality is like thinking squares of side lengths 1 and 2 have the same area because the sets of points they contain have the same area (although area is about measure, not dimensionality). – user2357112 Jun 07 '19 at 09:07
  • If you know any linear algebra, consider a system of $m$ variables and $2$ linear equations. If this system is solvable, it must have infinitely many solutions, and you have to add $m-2$ more equations to reach a point where you've pinned down a single solution. Degrees of freedom are like that, but for more than linear equations. Not all possible equations, though - if you allow stuff like space-filling curves or digit interleaving, one equation can easily eliminate multiple degrees of freedom. – user2357112 Jun 07 '19 at 09:31
  • 2
    @wizzwizz4 You can add an argument like that to the answer to finish it but without it I feel it's incomplete. – orlp Jun 07 '19 at 11:16
  • 2
    @orlp 's right. A whole dataset can be represented by a single number (say interwoven decimal representations of real values following a unary header indicating how many datapoints). If a statistical quantity involved any operation of this kind (and you need to characterise what this kind is [or isn't] before you can be sure), then no two datasets would have the same value for this statistical quantity. – Dannie Jun 07 '19 at 12:46
  • A dataset of more than one element can be represented by a single number plus one or more rules / formulae, or some other context for interpreting that number. A single number by itself is a single number by itself. – John Bollinger Jun 07 '19 at 14:09
  • Ah, crap, I said "area" when I meant "cardinality" the second time and it's way too late to edit. – user2357112 Jun 07 '19 at 15:48
  • @user2357112 Yea I don't understand why people are talking about cardinality here at all. – Ovi Jun 08 '19 at 19:10
  • @yesterday You can always make a new, corrected comment and then delete the old one. – Steven Alexis Gregory Jun 09 '19 at 12:13
15

Let $A = \{x_1, x_2, \dots, x_n\}$ add up to $n\mu$.

Then $B = \{2\mu-x_1, 2\mu-x_2, \dots, 2\mu-x_n\}$ will also add up to $n\mu$.

We will also have \begin{align} \sum_{i=1}^n (2\mu - x_i)^2 &= 4n\mu^2 -4\mu\sum_{i=1}^n x_i + \sum_{i=1}^n x_i^2\\ &= 4n\mu^2 - 4n\mu^2 + \sum_{i=1}^n x_i^2\\ &= \sum_{i=1}^n x_i^2 \end{align}

Hence the sets $A$ and $B$ have the same mean and standard deviation.

As a side note, the $i^{th}$ and $(n+1-i)^{th}$ rows of many $n\times n$ magic squares, when $n$ is odd, have this property.
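
A quick way to try the reflection $x_i \mapsto 2\mu - x_i$ numerically is sketched below. The function name `reflect_about_mean` is just a label for this example, and the test row is the top row of the $3\times 3$ magic square with rows $8\,3\,4$, $1\,5\,9$, $6\,7\,2$.

```python
from statistics import fmean, pstdev

def reflect_about_mean(xs):
    """Map each x to 2*mu - x, where mu is the mean of xs."""
    mu = fmean(xs)
    return [2 * mu - x for x in xs]

row = [8, 3, 4]                    # top row of the magic square
mirror = reflect_about_mean(row)   # as a set, this is the bottom row {6, 7, 2}
print(mirror, fmean(mirror), pstdev(mirror), pstdev(row))
```

Note that the reflection only produces a *different* set when the original is not symmetric about its own mean; for example $\{1,2,3\}$ is mapped to itself.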

  • Can you elaborate on your comment regarding the magic squares? Is this a corollary of your statement? – flawr Jun 07 '19 at 09:07
  • @flawr Not always, but sometimes, the rows of a magic square look like sets $A$ and $B$. A simple example is the magic square $\begin{array}{ccc} 8 & 3 & 4\\ 1 & 5 & 9\\ 6 & 7 & 2 \end{array}$, where $8+2=3+7=4+6=10$. Also $$\begin{array}{rrrrr} 23 & 6 & 19 & 2 & 15 \\ 10 & 18 & 1 & 14 & 22 \\ 17 & 5 & 13 & 21 & 9 \\ 4 & 12 & 25 & 8 & 16 \\ 11 & 24 & 7 & 20 & 3 \end{array}$$ – Steven Alexis Gregory Jun 07 '19 at 14:11
7

You seem to like the set $\{1,3,4\}$. Here are six more three-element sets having the same mean and standard deviation as your given set.

\begin{align*} &\frac{202}{171} & &\frac{1}{171} \left(583+\sqrt{19842}\right) & &\frac{1}{171} \left(583-\sqrt{19842}\right) \\ &\frac{688}{171} & &\frac{1}{171} \left(340+\sqrt{27861}\right) & &\frac{1}{171} \left(340-\sqrt{27861}\right) \\ &\frac{544}{171} & &\frac{1}{171} \left(412-\sqrt{62421}\right) & &\frac{1}{171} \left(412+\sqrt{62421}\right) \\ &\frac{32}{19} & &\frac{1}{19} \left(60-\sqrt{581}\right) & &\frac{1}{19} \left(60+\sqrt{581}\right) \\ &\frac{27}{19} & &\frac{1}{38} \left(125-\sqrt{1689}\right) & &\frac{1}{38} \left(125+\sqrt{1689}\right) \\ &-\frac{2}{3} \left(-4+\sqrt{7}\right) & &\frac{1}{6} \left(16+2 \sqrt{7}\right) & &\frac{1}{3} \left(8+\sqrt{7}\right) \end{align*}

More generally, let $(x,y)$ be any point on the ellipse given by the equation

$$ x^2 + xy + y^2 -8x -8y + 19 = 0 $$

and set $z = 8 - x - y$. This triple of values has the same mean and standard deviation as does $\{1,3,4\}$. (This is found by eliminating $z$ from the system mean$(x,y,z) = {}$mean$(1,3,4)$ and stddev$(x,y,z) = {}$stddev$(1,3,4)$, i.e., $x+y+z = 8$, $x^2 + y^2 + z^2 - xy - xz - yz = 7$.)
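
One possible way to generate such triples is to pick $x$ and solve the ellipse equation as a quadratic in $y$. The sketch below does this; the function name `triples_from_x` is hypothetical, and the check it prints (sum $8$, sum of squares $26$) is just the pair of conditions matching $\{1,3,4\}$.

```python
import math

def triples_from_x(x):
    """Given x, solve the ellipse equation for y and set z = 8 - x - y."""
    # Quadratic in y: y^2 + (x - 8) y + (x^2 - 8x + 19) = 0.
    disc = (x - 8) ** 2 - 4 * (x * x - 8 * x + 19)
    if disc < 0:
        return []          # no real y: x is outside the ellipse's x-range
    root = math.sqrt(disc)
    ys = sorted({((8 - x) - root) / 2, ((8 - x) + root) / 2})
    return [(x, y, 8 - x - y) for y in ys]

for x in (1.0, 202 / 171, 2.0):
    for t in triples_from_x(x):
        # Each triple should have sum 8 and sum of squares 26.
        print(t, sum(t), sum(v * v for v in t))
```

The discriminant simplifies to $-3x^2+16x-12$, so real solutions exist exactly for $\frac{8-2\sqrt7}{3} \le x \le \frac{8+2\sqrt7}{3}$, and $x=1$ recovers the original set.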

Eric Towers
  • 70,953
6

There are infinitely many 3-element sets of real numbers with any given real mean and any given positive real standard deviation.

Without loss of generality, let's assume that $\mu = 0$ and $\sigma = 1$. Once we have a solution set for these parameters, we can find one for any parameters by scaling it to get the desired standard deviation, then shifting it to get the desired mean.

$$x + y + z = 0$$

$$\frac{x^2 + y^2 + z^2}{3} - 0 = 1$$

$$x^2 + y^2 + z^2 = 3$$

We have two equations and three unknowns, so let's treat $x$ as a parameter and solve for $y$ and $z$.

$$y = -x -z$$

$$x^2 + (-x -z)^2 + z^2 = 3$$

$$x^2 + (x^2 + 2xz +z^2) + z^2 = 3$$

$$2z^2 + 2xz + (2x^2 -3) = 0$$

$$z = \frac{-2x\pm\sqrt{(2x)^2-8(2x^2-3)}}{4}$$

$$z = \frac{-2x\pm\sqrt{4x^2-16x^2+24}}{4}$$

$$z = \frac{-2x\pm\sqrt{-12x^2+24}}{4}$$

For this to have real solutions, we need the following inequality to hold.

$$-12x^2+24 \geqslant 0$$

$$12x^2 \leqslant 24 $$

$$x^2 \leqslant 2 $$

$$- \sqrt{2} \leqslant x \leqslant \sqrt{2} $$

There are infinitely many $x$ values that satisfy this inequality, so there are infinitely many sets of 3 real numbers with $\mu = 0$ and $\sigma = 1$, and therefore infinitely many sets of three real numbers with any given real mean and any given positive* real standard deviation.

* Negative standard deviations don't make any sense, and a zero standard deviation means all numbers are equal.
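
The derivation above, together with the scale-then-shift step, can be sketched in Python as follows. The function name `triple` and its default arguments are assumptions made for this example; $\sigma$ is the population standard deviation, matching the equation $\frac{x^2+y^2+z^2}{3} = 1$.

```python
import math

def triple(x, mu=0.0, sigma=1.0):
    """Triple with mean mu and population SD sigma, from a parameter |x| <= sqrt(2)."""
    disc = 24 - 12 * x * x
    if disc < 0:
        raise ValueError("need -sqrt(2) <= x <= sqrt(2)")
    z = (-2 * x + math.sqrt(disc)) / 4   # the '+' root; the '-' root swaps y and z
    y = -x - z
    # Scale to the desired SD, then shift to the desired mean.
    return tuple(mu + sigma * v for v in (x, y, z))

for x in (0.0, 0.5, 1.0):
    print(triple(x), triple(x, mu=10.0, sigma=2.0))
```

Each distinct choice of $x$ in the allowed range gives a triple containing the value $\mu + \sigma x$, which is one way to see that infinitely many different triples arise.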

Peter Green
  • 1,265
6

A simple way to find a counterexample to the conjecture is to focus on sets whose values are symmetrical about zero. This ensures that the two sets have the same mean, and also simplifies calculation of their standard deviations.

Let $a,b$ be any real numbers. Then the set $\{a,b,-a,-b\}$ has mean zero and SD $\sqrt{(2a^2+2b^2)/4}$. Now let $c$ be any real number not equal to any member of that set and such that $c^2 < a^2+b^2$. Let $d$ be given by:

$$d = \sqrt{a^2 + b^2 - c^2}$$

implying $a^2 + b^2 = c^2 + d^2$. Then the set $\{c,d,-c,-d\}$ has the same mean and SD.
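
Here is a small Python sketch of this construction. The function name `symmetric_partner` and the sample values $a=2$, $b=11$, $c=5$ are mine, chosen only so that $d$ comes out as a whole number.

```python
import math
from statistics import fmean, pstdev

def symmetric_partner(a, b, c):
    """Return [c, d, -c, -d] with c^2 + d^2 = a^2 + b^2."""
    if c * c >= a * a + b * b:
        raise ValueError("need c^2 < a^2 + b^2")
    d = math.sqrt(a * a + b * b - c * c)
    return [c, d, -c, -d]

first = [2, 11, -2, -11]                # a = 2, b = 11 (illustrative values)
second = symmetric_partner(2, 11, 5)    # c = 5, so d = sqrt(125 - 25) = 10
print(second, fmean(first), fmean(second), pstdev(first), pstdev(second))
```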

Adam Bailey
  • 4,735
0

It is possible. A classic example, often used to illustrate that simple descriptive statistics like mean and standard deviation can mislead you, is Anscombe's quartet. It comprises four sets of eleven (x,y) points that have nearly identical means and standard deviations (in x and y), correlation, linear regression parameters, and R-squared, yet are qualitatively very different.

andrepd
  • 443
0
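    /*
     * Generates n samples forming a Fibonacci-like sequence (each term is the
     * sum of the previous two) with mean mu and population standard deviation
     * sigma. Requires n >= 2.
     */
    static double[] fibonacci_samples(int n, double mu, double sigma)
    {
        // First two terms of the unscaled sequence, chosen so that all n terms sum to zero.
        double a = 1 - fibonacci(n + 1);
        double b = fibonacci(n);

        // Sum of squares of the n unscaled terms, via closed-form Fibonacci identities.
        double sumSquares =
            a * a * (1 + fibonacci(n - 1) * fibonacci(n - 2))
            + 2 * a * b * (fibonacci(n - 1) * fibonacci(n - 1) - (n + 1) % 2)
            + b * b * fibonacci(n) * fibonacci(n - 1);
        double scale = Math.Sqrt(n * sigma * sigma / sumSquares);

        double[] x = new double[n];
        double min = a;
        double max = b;
        for (int i = 0; i < n; i++)
        {
            x[i] = mu + min * scale;
            max = max + min;   // next term of the sequence
            min = max - min;   // previous value of max
        }
        return x;
    }

    // Helper that the snippet calls but the original post does not show;
    // assumed here to be the usual F(0) = 0, F(1) = 1 sequence.
    // Returns double so the products above do not overflow int.
    static double fibonacci(int k)
    {
        double prev = 0, curr = 1;
        for (int i = 0; i < k; i++)
        {
            double next = prev + curr;
            prev = curr;
            curr = next;
        }
        return prev;
    }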