This question is similar to Motivation behind standard deviation?, but since you specifically ask for a parallel between standard deviation and distance, here goes.
The Euclidean distance, unlike for example the Manhattan distance, is compatible with an inner product. The inner product of two vectors $x,y$ is $x\cdot y = \sum x_i y_i$ (also known as the dot product), and the Euclidean norm is related to it via $\|x\|^2=x\cdot x$. The inner product gives us a concept of orthogonal vectors: $x\perp y$ if $x\cdot y=0$. A quick computation shows that when $x$ and $y$ are orthogonal, $$\|x+y\|^2 = \|x\|^2+\|y\|^2, \tag{1}$$ which is the general form of the Pythagorean theorem. A key feature of (1) is that we can compute the norm of $x+y$ knowing only the norms of $x$ and $y$, not the vectors themselves.
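Spelled out, the computation is just an expansion of the square, with the cross term killed by orthogonality:
$$\|x+y\|^2 = (x+y)\cdot(x+y) = x\cdot x + 2\,x\cdot y + y\cdot y = \|x\|^2 + 0 + \|y\|^2.$$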
Moving to probability: vectors are replaced by samples, the inner product $\sum x_i y_i$ becomes the covariance (closely related to the correlation coefficient), and independent random variables turn out to be orthogonal, i.e., to have zero covariance. We want to take advantage of this orthogonality when working with sums of independent random variables (a problem that comes up all the time). So an analog of the Euclidean norm for random variables should be introduced, and this is what the standard deviation is (except for the subtraction of the mean). Thanks to (1), we can compute the standard deviation of a sum of independent variables knowing only the standard deviation of each summand: for independent $X$ and $Y$, $\sigma_{X+Y} = \sqrt{\sigma_X^2 + \sigma_Y^2}$.
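As a quick numerical illustration of this Pythagorean addition (a minimal sketch; the distributions, sample size, and seed are arbitrary choices for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Two independent random variables with known standard deviations.
x = rng.normal(loc=3.0, scale=2.0, size=n)   # sigma_x = 2
y = rng.exponential(scale=5.0, size=n)       # sigma_y = 5

# Pythagorean combination predicted by (1): sqrt(sigma_x^2 + sigma_y^2)
predicted = np.hypot(x.std(), y.std())
observed = (x + y).std()

print(predicted, observed)  # both close to sqrt(4 + 25) ≈ 5.385
```

Running the same check with dependent variables (say, $y = x + \text{noise}$) breaks the agreement, because the cross term $2\operatorname{Cov}(X,Y)$ no longer vanishes.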