33

We all know that $a^2 + b^2 = c^2$ holds for a right-angled triangle. Yes, it works, and it can be proven using areas of squares and so on. But my question is: why?

What makes the number 2 so special that it defines the 2-norm, shows up everywhere in vector spaces, and even appears in statistics, where independent random variables behave almost like perpendicular vectors and their variances simply add? (The last part might be a bit of a stretch on my part.)

I know this feels almost like asking why gravity is 9.8. It just is. It is not because 9.8 is a perfect number in number theory or anything. But our insistence on using the square has to come from somewhere. To me, the only natural meaning of 2, or of the square of anything, must be linked to the area of a square on a flat surface. Why isn't it $a^4 + b^4 = c^4$? Is that a worse world to live in? In other words, what is the first-principles reason that the square law comes up so often, if the only first-principles meaning of a square is something multiplied by itself?

This question comes from my long history of not understanding why statistics so often uses the standard deviation and not the average distance. Some people say it is because we want to emphasize the outliers, so variance is the root cause (but why?), and some people say the standard deviation has nice properties (precisely because of the 2 in the square law). In a way, I feel that it all comes down to the Pythagorean theorem. But why?

In the end, I can only comfortably establish that a square is something multiplied by itself, or the area of a square on a flat surface. But what do the lengths of the sides of a right-angled triangle have to do with either? Is there any explanation other than "it just works"? Why does the Pythagorean theorem work in such a way that, no matter the dimension, if we are interested in the distance, we still square them?

Deren Liu
  • 507
  • 11
    Relevant: https://en.wikipedia.org/wiki/Fermat%27s_Last_Theorem – Shaun Sep 10 '24 at 18:23
  • Somewhat relevant is the situation with inverse square laws in physics -- see Why are so many forces explainable using inverse squares when space is three dimensional? AND Intuitive explanation of the inverse square power $\frac{1}{r^2}$ in Newton's law of gravity. Except in the physics case there are specific aspects of our universe $(3$ space dimensions) that can be used to provide fairly convincing explanations. (continued) – Dave L. Renfro Sep 10 '24 at 18:31
  • 6
    I can't think of anything that convincing in math, but to give another example, among the $L^p$ spaces the $p=2$ case is rather special. (This example is mostly for others with the appropriate background who are reading this.) – Dave L. Renfro Sep 10 '24 at 18:31
  • 4
    It certainly isn't true in all versions of geometry, just Euclidean. – Thomas Andrews Sep 10 '24 at 18:45
  • 5
    A guess: I think fundamentally it is because orthogonality is a $2$-ary relation, and the axioms of Euclidean geometry suggest that projection is linear. Hence this $2$-ary relation is induced by a bilinear form (namely the inner product, which can be thought of as the signed product of the lengths of vectors after projecting one onto the other). This seems to be where the exponent of $2$ comes from. –  Sep 10 '24 at 18:52
  • 18
    $9.8$ and gravitation is a really bad example: it's only approximate, it's dependent on your choice of units, and it's dependent on the mass of the Earth and your location. – Robert Israel Sep 10 '24 at 22:12
  • 2
    When you say, "But my question is: why?", what I hear is, "Show me a proof!". Proofs show why, and should give meaningful comprehension. If it hasn't clicked for you yet, look for a different proof, or think about the one you've received more rigorously. – Daniel R. Collins Sep 11 '24 at 03:22
  • 3 is a nice number as well! ;-) – Peter - Reinstate Monica Sep 11 '24 at 10:47
  • 1
    The Pythagorean theorem deals with surfaces, which are calculated using a simple multiplication, which is in fact just a second power :-) – Dominique Sep 11 '24 at 11:36
  • 1
    @DanielR.Collins It could also mean "oh, I've seen one or more proofs it's true all right, but I need a reason (not necessarily a proof) why it's not accidental". – J.G. Sep 11 '24 at 12:14
  • 1
    "I know this feels almost like asking why gravity is 9.8." If it feels that way to you, it's because you have not yet internalized the difference between mathematics and physics. – David K Sep 11 '24 at 13:47
  • 1
    @J.G. Yes exactly, I think proofs show true, not why. In other words, I think proofs show how something IS true, not how something SHOULD be true. I mentioned 9.8 in gravity as an example of something accidental. It is a result of how we have defined our units and how the right distance from sun gave birth to us for us to be asking questions in the first place. But I believe that in math this 2 is something more than accidental. Hence me here asking about it. – Deren Liu Sep 11 '24 at 13:54
  • I too have been wondering about the origin of the $2$, since there are other settings such as $p$-adic numbers where that "special number" is $\infty$ (in the sense that the $\infty$-norm has analogous special properties as the Euclidean norm in the real case). I asked a question about it here. – pregunton Sep 11 '24 at 14:08
  • @DerenLiu I wouldn't connect the strength of gravity on Earth to our being in the Sun's habitable zone. If it has any anthropic explanation, it lies in how much gravity we need from our home planet. – J.G. Sep 11 '24 at 14:42
  • And "how much gravity we need" doubtless also defines another "habitable zone" that includes more than just the constant $9.8\ \mathrm m/\mathrm s^2$. Even on Earth the gravity isn't exactly that much in most places (see https://en.wikipedia.org/wiki/Gravity_of_Earth#Variation_in_magnitude), yet here we are. – David K Sep 11 '24 at 15:53
  • @J.G. You are right. It's a bit of a stretch to involve sun here. Actually I remember reading somewhere this is one of the things Kepler did in his late years: Trying to understand why the distance from the sun is exactly that number, or something like that. – Deren Liu Sep 11 '24 at 17:11
  • 1
    The OP talked about standard deviation and average distance. I was never into statistics, but in maths it's in the same area as least squares for finding the best-fit line. The reason you do not use the modulus there is that least squares relies on the function being continuously differentiable. The modulus function is not: its derivative jumps at zero, the very place you are most interested in. So the next best thing is the square function. A drawback, and not a bonus as you suggest, is that outliers have too much influence on the result, and in many cases that is bad. – Rewind Sep 11 '24 at 20:39
  • I would like to point you also to my question https://math.stackexchange.com/questions/4943792/simplicial-generalization-of-pythagoras and the answer given therein: it notes that it is related to the Gram determinant, and a determinant clearly is a bilinear form, thus pointing out the speciality of 2 wrt. Pythagoras. – Dr. Richard Klitzing Sep 28 '24 at 07:44
  • Isn't that more a fact, found by co-incidence, than anything else?

    Are you suggesting the Pythagorean might work with another value in place of 2?

    – Robbie Goodwin Sep 28 '24 at 19:50

6 Answers

41

This is too long for a comment but points out the special role of $2$.

The usual distance formula states that the distance between two points $x = (x_1,x_2)$ and $y = (y_1,y_2)$ is $$d(x,y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$$

For any $p \ge 1$ you can similarly define a $p$-distance $$d_p(x,y) = \sqrt[p]{|x_1 - y_1|^p + |x_2 - y_2|^p}$$ and also $$d_\infty(x,y) = \max\{|x_1 - y_1|, |x_2 - y_2|\}.$$ It takes a bit of work, but you can show that each $d_p$ is in fact a metric, so that it satisfies the triangle inequality $$d_p(x,y) \le d_p(x,z) + d_p(z,y)$$ for any three points $x,y,z$.

If $C$ is a smooth curve in the plane you can use $d_p$ to define an arclength $\ell_p(C)$. Let $C_p$ denote the unit circle in the $p$-metric: $$C_p = \{(x_1,x_2) : |x_1|^p + |x_2|^p = 1\}.$$ The perimeter of $C_p$ is $\ell_p(C_p)$, and the diameter of $C_p$ can be calculated using $d_p$. This gives a definition of $\pi$ that depends on $p$: $$\pi_p = \frac{\ell_p(C_p)}{\mathrm{diam}\ C_p}$$

What is so special about $p=2$? Try to show that $$\pi_2 = \min_{1 \le p \le \infty} \pi_p$$ and $p \not= 2$ implies $\pi_p > \pi_2$. That is, $\pi_2$ is the unique minimum value of $\pi_p$.
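
A rough numerical check of this claim, offered as a sketch rather than a proof. The function name `pi_p`, the parametrization of the quarter circle, and the step count are choices made here for illustration; the perimeter is approximated by summing the $d_p$-lengths of many short chords.

```python
import math

def pi_p(p, n=20000):
    """Approximate pi_p = (d_p-perimeter of the unit p-circle) / (d_p-diameter).

    Assumption of this sketch: the quarter circle in the first quadrant is
    parametrized as (cos(t)^(2/p), sin(t)^(2/p)) for 0 <= t <= pi/2, which
    satisfies |x|^p + |y|^p = cos(t)^2 + sin(t)^2 = 1.
    """
    points = []
    for k in range(n + 1):
        t = (math.pi / 2) * k / n
        x = abs(math.cos(t)) ** (2 / p)
        y = abs(math.sin(t)) ** (2 / p)
        points.append((x, y))

    # Sum the d_p-lengths of consecutive chords along the quarter circle.
    quarter = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        quarter += (abs(x1 - x0) ** p + abs(y1 - y0) ** p) ** (1 / p)

    diameter = 2.0  # d_p((-1, 0), (1, 0)) = 2 for every p
    return 4 * quarter / diameter

for p in (1.0, 1.5, 2.0, 3.0, 4.0):
    print(p, pi_p(p))
```

In this sketch `pi_p(2.0)` comes out close to $\pi$ and `pi_p(1.0)` close to $4$, and the other sampled exponents give values above `pi_p(2.0)`, which is consistent with the minimum sitting at $p=2$.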

Shaun
  • 47,747
Umberto P.
  • 54,204
  • 11
    There is a nice book which deals with $\pi_p$. It is called "Squigonometry: The Study of Imperfect Circles" by Poodiack and Wood – Davide Masi Sep 10 '24 at 18:55
  • What prevents there being a p-distance for p < 1? Is it just that it can become a multi-valued function? – Hearth Sep 11 '24 at 03:28
  • 3
    @Hearth: p<1 doesn't satisfy the triangle inequality. – user2357112 Sep 11 '24 at 03:39
  • This makes number 2 even more special! My confusion is thus even deeper. Why 2? Is it because we live in flat space? It has to be because we are already limited in a provincial space/experience that all our findings about 2 and square come out so naturally – Deren Liu Sep 11 '24 at 06:43
  • 11
    I'd also note that 2-distance is invariant under rotation, which seems more relevant to me than the minimality of $\pi$. – Rad80 Sep 11 '24 at 09:31
  • 5
    @DerenLiu: Regarding your comment My confusion is thus even deeper, if an explanation of the special nature of $2$, such as the one given in this answer, has the effect of deepening your confusion, would every explanation of the even more special nature of $2$ thus deepen your confusion even further? If so, answering your question feels like a fruitless task. What are you really asking? – Lee Mosher Sep 11 '24 at 15:07
  • @Rad80 It seems to me that the two facts are related. – David K Sep 11 '24 at 15:55
  • 4
    @LeeMosher One issue is that the dots aren't connected. The fact that $\pi_2$ is special doesn't seem to connect up with the other 2-norm facts listed. Was it supposed to be an explanation that ties everything together, or was it simply "Yes, 2 is special, here's another example! Is there an underlying explanation? Beats us!" -- I'm guessing the "deeper confusion" is due to interpreting the 'answer' in the latter sense. – R.M. Sep 11 '24 at 15:56
  • @LeeMosher yes as R.M. pointed out, I am asking why special, not how special. Showing more examples in this context is not the same as providing explanation for any. Square is multiplying by itself once, period. The fact that it comes out everywhere necessitates a connection that connect all the dots back to the fundamental first principle idea of square, or 2. – Deren Liu Sep 11 '24 at 17:03
  • @DerenLiu: In that case I do not understand how this question could possibly be answered. As it says on the tour, this is a site to Get answers to practical, detailed questions, which rules out something as vague as "But why?" – Lee Mosher Sep 11 '24 at 19:19
  • Fermat laughs. Venture beyond 2 and the world dissolves. || Re "... or area of square on a flat surface. But what does length of sides on a right-angle triangle has to do with them?": the proof of the Pythagorean rule is taught as part of high school maths. || Fermat's Last Theorem: there are no positive-integer solutions of $A^n + B^n = C^n$ for any integer $n > 2$. – Russell McMahon Sep 24 '24 at 07:54
28

There are a lot of ways of looking at it. One answer is that it comes from inner products (aka dot products, aka scalar products). The Pythagorean Theorem holds true in any space where inner products exist. It works in the plane, in space, in four-dimensional hyperspace, even in weird constructs like Hilbert spaces. All you need is an inner product.

Let $ \mathbf a, \mathbf b$ and $\mathbf c$ be vectors with $\mathbf a + \mathbf b = \mathbf c$. Then we have:

$$ \mathbf a + \mathbf b = \mathbf c $$ $$ (\mathbf a + \mathbf b) \cdot (\mathbf a + \mathbf b) = \mathbf c \cdot \mathbf c $$ $$ \mathbf a \cdot (\mathbf a + \mathbf b) + \mathbf b \cdot (\mathbf a + \mathbf b) = \mathbf c \cdot \mathbf c $$ $$ \mathbf a \cdot \mathbf a + 2 (\mathbf a \cdot \mathbf b) + \mathbf b \cdot \mathbf b = \mathbf c \cdot \mathbf c $$

When $\mathbf a$ and $\mathbf b$ are perpendicular, we have $ \mathbf a \cdot \mathbf b = 0$, and this becomes just: $$ \mathbf a \cdot \mathbf a + \mathbf b \cdot \mathbf b = \mathbf c \cdot \mathbf c $$

and we are done.


So it needs squares because an inner product is a product of two vectors. It doesn't work with cubes, because there's no good way to multiply three vectors together. You can try, but with inner products you'll find that in general $(\mathbf a \cdot \mathbf a)\,\mathbf b \neq \mathbf a\,(\mathbf a \cdot \mathbf b)$, so the terms don't cancel the way they do above.
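
The expansion above can be checked on concrete vectors. This is a minimal sketch: the vectors `a` and `b` and the helper names `dot` and `power_sum` are made up here for illustration. The two vectors are perpendicular but share coordinates, so the comparison with the fourth power is not trivially satisfied.

```python
def dot(u, v):
    """Standard dot product of two equal-length tuples."""
    return sum(ui * vi for ui, vi in zip(u, v))

def power_sum(u, p):
    """Sum of |u_i|^p, i.e. the p-norm of u raised to the p-th power."""
    return sum(abs(ui) ** p for ui in u)

a = (1, 2, 2)
b = (2, 1, -2)                                # chosen so that dot(a, b) == 0
c = tuple(ai + bi for ai, bi in zip(a, b))    # c = a + b

# General identity: c.c = a.a + 2 a.b + b.b
print(dot(c, c), dot(a, a) + 2 * dot(a, b) + dot(b, b))

# With a.b = 0 the squares add up ...
print(dot(a, a) + dot(b, b), dot(c, c))

# ... but fourth powers of the coordinates do not.
print(power_sum(a, 4) + power_sum(b, 4), power_sum(c, 4))
```

The cross term $2(\mathbf a \cdot \mathbf b)$ is the only thing standing between the general identity and the Pythagorean form, and it is bilinearity of the product of two vectors that produces it.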

Toph
  • 1,566
  • 7
  • 17
  • Consider $d_2(a,b)=\text{constant}$ as the definition of a circle. If you use a different metric, e.g. $d_1$ or $d_3$, the "circle" will get squishy, either shorter or longer around the X,Y axes. The deep reason it's good to derive these via vectors is that vectors describe "true" geometry that's invariant wrt. the choice of coordinate axes. – Beni Cherniavsky-Paskin Sep 11 '24 at 03:59
  • 1
    What about non-commutative multiplication? Where a*b does not equal b*a? – No Name Sep 11 '24 at 18:49
  • 2
    @NoName: The dot product is commutative in Euclidean geometry, and can be defined in terms that do not depend on coordinates or the Pythagorean theorem (specifically, you project one vector onto the other, and multiply the resulting lengths). I'm not sure how feasible it would be to come up with a vector space in which the analogue of the dot product is non-commutative, but it probably does not resemble Euclidean space very closely. – Kevin Sep 14 '24 at 00:40
  • 1
    @NoName How do you define orthogonality in a space that has non-commutative multiplication? If "perpendicular vectors" means "$\mathbf a \cdot \mathbf b = \mathbf b \cdot \mathbf a = 0$", then Pythagoras still works even if inner products are non-commutative in general! For example, you can make $\mathbb{C}^n$ an inner product space with the inner product $\mathbf a \cdot \mathbf b = \sum_i \mathbf a_i\overline{\mathbf b_i}$. Then $\mathbf a \cdot \mathbf b$ is the complex conjugate of $\mathbf b \cdot \mathbf a$, so they may not be equal in general but they are both zero if one is. – Toph Sep 14 '24 at 16:41
24

For what it is worth: the exponent $2$ exhibits isotropy.

Consider the position vector $(x,y,z)$, and the locus of the points at equal distance from the origin, which form a spherical surface with implicit equation

$$x^2+y^2+z^2=d^2.$$

The normal to this surface is given by the gradient $2(x,y,z)$, which is aligned with the position vector. This does not occur with other exponents.
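
A small sketch of this observation, assuming the natural generalization of the surface to $|x|^p+|y|^p+|z|^p=d^p$ (the helper names `gradient` and `cross` and the sample point are chosen here for illustration). The cross product of the gradient with the position vector vanishes exactly when the two are aligned.

```python
import math

def gradient(point, p):
    """Gradient of |x|^p + |y|^p + |z|^p at the given point."""
    return tuple(p * math.copysign(abs(c) ** (p - 1), c) for c in point)

def cross(u, v):
    """Cross product of two 3-vectors; zero iff u and v are parallel."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )

point = (1.0, 2.0, 3.0)  # a generic point, off the axes and diagonals
for p in (2, 3, 4):
    print(p, cross(gradient(point, p), point))
```

At special points such as the coordinate axes or the diagonal $x=y=z$ the gradient is aligned with the position for every $p$; the statement is about generic points such as the one sampled here.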

21

Your "why this, not that" question thinks too small. A weaker version of the Pythagorean theorem states that a function $f$ exists for which each right-angled triangle in a Euclidean plane, with catheti $a,\,b$ and hypotenuse $c$, satisfies $f(a)+f(b)=f(c)$. That such an $f$ exists is already remarkable enough, but even if you accept that it does, the question shouldn't be "why can it be taken as $x^2$ (or $2x^2$ etc.) rather than $x^p$ for some $p\ne2$?", but "why can it be taken as $x^2$ as opposed to just about anything else?"

We'll come back to that. The longest diagonal $d$ of an $n$-dimensional hypercuboid of side lengths $x_i$ satisfies $\sum_{i=1}^nf(x_i)=f(d)$. To prove this by induction, note that in the case $n=k+1$ this diagonal is the hypotenuse of a triangle whose catheti are an edge and the longest diagonal of a hypercuboid orthogonal to that edge, in a subspace of dimension $k$. So the $2$-dimensional case is the only hard part.

Let's finally return to it. Drop a perpendicular from the right-angle vertex to the hypotenuse. By angle-chasing, this splits the triangle into two right-angled triangles similar to the original, with hypotenuses $a,\,b$. Since area is $2$-dimensional, it scales as side squared. Since the original area has been split into two pieces, $f(x)=x^2$ can be taken.
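
To spell out that last step (the constant $k$ is a label introduced here, not part of the original argument): the three triangles are similar, so each has area equal to $k$ times the square of its own hypotenuse, with the same $k$ for all three. The two small triangles tile the big one, so $$k a^2 + k b^2 = k c^2 \quad\Longrightarrow\quad a^2 + b^2 = c^2.$$ The exponent is $2$ here only because the additive quantity is an area.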

So the short answer to your question is that $\sum_{i=1}^nf(x_i)+f(x_{n+1})$ specifies the inductive step in terms of the case $n=2$ (which is also the induction's base case), and in that case we can prove the additive quantity I've denoted $f$ is an area, and it's therefore quadratic.

Note that this proof of the planar Pythagorean theorem says "areas are additive, therefore squared sides are additive". Usually, people prove the second part however they like, then note the area implication. The above argument, which I've seen attributed to Einstein, reverses this, and finds a shockingly simple proof areas are additive, which is why it's among my favourite proofs of the theorem.

And if you still don't think these points are fundamental enough, the argument for the primacy of $n=2$ - that it makes the inductive step work - can be restated as addition being a binary operation we generalize to $n$ addends. This "addition has $2$ arguments, therefore use $p=2$" argument is very close to @OrangeMushroom's comment's point.

J.G.
  • 118,053
10

Partial answer.

The Pythagorean Theorem is a theorem about areas of squares. You can prove it without squaring anything:

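[Image: a dissection in which the squares on the two legs are cut into pieces that reassemble into the square on the hypotenuse.]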

The exponent $2$ appears because the areas of similar figures are proportional to the squares of any linear dimension. That's a fundamental property of area, not an arbitrary choice mathematicians made.

Here is Euclid's proof.

You may find this answer helpful: Is Pythagoras' Theorem a theorem?

(Image from https://www.nagwa.com/en/lessons/436176831673/.)

Ethan Bolker
  • 103,433
  • 3
    I have no clue what that "visual proof" is trying to argue. It's unclear how the cuts to the $a^2$ figure were made and why they correspond as they do to the copies in the $c^2$ figure. – Benjamin Kuykendall Sep 11 '24 at 03:11
  • @Benjamin Kuykendall - I don't know how those cuts were figured out. What they do is show that the area of the black square added to the area of the multicolor square is the area of the hypotenuse square. The cuts were made strangely so that when the shapes are re-assembled that way, it's still a square. – user78090 Sep 11 '24 at 03:31
  • 3
    @user78090 yeah I figured that... but that's not sufficient for a proof! Although it appears that, for example, the two copies of the green trapezoid are identical, you can't really know for sure without further justification. It's also unclear if/why this would work for arbitrary right triangles, not just the one pictured. – Benjamin Kuykendall Sep 11 '24 at 04:01
  • I prefer this dissection: https://newtonexcelbach.files.wordpress.com/2012/02/pythanim1a.gif – PM 2Ring Sep 11 '24 at 06:46
  • 1
    @BenjaminKuykendall The cuts to the bottom square are by lines through its center parallel and perpendicular to the hypotenuse of the triangle. It's not hard to "know for sure" that the pieces assemble as they should using easy theorems on parallel lines. – Ethan Bolker Sep 11 '24 at 11:15
  • Each of the four identical colored shapes has a short edge and a long edge. They meet at a right angle. The cuts were made so that the long edge minus the short edge is equal to the edge of the black triangle. You can verify this by looking at the lengths in the biggest square. – MiguelMunoz Sep 12 '24 at 06:14
0

You question why three consecutive displacements that lead back to the starting point, where the first two of them are orthogonal to each other, have a relationship that is of power 2 and additive.

You wonder at why the spread measure definition commonly used in statistics uses a power 2 function of error rather than a power 1 function.

Need there be an underlying reason for these (and perhaps other) math facts that will meet your need for "mathematical integrity" or whatever?

If so, why?

What are you really looking for here?

I see no strong connection between statistical variance and Pythagoras' theorem.

My own understanding of the definition of statistical variance was that, when defined thus, it handled spread on either side of the mean additively rather than having error on one side of the target cancel out error on the opposite side. One could achieve the same freedom from error cancellation by using the mean of the sum of absolute differences but squaring the differences also fitted in well with the mathematical form of various commonly encountered distributions, esp. the Normal/Gaussian distribution. This was essentially what Martin Gardner said in one of his books that I read as a 17 year old and what an odd professor may have intimated in class later.

I believe other definitions of spread can be deployed but do not lead to such compact expressions in their deductions.

Nobody wants to discourage mathematical intuition. But we must always be careful not to see more in something than is actually there. Wait for such things to emerge more strongly by themselves.

Trunk
  • 442
  • 1
    It's not just that squaring the differences fit well with particular distributions: it greatly simplifies common calculations on all random variables, because if you use variance as the measure of dispersion, you find that dispersion has the same linearity properties that expectation does. $Var(X+Y)=Var(X)+Var(Y)$. The downside of variance is that outliers are more likely to mess up your estimates. In other words, your last point (about compact expressions) is really why variance is so popular. – David K Sep 11 '24 at 16:08
  • 2
    Before someone points out correlated variables have a cross term, that generalization of @DavidK's equation is basically the cosine rule. – J.G. Sep 12 '24 at 05:37
  • 1
    @J.G. Right, I misspoke about the "same" linearity properties: unlike with expectation, you also have to consider correlation. But we have many applications where we can consider the variables independent, and then the correlation is zero. – David K Sep 12 '24 at 13:45
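
The additivity discussed in these comments can be checked exactly on a small example. This is a sketch assuming two independent fair dice (an example chosen here, not taken from the thread); `mean`, `variance` and `mean_abs_dev` are helper names introduced for illustration, and each list is treated as a set of equally likely outcomes.

```python
from fractions import Fraction
from itertools import product

def mean(values):
    """Mean of equally likely outcomes, kept as an exact fraction."""
    return sum(values, Fraction(0)) / len(values)

def variance(values):
    """Mean squared deviation from the mean."""
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])

def mean_abs_dev(values):
    """Mean absolute deviation from the mean."""
    m = mean(values)
    return mean([abs(v - m) for v in values])

die = [Fraction(k) for k in range(1, 7)]
# All 36 equally likely outcomes of the sum of two independent dice.
sums = [x + y for x, y in product(die, die)]

print(variance(die), variance(sums))          # variance of the sum vs. one die
print(mean_abs_dev(die), mean_abs_dev(sums))  # same comparison for absolute deviation
```

For independent variables the variance of the sum is exactly the sum of the variances, while the mean absolute deviation of the sum is not the sum of the individual mean absolute deviations. That is the sense in which the squared measure behaves like the Pythagorean relation for perpendicular vectors.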