I'm currently in a probability class and am having a hard time wrapping my head around the standard deviation. In many ways it feels "made up" to me; let me explain:
The SD squares each deviation, averages those squares, and then takes a square root, and in general
$$\sqrt{a^2 + b^2} \neq a + b,$$
so the number that comes out isn't just a plain average of the deviations in a way I can picture.
So unless it's used relative to other standard deviations, I feel like it doesn't inherently mean anything. By that I mean, measurements like meters or inches have some grounding in "reality." Knowing something has a length of 3 m is useful information even if I don't have any other lengths in meters to compare it to. But when I see that a distribution has an SD of 2, that only translates into a sense of how "spread out" the distribution is when I compare it to other distributions with different SDs. I think, "Oh, distribution 1 is pretty spread out and it has an SD of ___, so this other SD must mean the corresponding distribution 2 is less/more spread out." Is this just how it is? Are some units of measurement just less intuitive, deriving their meaning only from comparison?
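To show what I mean, here's a toy NumPy sketch I put together (the heights, means, and SDs are made up purely for illustration): two samples with the same mean but different spread. The SD does come out in meters, yet the numbers only "speak" to me once I compare them to each other:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two made-up samples of heights in meters: same mean, different spread.
narrow = rng.normal(loc=1.75, scale=0.05, size=10_000)
wide = rng.normal(loc=1.75, scale=0.20, size=10_000)

# The standard deviation is in the same units as the data (meters here).
print(np.std(narrow))  # roughly 0.05
print(np.std(wide))    # roughly 0.20
```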
To me, it would make more sense to take the mean of the absolute values of all the deviations from the mean, but my professor told me that the SD just happens to be more useful. I don't doubt this, and I'm aware of the many properties that make the SD more useful, but my question is why? What causes the SD to have those properties?
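Here's a small NumPy sketch of the comparison I have in mind (my own toy simulation, not anything from class), computing both the SD and the mean of the absolute deviations on the same simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=1.0, size=100_000)

deviations = x - x.mean()

# Standard deviation: square root of the mean of the *squared* deviations.
sd = np.sqrt(np.mean(deviations ** 2))
# The alternative I describe above: mean of the *absolute* deviations.
mad = np.mean(np.abs(deviations))

print(sd)   # roughly 1.0
print(mad)  # roughly 0.8 (about sqrt(2/pi) times the SD for normal data)
```

They clearly track each other, which makes it even less obvious to me why the squared-then-rooted version is the "useful" one.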
For example, what causes roughly 95% of a normal distribution to land within 2 SD of the mean? It can't just be coincidence or happenstance; there must be a reason.
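For instance, when I simulate a normal distribution with an arbitrary mean and SD (again just a quick NumPy check of my own), about 95% of the draws land within 2 SD of the mean no matter what parameters I pick, and I'd like to understand where that comes from:

```python
import numpy as np

rng = np.random.default_rng(2)
# Arbitrary mean and SD -- the fraction below doesn't seem to depend on them.
x = rng.normal(loc=10.0, scale=3.0, size=1_000_000)

mu, sigma = x.mean(), x.std()
within_2sd = np.mean(np.abs(x - mu) <= 2 * sigma)

print(within_2sd)  # roughly 0.954 every time I run it
```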
Any help would be appreciated. Thanks in advance!