4

One of the "axioms of probability", or standard definitions from measure theory, is that the probability measure is countably additive over disjoint unions, i.e. if $P: \mathcal{F} \mapsto[0,1]$ is to be a probability measure, and $A_1,A_2,... \in \mathcal{F}$ so that $A_i \cap A_j = \emptyset$ for all $i \ne j$, then we must have $$ P(\cup_{i=1}^\infty A_i) = \sum_{i=1}^\infty P(A_i). $$ Of course this assumption along with the assumption that the probability of the sample space is one will imply finite additivity over unions of disjoint sets, which is more natural and easier to understand I argue, because in many cases, for instance if $\mathcal{F}$ is finite, we could check this assumption on any function $P$ manually. Assuming countably infinite additivity seems to be a big leap.

When discussing this with, e.g., students who are in their first year of a math major (and nowhere near measure theory), I often justify the importance of this assumption by saying that it implies a certain "consistency" of $P(\cdot)$. For instance, if we rewrite a set $A = \bigcup_i A_i = \bigcup_i B_i$, where $(A_i)$ and $(B_i)$ are two different partitions of $A$ into disjoint pieces, countable additivity implies that $$ \sum_{i=1}^\infty P(A_i) = P(A) = \sum_{i=1}^\infty P(B_i). $$ In other words, the probability of $A$ does not depend on how we chop up the set $A$. Since in many simple cases, e.g. probabilities on subsets of $[0,1]$ or a Venn diagram in the plane, it is easy to imagine chopping a set into countably many disjoint pieces, this seems like an important thing to assume. This also helps reinforce the comparison of probability to "volume" or "area", which of course is the concept that measure theory aims to capture.

Another important consequence is "continuity of a probability measure": if $A_1 \subseteq A_2 \subseteq \cdots$ and $B = \bigcup_i A_i$, then $$ \lim_{n\to \infty} P(A_n) = P(B), $$ which is extremely useful. But this latter fact is well beyond what would justify the assumption to a beginner.
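(For completeness, the derivation is a "disjointification" argument: writing $B = A_1 \cup \bigcup_{n=2}^\infty (A_n \setminus A_{n-1})$ as a countable disjoint union, countable additivity gives $$ P(B) = P(A_1) + \sum_{n=2}^\infty P(A_n \setminus A_{n-1}) = \lim_{N\to\infty}\left( P(A_1) + \sum_{n=2}^N P(A_n \setminus A_{n-1}) \right) = \lim_{N\to\infty} P(A_N), $$ the last step being finite additivity applied to $A_N = A_1 \cup (A_2 \setminus A_1) \cup \cdots \cup (A_N \setminus A_{N-1})$.)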

So now comes the question: what argument would you give to a beginner to explain why the assumption of countable additivity is worth the "cost"? What obvious drawbacks are there to assuming only finite additivity that would convince a (novice) skeptic that finite additivity alone would not lead to a nice probability theory?

LostStatistician18
  • 2,361
  • 8
  • 19
  • 1
    Your use of the phrase "infinite additivity" multiple times is unfortunate: saying instead countable additivity would emphasize what aspect of summing with infinitely many events is being allowed. Calling this "infinite additivity" can sound to a student like you'd even be allowing sums over individual points in $[0,1]$. – KCd Jun 26 '25 at 17:11
  • Honestly, the continuity argument. Someone should put in front of them the idea that a lot of the 'niceness' conditions that we use are about swapping two things around. Here it's taking the limit and calculating the probability; for continuity it's taking the limit and evaluating the function. This is a key idea imo. – user24142 Jun 26 '25 at 17:12
  • @KCd I changed it to "countable", thanks for the suggestion! – LostStatistician18 Jun 26 '25 at 17:14

1 Answer

2

In the presence of countable additivity, a probability measure on $\mathbb{N}$ is the same thing as a sequence $p_i$ of non-negative reals summing to $1$, which I think is pretty intuitive. Each natural number has a weight, and it's pretty easy to visualize what is happening here.
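If you want something concrete to play with, here is a minimal sketch in Python (the geometric weights and the helper names `weight`, `prob`, `sample` are just illustrative choices, nothing canonical) of a measure on $\mathbb{N}$ as a sequence of point masses; countable additivity is just the statement that the probability of a countable event is the sum of the weights of its points.

```python
import random

# A probability measure on N, represented by its point masses.
# Here p_i = 2^{-(i+1)} for i = 0, 1, 2, ...; any non-negative
# sequence summing to 1 would work just as well.
def weight(i):
    return 2.0 ** -(i + 1)

# Probability of an event given as an iterable of naturals:
# by (countable) additivity, just sum the weights of its points.
def prob(event):
    return sum(weight(i) for i in event)

# The total mass sum_i p_i equals 1 (approximated by a long partial sum).
print(f"approximate total mass: {sum(weight(i) for i in range(1000)):.12f}")

# P(even numbers), truncated at 1000; the exact value is 2/3.
print(f"P(even) ~ {prob(range(0, 1000, 2)):.6f}")

# Sampling from the measure by inverting the cumulative weights.
def sample():
    u, i, acc = random.random(), 0, weight(0)
    while u > acc:
        i += 1
        acc += weight(i)
    return i

draws = [sample() for _ in range(100_000)]
print(f"empirical P({{0}}) ~ {draws.count(0) / len(draws):.4f}")  # about 0.5
```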

If we only assume finite additivity, there are finitely additive probability measures on $\mathbb{N}$ coming from non-principal ultrafilters. These have the unusual property that the measure of every point, and hence of every finite subset of $\mathbb{N}$, is zero! Nevertheless it is still true that the measure of $\mathbb{N}$ itself is $1$. Natural density also behaves like this, but natural density has the intuitive feature that on an arithmetic progression with common difference $d$ it takes the value $\frac{1}{d}$. An ultrafilter measure will assign such a progression either the value $0$ or the value $1$!
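(For reference, the natural density of a set $S \subseteq \mathbb{N}$ is $$ \delta(S) = \lim_{n \to \infty} \frac{|S \cap \{1, \dots, n\}|}{n} $$ whenever the limit exists; an arithmetic progression with common difference $d$ contains roughly every $d$-th integer up to $n$, which is where the value $\frac{1}{d}$ comes from.)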

The ultrafilter measures are much less intuitive; the probability mass does not appear to be "located anywhere," and it also does not appear to be "spread out." There is not (as far as I know) any way to visualize what's going on here. We can't even write these things down without the axiom of choice. It is still possible to make sense of concepts like integrating against one of these measures, but it's weird stuff and doesn't really feel like normal probability. And of course theorems requiring countable additivity won't apply.

This is fairly technical, but at the level where you don't want to discuss continuity, everything is going to be fairly technical. If I just wanted to explain why countable additivity corresponds to something familiar to a beginner, I'd give them an example: flipping a coin until you get heads. You can probably convince a beginner that

  • the probability of getting heads immediately is $\frac{1}{2}$,
  • the probability of getting tails, then heads, is $\frac{1}{4}$,
  • the probability of getting tails, tails, then heads, is $\frac{1}{8}$,

and so forth. If we make the very reasonable assumption that we must (a.s.) flip heads eventually (or more precisely that the event of never flipping heads has probability $0$), adding up all these probabilities gives us the sum of a geometric series

$$1 = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \dots.$$

But we need countable additivity to run this argument.
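If the beginner is computationally inclined, a quick simulation also makes the series feel concrete. Here is a rough sketch in Python (the number of trials is arbitrary, and a simulation of course only ever sees finitely many outcomes): it flips a fair coin until the first heads, tallies how often each flip count occurs, and the empirical frequencies track $\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \dots$

```python
import random
from collections import Counter

def flips_until_heads():
    """Flip a fair coin until the first heads; return the number of flips."""
    count = 1
    while random.random() >= 0.5:  # treat < 0.5 as heads, >= 0.5 as tails
        count += 1
    return count

trials = 100_000  # arbitrary; more trials give closer agreement
counts = Counter(flips_until_heads() for _ in range(trials))

# Compare empirical frequencies with the exact values 1/2, 1/4, 1/8, ...
for k in range(1, 6):
    print(f"first heads on flip {k}: empirical {counts[k] / trials:.4f}, "
          f"exact {0.5 ** k:.4f}")

# Every trial terminated with some finite flip count, so the empirical
# masses sum to exactly 1; countable additivity is what lets us say the
# same about the exact probabilities 1/2 + 1/4 + 1/8 + ... = 1.
print(f"total empirical mass: {sum(counts.values()) / trials:.1f}")
```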

Qiaochu Yuan
  • 468,795
  • Awesome, thanks for the answer @Qiaochu Yuan! What do you think of my explanation regarding "consistency of $P(\cdot)$"? Clearly your pathological example is a nice reason to keep countable additivity, and this is one where it seems the whole will not be the sum of its parts. – LostStatistician18 Jun 26 '25 at 17:12
  • @LostStatistician: I don't understand this argument. If we only assume finite additivity we still get this "consistency" condition, we just have to restrict to only chopping sets up into finitely many pieces. The whole difficulty appears to me to be justifying why we care about chopping sets up into countably (but not uncountably) many pieces. – Qiaochu Yuan Jun 26 '25 at 17:19
  • I guess my main "pedagogical" point is that since in many cases we could easily imagine chopping something into countably many pieces, we would want the consistency to hold in that case as well, i.e. we would not want the probability of the event to depend on "how it is carved up". Although perhaps this is no better than saying that we want $P(\cdot)$ to behave like area/volume so that the whole is the sum of the parts. – LostStatistician18 Jun 26 '25 at 17:21
  • 1
    Philosophically and intuitively, the justification for countable additivity is quite weak. Kolmogorov himself thought that it is almost impossible to elucidate its empirical meaning, as we observe only finite fields of probability, and that, while arbitrary, it was expedient. Bruno de Finetti and Savage convinced me long ago of the benefits of finite additivity. For a recent paper comparing finite, countable and complete additivity, see Seidenfeld et al. – Jayanth R Varma Jun 27 '25 at 08:04