15

I've seen the formula most commonly derived as a continuum generalization of a binomial random variable with large $n$, small $p$ and finite $\lambda = np$ yielding

$$ \lim_{n \to \infty} \binom{n}{x} p^x(1-p)^{n-x} = e^{-\lambda}\frac{\lambda ^ x}{x!}$$

It follows, from this derivation, that $$ \lim_{n \to \infty } (1-p)^{n-x} = e^{-\lambda}$$ yields the probability of failing in the remaining (infinitely many) trials when the success rate is $\lambda$.
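As a quick numerical check of both limits (a sketch, assuming numpy and scipy; the values of $\lambda$ and $x$ are arbitrary choices of mine):

```python
import numpy as np
from scipy.stats import binom, poisson

lam, x = 4.0, 3  # arbitrary rate and count
for n in [10, 100, 1000, 10000]:
    p = lam / n
    # failure factor (1 - p)^(n - x) -> e^{-lambda}, and binomial pmf -> Poisson pmf
    print(n, (1 - p)**(n - x), np.exp(-lam), binom.pmf(x, n, p), poisson.pmf(x, lam))
```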

However, from this approach, I could not grok the remaining term

$$\frac { \lambda ^ x } {x!} $$


Question

What insightful derivations (perhaps from generalizations) of the Poisson random variable exist which leave an intuition for each of the terms?


My Answer:

My answer, https://math.stackexchange.com/a/2727388/338817, comes from a geometric approach to the Gamma function intuition (https://math.stackexchange.com/a/1651961/338817), which I quote:

Note that $\frac{t^n}{n!}$ is the volume of the set $S_t=\{(t_1,t_2,\dots,t_n)\in\mathbb R^{n}\mid t_i\geq 0\text{ and } t_1+t_2+\cdots+t_n\leq t\}$

jaslibra
  • 794
  • So you specifically want a sort of "combinatorial" interpretation of the expression $e^{-\lambda}\frac{\lambda^n}{n!}$, rather than just any intuitive explanation of the Poisson distribution? – Jack M Mar 28 '18 at 23:19
  • @JackM, any precise intuition will help, not necessarily combinatorial – jaslibra Mar 29 '18 at 03:02
  • You may find my explanation of the exponential distribution inspiring. The connection to the Poisson is that both the ED and PD are connected to the Poisson process - in the language of the linked answer, the Poisson is the distribution of the number of births in a given interval of time. This is a very different approach to the "limit of binomials" approach, however. – Jack M Mar 29 '18 at 11:28
  • You say you want "precise intuition". That seems a bit contradictory; if an explanation is completely precise, then it's a proof (which may or may not be intuitive, depending on the reader). – tparker Apr 06 '18 at 12:33
  • @tparker https://terrytao.wordpress.com/career-advice/theres-more-to-mathematics-than-rigour-and-proofs/, I think I seek a post-rigorous understanding. – jaslibra Apr 06 '18 at 20:27
  • https://blog.kalculate.ai/probability-and-statistics/poisson-distribution is my take on this question. I spent a lot of time on this page and the linked pages trying to get a visual / combinatorial intuition for what was going on. I ended up turning my solution into a blog post geared toward those without much of a formal math background. – David May 13 '24 at 19:55
  • Keeling's notes here explain how the Exponential random variable and the Poisson random variable arise naturally when considering processes which are memoryless in waiting time, and the number of such events occurring in a fixed time interval, respectively. – Venkata Karthik Bandaru Dec 22 '24 at 00:29

4 Answers

9

Suppose $n$ successes occur in the interval $[0, t)$, and let the gaps between consecutive successes be given by the $n$-tuple $(x_1, \dots, x_n)$, so that $x_1 + x_2 + \cdots + x_n \leq t$.

The set of events where exactly $n$ successes occur can be measured as $$ \int_0^{t} \int_0^{t - x_1} \cdots \int_0^{ t - \sum_{i = 1}^{n-1} x_i } dx_n \, dx_{n-1} \cdots dx_2 \, dx_1 = \frac{ t^n } { n! }$$

Importantly, the size of the sample space of all events is measured by summing the sizes of all possible $k$-tuples, over all $k \geq 0$:

$$ \sum_{k = 0}^{\infty} \frac{ t^k }{ k! } = e^t$$

Taking the ratio of the sizes of these sets yields the probability that $n$ events occur in the interval $[0, t)$:

$$\boxed{ P \{ X = n \} = e^{-t} \frac{ t^n }{ n! } }$$
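As a quick Monte Carlo check of the simplex volume $t^n/n!$ above (a sketch, assuming numpy; the parameters are arbitrary): uniform points in the cube $[0,t]^n$ land in the region $x_1+\cdots+x_n \leq t$ with probability $1/n!$, so the estimated volume should match $t^n/n!$.

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
t, n, trials = 2.0, 4, 1_000_000
pts = rng.uniform(0.0, t, size=(trials, n))  # uniform points in the cube [0, t]^n
frac = np.mean(pts.sum(axis=1) <= t)         # fraction falling in the simplex
print(frac * t**n, t**n / factorial(n))      # estimated vs. exact volume t^n / n!
```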

Note:

More generally, the event rate can be made non-homogeneous with a scalar function $\lambda(t)$. When the rate is constant for all time, i.e., $\lambda(t) = \lambda$, we write

$$P(X = n) = e^{-\lambda t}\frac{ (\lambda t)^n } { n! }$$

Letting $t = 1$ gives the process on a unit time interval, scaled by $\lambda$. Although we're nominally interested in $[0, 1)$, it's as if we're looking at the interval $[0, \lambda)$.
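To see this in action, one can simulate the process directly (a sketch, assuming numpy; the rate and run count are arbitrary): draw $\mathrm{Exp}(\lambda)$ inter-arrival times, count how many arrivals land in $[0, 1)$, and compare the empirical frequencies with $e^{-\lambda}\lambda^n/n!$.

```python
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(1)
lam, runs = 3.0, 100_000
counts = np.empty(runs, dtype=int)
for i in range(runs):
    t, k = 0.0, 0
    # accumulate Exp(lam) inter-arrival times until we leave [0, 1)
    while True:
        t += rng.exponential(1.0 / lam)
        if t >= 1.0:
            break
        k += 1
    counts[i] = k

for m in range(7):
    print(m, np.mean(counts == m), exp(-lam) * lam**m / factorial(m))
```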

jaslibra
  • 794
8

Since you asked for an intuition, and there are many online derivations of the pmf of the Poisson distribution (e.g. here or here) that already follow a mathematically rigorous sequence, I'll take a shot at presenting it almost as a mnemonic construction.

So the pmf is

$$f_X(x=k)=\frac{\lambda^k\mathrm e^{-\lambda}}{k!}$$

What about thinking of the Poisson parameter $\lambda$ as somewhat reflecting the odds of an event happening in any time period? After all, it is a rate (events/time period), and hence, the higher the rate, the more likely it is that a certain number of events takes place in a given time period. Further, you already mention how the pmf of the Poisson is derived from the binomial by letting $n$ go to infinity; and in the binomial distribution the expectation is $np,$ equal to $\lambda$ in the Poisson: $p=\frac{\lambda}{n}.$

Notice, for instance, that in the derivation of the Poisson pmf, $\left(\frac{\lambda}{n}\right)^k$ is precisely introduced as the $p^k$ (the probability of $k$ successes) in the binomial pmf, $\binom{n}{k}p^k(1-p)^{n-k}.$ The denominator $n^k$ is later eliminated as we calculate the limit $n\to\infty,$ and indeed, $\lambda^k$ is "left over" from this initial probability formula.

Now, in the pmf you have the term raised to the $k$ power, i.e. $\lambda^k$, and it makes intuitive sense, because each occurrence is independent of the preceding and subsequent ones. So if we are calculating the probability of $k$ iid events happening in a time period, we shouldn't be surprised to end up with $\underbrace{\lambda\cdot\lambda \cdots\lambda}_k=\lambda^k$.

Since these events are indistinguishable from each other, it is not surprising either that we have to prevent over-counting by dividing by the number of permutations of these events, $k!.$ This, in fact, is the exact role of the term in the combinations formula $\binom{n}{k}=\frac{n!}{(n-k)!\color{blue}{k!}}.$

And for the term $e^{-\lambda}$ we could bring into play the inter-arrival time following an exponential distribution: as the rate $\lambda$ increases, the inter-arrival time decreases. We can think of this factor as decreasing the probability of a low $k$ number of events when the rate $\lambda$ factor is high.
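As a small numeric aside (a sketch of mine, assuming numpy; $k$ and the grid are arbitrary): for fixed $k$, the product $e^{-\lambda}\lambda^k/k!$ peaks exactly at $\lambda = k$, which is where the growth of $\lambda^k$ and the decay of $e^{-\lambda}$ balance out.

```python
import numpy as np
from math import factorial

k = 5                                          # fixed number of events
lams = np.linspace(0.1, 15.0, 1500)            # a grid of rates
pmf = np.exp(-lams) * lams**k / factorial(k)   # P(X = k) as a function of the rate
print(lams[np.argmax(pmf)])                    # peaks near lam = k = 5
```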

J.G.
  • 118,053
  • @Antoni_Parellada, it doesn't make sense to think of $\lambda$ as a probability, because $\lambda \in \mathbb{R}$, and also, if $\lambda$ has units of events/time, then what units would you throw onto $p$, the probability of the event? – jaslibra Mar 29 '18 at 00:58
  • 3
    @jaslibra I’m not necessarily defending the analogy (which is great by the way, +1) but here’s something to consider: there’s a difference between $\lambda$ being a literal probability and being a measure/indication of a probability. – gen-ℤ ready to perish Mar 29 '18 at 01:00
  • 2
    @jaslibra To consider $\lambda$ as a probability measure is an atrocity (committed by me after much hedging). I believe that was clearly and explicitly stated, not just in the opening remarks, but also in the careful introduction of the idea. I was shooting for an intuition, but if it doesn't help you, I'll delete the answer. – Antoni Parellada Mar 29 '18 at 01:03
  • @ChaseRyanTaylor, thanks for pointing that out. I'm looking for intuitive, but my intuition should still be precise in understanding exactly what $\lambda ^ k$ should represent. – jaslibra Mar 29 '18 at 01:12
  • @AntoniParellada You don't need to delete your answer. And an up-vote would be helpful if you think this deserves more attention – jaslibra Mar 29 '18 at 01:16
  • @jaslibra There is a ton of intuition built into the derivation from the binomial as $n$ tends to infinity. And you may very well be onto something; however, it is also possible that after you've gone through the process of segmenting the time period into infinitely many intervals, etc., etc., and arrive at the final formula, there isn't much room left, other than to accept the result. – Antoni Parellada Mar 29 '18 at 01:18
  • My idea is that I shouldn't have to derive it from the binomial, the formula should be obvious immediately, just as is the binomial – jaslibra Mar 29 '18 at 01:28
  • @jaslibra Please take a look at my most recent edit... I just read your last comment... Well... I really don't see it as obvious, although it is rather intuitive when you follow the "story" behind the distribution. But let's wait and see what other answers come through. Best of luck! – Antoni Parellada Mar 29 '18 at 01:30
1

You're basically there with your limit. In $\binom{n}{x}=\frac{n!}{(n-x)!\,x!},$ the ratio of the two factorials is the product of $x$ numbers from $n-x+1$ to $n$, so for $n\gg x$ it is approximately $n^x$. That $n^x$ cancels the $n^x$ in $p^x=(\lambda/n)^x$, leaving exactly $\frac{\lambda^x}{x!}$ times the $e^{-\lambda}$ you already have.
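For a concrete $x$, a CAS can confirm the whole cancellation (a sketch, assuming sympy; I write $\binom{n}{x}$ out as a falling factorial so the limit is mechanical):

```python
from sympy import symbols, limit, oo, factorial, simplify

n, lam = symbols('n lam', positive=True)
x = 3  # a concrete count; the pattern is the same for any fixed x

# binomial(n, 3) written out as n(n-1)(n-2)/3!, with p = lam/n
expr = (n * (n - 1) * (n - 2) / factorial(x)) * (lam / n)**x * (1 - lam / n)**(n - x)
print(simplify(limit(expr, n, oo)))  # expect lam**3*exp(-lam)/6
```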

J.G.
  • 118,053
0

[This is from Keeling's notes on radioactive decay here]

From the radioactive decay wiki page:

Radioactive decay is the process by which an unstable atomic nucleus loses energy by radiation.

Radioactive decay is a random process at the level of single atoms. According to quantum theory, it is impossible to predict when a particular atom will decay, regardless of how long the atom has existed. The half-lives of radioactive atoms have a huge range: from nearly instantaneous to far longer than the age of the universe.

Consider radioactive decay of an element, known to be memoryless in waiting time. We have waiting time random variable ${ T \sim \text{Exp}(\lambda) , }$ where the constant ${ \lambda > 0 }$ depends on the nature of the element.

Rare events like earthquakes also have memorylessness in waiting time, and the waiting time can be modelled by an exponential random variable.

Let random variable ${ N(\tau) }$ denote the number of radioactive decays that occur in the time interval ${ [0, \tau]. }$ Let the PMF of ${ N(\tau) }$ be given by

$${ p _{\tau} (k) = \mathbb{P}(N(\tau) = k) \quad \text{ for } k \in \mathbb{Z} _{\geq 0} . }$$

For there to be ${ k }$ events in the time interval ${ [0, \tau] , }$ there must be

  • ${ k - 1 }$ events in an interval ${ [0, s ], }$ ${ s \in (0, \tau) }$ with probability

$${ \mathbb{P}(N(s) = k-1) = p _{s} (k-1) }$$

  • ${ 1 }$ event in an interval ${ [s, s + ds] }$ with probability

$${ {\begin{aligned} &\, \mathbb{P}(N(s + ds) = k \, \vert \, N(s) = k - 1) \\ = &\, \mathbb{P}(T \in [s, s + ds] \, \vert \, T > s) \\ = &\, 1 - e ^{- \lambda ds } \\ = &\, \lambda ds + o(ds) \end{aligned}} }$$

  • no events in an interval ${ [s + ds, \tau] }$ with probability

$${ {\begin{aligned} &\, \mathbb{P}(N(\tau) = k \, \vert \, N(s + ds) = k) \\ = &\, \mathbb{P}(T > \tau - (s + ds)) \\ = &\, e ^{- \lambda (\tau - (s + ds)) } \\ = &\, e ^{- \lambda (\tau - s)} + o(ds) . \end{aligned}} }$$

Since these events are independent, the probability of all three is the product of the three. Integrating this over all possible intermediate times ${ s }$ we have

$${ {\begin{aligned} p _{\tau} (k) = &\, \int _0 ^{\tau} p _s (k - 1) \, (\lambda ds) \, e ^{- \lambda (\tau - s)} \\ = &\, \lambda e ^{- \lambda \tau } \int _0 ^{\tau} p _s (k - 1) e ^{\lambda s} \, ds . \end{aligned}} }$$

Now differentiating this wrt ${ \tau }$ we have

$${ p _{\tau} ^{'} (k) = - \lambda ^2 e ^{- \lambda \tau } \int _0 ^{\tau} p _s (k - 1) e ^{\lambda s} \, ds + \lambda e ^{- \lambda \tau } p _{\tau} (k - 1) e ^{\lambda \tau } }$$

that is

$${ \boxed{p _{\tau} ^{'} (k) = - \lambda p _{\tau} (k) + \lambda p _{\tau} (k - 1)} . }$$

For ${ k = 0 }$:

$${ p _{\tau} ^{'} (0) = - \lambda p _{\tau} (0) }$$

that is

$${ p _{\tau} (0) = e ^{- \lambda \tau } , }$$

using the initial condition ${ p _0 (0) = 1 . }$

For ${ k = 1 }$:

$${ {\begin{aligned} p ^{'} _{\tau} (1) = &\, - \lambda p _{\tau} (1) + \lambda p _{\tau} (0) \\ = &\, - \lambda p _{\tau} (1) + \lambda e ^{- \lambda \tau} . \end{aligned}} }$$

Recall the differential equation ${ \frac{dy}{dx} + P(x) y = Q(x) }$ has an integrating factor ${ e ^{\int P(x) \, dx } . }$ Hence

$${ \frac{d}{d\tau} (e ^{\lambda \tau} p _{\tau} (1) ) = \lambda }$$

that is

$${ p _{\tau} (1) = e ^{-\lambda \tau} \lambda \tau . }$$
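(This integrating-factor step can be checked symbolically; a sketch, assuming sympy:)

```python
from sympy import Function, dsolve, symbols, exp, Eq

tau, lam = symbols('tau lam', positive=True)
p1 = Function('p1')
# p1' = -lam*p1 + lam*exp(-lam*tau), with p1(0) = 0
ode = Eq(p1(tau).diff(tau), -lam * p1(tau) + lam * exp(-lam * tau))
print(dsolve(ode, p1(tau), ics={p1(0): 0}))  # p1(tau) = lam*tau*exp(-lam*tau)
```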

For ${ k = 2 }$:

$${ {\begin{aligned} p ^{'} _{\tau} (2) = &\, - \lambda p _{\tau} (2) + \lambda p _{\tau} (1) \\ = &\, - \lambda p _{\tau} (2) + \lambda e ^{-\lambda \tau} \lambda \tau \end{aligned}} }$$

that is

$${ \frac{d}{d \tau} (e ^{\lambda \tau } p _{\tau} (2) ) = \lambda ^2 \tau }$$

that is

$${ p _{\tau} (2) = e ^{- \lambda \tau} \frac{(\lambda \tau) ^2}{2} . }$$

Formally, by induction,

$${ \boxed{ p _{\tau} (k) = e ^{- \lambda \tau} \frac{(\lambda \tau) ^k}{k ! } \quad \text{ for } k \in \mathbb{Z} _{\geq 0} } }$$

as needed.
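The boxed family can also be verified numerically by integrating a truncated version of the ODE system ${ p _{\tau} ^{'} (k) = - \lambda p _{\tau} (k) + \lambda p _{\tau} (k - 1) }$ (a sketch, assuming numpy and scipy; the truncation level ${ K }$ and the parameters are arbitrary):

```python
import numpy as np
from scipy.integrate import solve_ivp
from math import exp, factorial

lam, K, tau_end = 2.0, 12, 1.0

def rhs(tau, p):
    # p'_k = -lam * p_k + lam * p_{k-1}, with p_{-1} taken as 0
    dp = -lam * p
    dp[1:] += lam * p[:-1]
    return dp

p0 = np.zeros(K + 1)
p0[0] = 1.0  # at tau = 0: zero events with probability 1
sol = solve_ivp(rhs, (0.0, tau_end), p0, rtol=1e-10, atol=1e-12)

for k in range(5):
    print(k, sol.y[k, -1], exp(-lam * tau_end) * (lam * tau_end)**k / factorial(k))
```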

For ${ \tau = 1 }$ the above probability distribution is called the Poisson distribution with parameter ${ \lambda > 0 . }$