34

Many results rely on the Moment Generating Function (MGF) Uniqueness Theorem, which says:

If $X$ and $Y$ are two random variables whose MGFs agree, $m_X(t) = m_Y(t)$, then $X$ and $Y$ have the same probability distribution: $F_X(x) = F_Y(x)$ for all $x$.

The proof of this theorem is rarely given in textbooks, and I cannot seem to find it online or in any book I have access to.

Can someone show me the proof or tell me where to look it up?

Thanks for your time.

RobPratt
  • 50,938
Shuzheng
  • 5,821

4 Answers

38

Let us first clarify the assumption. Denote the moment generating function of $X$ by $M_X(t)=Ee^{tX}$.

Uniqueness Theorem. If there exists $\delta>0$ such that $M_X(t) = M_Y(t) < \infty$ for all $t \in (-\delta,\delta)$, then $F_X(t) = F_Y(t)$ for all $t \in \mathbb{R}$.

To prove that the moment generating function determines the distribution, there are at least two approaches:

  • To show that finiteness of $M_X$ on $(-\delta,\delta)$ implies that the moments of $X$ do not grow too fast, so that $F_X$ is determined by the moment sequence $(EX^k)_{k\in\mathbb{N}}$, which is in turn determined by $M_X$. This proof can be found in Section 30 of Billingsley, P., Probability and Measure.

  • To show that $M_X$ is analytic and can be extended to $(-\delta,\delta)\times i\mathbb{R} \subseteq \mathbb{C}$ via $M_X(z)=Ee^{zX}$, so that in particular $M_X(it)=\varphi_X(t)$ for all $t\in\mathbb{R}$, and then use the fact that $\varphi_X$ determines $F_X$. For this approach, see Curtiss, J. H., Ann. Math. Statist. 13:430-433 and the references therein, or Roja's answer below.
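As a concrete illustration of the second approach (an example added here for clarity, not part of the original references), take $X\sim N(0,1)$: the MGF $M_X(t)=e^{t^2/2}$ is finite on all of $\mathbb{R}$, extends to the entire function $M_X(z)=e^{z^2/2}$ on $\mathbb{C}$, and its restriction to the imaginary axis recovers the characteristic function,

$$\varphi_X(t)=M_X(it)=e^{-t^2/2}\,,\qquad t\in\mathbb{R}\,.$$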

At the undergraduate level, it is common to work with the moment generating function and to state the above theorem without proof. One possible proof requires familiarity with holomorphic functions and the Identity Theorem from complex analysis, which restricts the audience to which it can be taught.

In fact, the proof is advanced enough that, at this level, it usually makes more sense to accept working with complex numbers, set the moment generating function aside, and work with the characteristic function $\varphi_X(t)=Ee^{itX}$ instead. Almost every graduate textbook takes this path and proves that the characteristic function determines the distribution as a corollary of the inversion formula.

The proof of the inversion formula is a bit long, but it only requires Fubini's theorem to interchange an expectation with an integral and the Dominated Convergence Theorem to interchange an integral with a limit. A direct proof of uniqueness without the inversion formula is shorter and simpler, and it only requires the Weierstrass approximation theorem to approximate a continuous function by a trigonometric polynomial.
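For reference, the inversion formula in question is the standard Lévy inversion formula (stated here for completeness; it is not spelled out in the original answer): if $a<b$ are continuity points of $F_X$, then

$$F_X(b)-F_X(a)=\lim_{T\to\infty}\frac{1}{2\pi}\int_{-T}^{T}\frac{e^{-ita}-e^{-itb}}{it}\,\varphi_X(t)\,dt\,.$$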

Side remark. If you only admit random variables whose support is contained in $\mathbb{Z}_+$, then the probability generating function $G_X(z)=Ez^X$ determines $p_X$ (and thus $F_X$). This elementary result is proved in most undergraduate textbooks and is mentioned in Did's answer. If you only admit random variables whose support is contained in $\mathbb{Z}$, then it is simpler to show that $\varphi_X$ determines $p_X$, as also mentioned in Did's answer, and the proof uses Fubini's theorem.
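To sketch that last claim (a short computation added here, under the assumption that $X$ takes values in $\mathbb{Z}$): writing $\varphi_X(t)=\sum_{j\in\mathbb{Z}}p_X(j)e^{itj}$ and using Fubini to interchange the sum and the integral,

$$\frac{1}{2\pi}\int_0^{2\pi}\varphi_X(t)\,e^{-itk}\,dt=\sum_{j\in\mathbb{Z}}p_X(j)\cdot\frac{1}{2\pi}\int_0^{2\pi}e^{it(j-k)}\,dt=p_X(k)\,,\qquad k\in\mathbb{Z}\,,$$

since the inner integral equals $1$ when $j=k$ and $0$ otherwise.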

user334639
  • 1,676
9

The following is a proof in two steps. It requires familiarity with the Cauchy-Riemann equations and the identity theorem from complex analysis; the dominated convergence theorem is also applied. Thus it is suitable for any advanced undergraduate or graduate student who is familiar with these topics.

Step 1:

Assume that $X$ is a random variable such that there exists $\delta\in(0,\infty]$ for which

$$M_X(s)=Ee^{sX}<\infty\ , \ \forall s\in(-\delta,\delta)\,.$$

Now, let's denote $\Omega\equiv\left\{z\in\mathbb{C};\text{Re}(z)\in(-\delta,\delta)\right\}$ and define

$$\phi_X(z)\equiv Ee^{zX}\ ,\ \forall z\in\Omega\,.$$

Let $z=s+it\in\Omega$ and observe that

$$\phi_X(z)=E\ \overset{u(s,t;X)}{\overbrace{e^{sX}\cos (tX)}}+iE\ \overset{v(s,t;X)}{\overbrace{e^{sX}\sin (tX)}}\,.$$

It is possible to apply the dominated convergence theorem to show that $Eu(\cdot;X)$ and $Ev(\cdot;X)$ are differentiable on $(-\delta,\delta)\times\mathbb{R}$. To see this, notice that for every $(s,t)\in(-\delta,\delta)\times\mathbb{R}$

$$\left|\frac{\partial u(s,t;X)}{\partial s}\right|\ ,\ \left|\frac{\partial u(s,t;X)}{\partial t}\right|\ ,\ \left|\frac{\partial v(s,t;X)}{\partial s}\right|\ ,\ \left|\frac{\partial v(s,t;X)}{\partial t}\right|$$ are all dominated by the random variable $Y=|X|e^{sX}$. In addition, notice that for every $\epsilon>0$

$$|X|=\epsilon^{-1}|\epsilon X|\leq\epsilon^{-1}\sum_{k=0}^\infty\frac{|\epsilon X|^k}{k!}$$

$$=\epsilon^{-1}e^{\epsilon |X|}\leq\epsilon^{-1}\left(e^{-\epsilon X}+e^{\epsilon X}\right)$$ and hence

$$|Y|\leq \frac{e^{sX}}{\epsilon}\left(e^{-\epsilon X}+e^{\epsilon X}\right)=\epsilon^{-1}\left(e^{(s-\epsilon) X}+e^{(s+\epsilon)X}\right)\,.$$

Therefore, since $M_X(\cdot)$ is finite on $(-\delta,\delta)$, by taking $\epsilon>0$ small enough we deduce that

$$E|Y|\leq\epsilon^{-1}\left[M_X(s-\epsilon)+M_X(s+\epsilon)\right]<\infty\,.$$

Thus the dominated convergence theorem shows that $Eu(\cdot;X)$ and $Ev(\cdot;X)$ have partial derivatives with respect to both of their coordinates, and that these partial derivatives are obtained by differentiating inside the expectation. In addition, the dominated convergence theorem can be applied once more (with the same dominating random variable) to show that these partial derivatives are continuous on $(-\delta,\delta)\times\mathbb{R}$, i.e., $Eu(\cdot;X)$ and $Ev(\cdot;X)$ are differentiable on $(-\delta,\delta)\times\mathbb{R}$. Now, it is straightforward to verify that the Cauchy-Riemann equations are satisfied on $(-\delta,\delta)\times\mathbb{R}$, and hence $\phi_X(\cdot)$ is holomorphic on $\Omega$.
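Spelling out that last verification (a step the answer leaves to the reader): differentiating inside the expectation gives

$$\frac{\partial}{\partial s}Eu=E\left[Xe^{sX}\cos(tX)\right]=\frac{\partial}{\partial t}Ev\,,\qquad \frac{\partial}{\partial t}Eu=-E\left[Xe^{sX}\sin(tX)\right]=-\frac{\partial}{\partial s}Ev\,,$$

which are exactly the Cauchy-Riemann equations for $\phi_X=Eu+iEv$.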

Step 2: Assume that $X$ and $Y$ are two random variables such that there exists $\delta\in(0,\infty)$ for which $M_X(s)=M_Y(s)<\infty$ for every $s\in(-\delta,\delta)$. Then, define

$$H(z)\equiv\phi_X(z)-\phi_Y(z)\ , \ \forall z\in\Omega$$

and by the result of Step 1, $H(\cdot)$ is holomorphic on $\Omega$, which is an open connected subset of $\mathbb{C}$. Now, define the sequence $z_n=\frac{\delta}{2n}\in\Omega$ for all $n\in\mathbb{N}$ and note that $z_n\rightarrow 0\in\Omega$ as $n\to\infty$. In addition, $H(z_n)=0$ for all $n\in\mathbb{N}$, and hence the identity theorem gives $H(z)=0$ for all $z\in\Omega$. Then, as a special case, set $z=it$ for every $t\in\mathbb{R}$ and get

$$Ee^{itX}=Ee^{itY}$$

which means that $F_X(u)=F_Y(u),\forall u\in\mathbb{R}$, due to the uniqueness property of the characteristic function. (There are plenty of sources for the proof of this property and, as mentioned by others, it requires nothing more advanced than the dominated convergence and Fubini theorems.)

Roja
  • 221
  • Why the sequence $z_n=\delta/2n$. Can we take another sequence thats not zero as $n$ goes to infinity? – Samantha Oct 06 '20 at 01:04
  • 1
    You could choose any sequence $(z_n)_{n=1}^\infty$ of pairwise distinct points for which (1) $z_n\in\Omega,\forall n\in\mathbb{N}$, (2) $H(z_n)=0,\forall n\in\mathbb{N}$, and (3) $\lim_{n\to\infty}z_n$ exists and lies in $\Omega$. For more accurate details you may see the identity theorem for holomorphic functions, e.g., https://en.wikipedia.org/wiki/Identity_theorem. – Roja Oct 06 '20 at 15:49
8

$$(\forall n\geqslant0)\qquad \left.\frac{\mathrm d^n}{\mathrm ds^n}\mathbb E[s^X]\right|_{s=0}=n!\cdot\mathbb P[X=n] $$ $$(\forall x\in\mathbb R)\qquad \int_0^{2\pi}\mathbb E[\mathrm e^{\mathrm itX}]\,\mathrm e^{-\mathrm itx}\,\mathrm dt=2\pi\cdot\mathbb P[X=x] $$
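As a concrete check of the first formula (an illustration added here, not part of the original answer), take $X\sim\mathrm{Poisson}(\lambda)$, so that $\mathbb E[s^X]=e^{\lambda(s-1)}$. Then

$$\left.\frac{\mathrm d^n}{\mathrm ds^n}\,e^{\lambda(s-1)}\right|_{s=0}=\lambda^n e^{-\lambda}=n!\cdot\mathbb P[X=n]\,,$$

as claimed; the second formula recovers the atoms of any integer-valued $X$ in the same spirit (see the side remark in the answer above).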

Did
  • 284,245
  • "Can you explain what you are doing" ?? – Did Aug 03 '13 at 19:44
  • 2
    I guess Did is giving a formula for recovering a discrete probability distribution from its MGF. – Amritanshu Prasad Aug 04 '13 at 04:31
  • Yes, but I am having trouble interpreting what 's' means in this connection. – Shuzheng Aug 04 '13 at 08:34
  • "what 's' means in this connection"... Answer: 's' means nothing, one can consider that 's' is any real number in [0,1]. – Did Aug 04 '13 at 09:48
  • Let's say $X$ is a random variable with $P(X = j) = p$ and $P(X = k) = 1 - p$. Then $M_X(t) = E[e^{tX}] = e^{tj}\,p + e^{tk}\,(1 - p)$. How does your formula make sense in this connection? Can you give an example :-) ? – Shuzheng Aug 05 '13 at 10:06
  • 1
    Then $E[s^X]=ps^j+(1-p)s^k$. Alternatively, in the general case, $E[s^X]=M_X(\log s)$. – Did Aug 05 '13 at 10:21
  • Could you write this in terms of e^tX instead of s? – Shuzheng Aug 06 '13 at 19:14
  • 1
    Does your formula works for negative values of X ? What is X is -16 ? – Shuzheng Aug 06 '13 at 19:22
  • Try http://en.wikipedia.org/wiki/Laplace_Transform. – Did Aug 06 '13 at 19:32
  • "Why have you written two equations ? One for natural numbers and one for real numbers ?" The second one yields the atoms of X at every value, not only integer ones (as is written explicitely in the answer). "What is s in this connection - a real number ?" Already answered 8 months ago. Do you read the comments answering your queries for explanations? – Did Apr 12 '14 at 18:38
  • 6
    Let me add that it is a tad unsettling to see that, when it is suggested that you perform yourself a trivial computation which would allow you to understand the general case, you do not even acknowledge the suggestion but fall back on demands already answered a long time ago. If you think maths can be learned as a spectator, you are wrong (but hey, this is your call). – Did Apr 12 '14 at 18:41
  • 1
    "If you think maths can be learned as a spectator, you are wrong " well put – Math1000 Nov 01 '15 at 11:54
  • 2
    The first formula only makes sense if the support of $X$ is $\mathbb{N}$. The second formula is only true if the support of $X$ is contained in $\mathbb{Z}$, so it makes no sense to write "$\forall x \in \mathbb{R}$". See that for a normal distribution, $\mathbb{E}[e^{itX}]=e^{-t^2/2}$ and this formula assigns a nonzero value to $\mathbb{P}(X=0)$, instead of zero. It does not come close to answering the OP's question: "If $X$ and $Y$ are two random variables and equality holds for their MGF's: $m_X(t) = m_Y(t)$ then $X$ and $Y$ have the same probability distribution: $F_X(x) = F_Y(y)$." – user334639 Aug 04 '17 at 17:13
  • Again? What a passion... – Did Aug 04 '17 at 21:12
  • @user334639, what is your source for the info on "if support is N" and "contained in Z"? – Erdogan CEVHER Jun 19 '18 at 18:35
  • 1
    @Erdogan CEVHER: for the second formula, the Normal is a counter-example, as already pointed out. For the first formula take X=1/2 a.s. to see the equation fall apart. – user334639 Jul 22 '18 at 13:01
6

In the case where $X$ has density function $\phi(x)$, $$ M_X(it) = E(e^{itX}) = \int_{-\infty}^\infty e^{itx}\phi(x)\,dx, $$ which is the Fourier transform of $\phi(x)$. Therefore $\phi(x)$ can be recovered from $M_X(it)$ using the Fourier inversion formula.

The function $M_X(it)$ is called the characteristic function of $X$. See Chapter 6 of Kai Lai Chung's book A Course in Probability Theory for more details.
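For completeness (this display is added here, not part of the original answer): when the characteristic function is integrable, Fourier inversion recovers the density as

$$\phi(x)=\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-itx}\,M_X(it)\,dt\,.$$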

  • Thanks a lot. So in order to understand the proof I must look into Fourier transform theory? – Shuzheng Aug 03 '13 at 12:19
  • 1
    I guess Did's answer above works nicely for random variables on a finite probability space. – Amritanshu Prasad Aug 05 '13 at 10:40
  • 1
    @AmritanshuPrasad Actually, for every nonnegative integer valued random variable. – Did Aug 06 '13 at 19:27
  • @Did Actually, you prove uniqueness restricted to that space. You don't rule out the possibility that other random variables happen to have the same MGF. – user334639 Jul 28 '17 at 03:45