59

Let $X$ be a random variable with a continuous and strictly increasing c.d.f. $F$ (so that the quantile function $F^{−1}$ is well-defined). Define a new random variable $Y$ by $Y = F(X)$. Show that $Y$ follows a uniform distribution on the interval $[0, 1]$.

My initial thought is that $Y$ is distributed on the interval $[0,1]$ because this is the range of $F$. But how do you show that it is uniform?

  • 3
    This is not true in cases where there's a discrete component. For example, suppose $X=\begin{cases} 1/2 & \text{with probability }1/2, \\ W & \text{with probability }1/2,\end{cases}$ where $W$ is uniformly distributed on $[0,1]$, and the choice of whether $X=1/2$ is independent of $W$. Then the cdf of $X$ takes no values between $1/4$ and $3/4$, so $F(X)$ cannot be uniformly distributed on $[0,1]$. It is, however, true of continuous distributions. – Michael Hardy Jul 15 '14 at 21:41
  • 5
    see the text of the question. X is continuous! – user162381 Jul 15 '14 at 21:44
  • 11
    By the way, it is not necessary that $F$ is a strictly increasing CDF; continuity is sufficient. Just define the quantile function the usual way as a generalized inverse via $F^{-}(y)=\inf\{x\in\mathbb{R}: F(x)\geq y\}$. See the proof of Proposition 3.1 in Embrechts, P., Hofert, M.: A note on generalized inverses. Mathematical Methods of Operations Research 77(3), 423–432, for a very careful and detailed explanation. – binkyhorse Jul 16 '14 at 14:38
  • 2
    Thanks @binkyhorse - that reference is really good. – Math1000 Apr 05 '15 at 22:33
  • @binkyhorse so if $X$ is a continuous random variable and $Y=F_{X}(X)$, then $Y$ must be a $U(0,1)$ random variable? (since continuity of CDF is guaranteed by the fact that $X$ is continuous) – s0ulr3aper07 Feb 23 '19 at 15:04
  • 1
    @s0ulr3aper07 By Proposition 3.1 in the paper I linked above, yes.

    Prop. 3.1: Let $F$ be a distribution function and $X \sim F$. (a) If $F$ is continuous, then $F(X)\sim \mathrm{U}[0,1]$. The paper includes a detailed proof.

    – binkyhorse Mar 27 '19 at 20:10
  • Is there an equivalent result for discrete distributions? – Asupollo Apr 24 '19 at 22:16
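Michael Hardy's mixed example above can also be checked numerically. A minimal sketch (assuming NumPy is available), with the CDF of the mixed distribution coded by hand:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# X = 1/2 with probability 1/2, otherwise W ~ U[0,1] (independent choice).
w = rng.uniform(0.0, 1.0, n)
coin = rng.uniform(0.0, 1.0, n) < 0.5
x = np.where(coin, 0.5, w)

def F(x):
    # CDF of the mixed distribution: x/2 below the atom at 1/2,
    # and x/2 + 1/2 at or above it.
    x = np.asarray(x, dtype=float)
    return np.where(x < 0.5, x / 2, x / 2 + 1 / 2)

y = F(x)
# F jumps from 1/4 to 3/4 at x = 1/2, so F(X) never lands strictly
# inside (1/4, 3/4); it cannot be U[0,1].
print(((y > 0.26) & (y < 0.74)).mean())  # 0.0
```

The empty gap in the histogram of $F(X)$ is exactly the jump of the cdf at the atom.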

7 Answers

61

Let $F_Y(y)$ be the CDF of $Y = F(X)$. Then, for any $y \in (0,1)$ we have:

$F_Y(y) = \Pr[Y \le y] = \Pr[F(X) \le y] = \Pr[X \le F^{-1}(y)] = F(F^{-1}(y)) = y,$

where the third equality holds because $F$ is strictly increasing, so $F(X) \le y \iff X \le F^{-1}(y)$.

What distribution has this CDF?
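The identity $F_Y(y) = y$ is easy to check by simulation. A minimal sketch (assuming NumPy), using the logistic distribution since its CDF has a simple closed form:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# X logistic: F(x) = 1 / (1 + exp(-x)) is continuous and strictly increasing.
x = rng.logistic(size=n)
u = 1.0 / (1.0 + np.exp(-x))  # Y = F(X)

# The empirical F_Y(y) = Pr[Y <= y] should come out close to y itself.
ys = np.linspace(0.1, 0.9, 9)
fy = np.array([(u <= yi).mean() for yi in ys])
print(np.abs(fy - ys).max())  # shrinks like 1/sqrt(n)
```

Any other continuous, strictly increasing CDF (exponential, normal, ...) gives the same flat result.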

JimmyK4542
  • 55,969
11

$$ \Pr(Y\leq x)=\Pr(F(X)\leq x)=\Pr(X\leq F^{-1}(x))=F(F^{-1}(x))=x $$ The second equality follows from the definition of the quantile function; the last holds because $F$ is continuous and strictly increasing.

Juanito
  • 2,482
  • 12
  • 25
4

Let $y\in(0,1)$. Since $F$ is continuous, there exists $x\in\mathbb{R}$ s.t. $F(x)=y$. Thus, $$ \mathsf{P}(Y\le y)=\mathsf{P}(F(X)\le F(x))=F(x)=y, $$ i.e., $Y\sim\text{U}[0,1]$. In order to see the first equality we don't need continuity. Specifically, since any cdf is right-continuous, \begin{align} \{F(X)\le F(x)\}&=\{\{F(X)\le F(x)\}\cap\{X\le x\}\}\cup \{\{F(X)\le F(x)\}\cap\{X>x\}\} \\ &=\{X\le x\}\cup \{\{F(X)=F(x)\}\cap\{X>x\}\}, \end{align} and $\mathsf{P}(\{F(X)=F(x)\}\cap\{X>x\})=0$.

3

Let $y=g(x)$ be a mapping of the random variable $x$, distributed according to the density $f(x)$. The mapping $y=g(x)$ preserves probability (you count the same number of events in the corresponding bins):

$$ h(y)dy=f(x)dx $$

where $h(y)$ is the probability density of $y$.

If $h(y)=1$ (the uniform density on $[0,1]$), we have

$$ dy=g'(x)dx=f(x)dx $$

This means that $$ g(x)=\int_{-\infty}^{x} f(t)\,dt, $$

namely, the function $g(x)$ that maps a random variable $x$ distributed according to $f(x)$ into a uniformly distributed random variable $y$ is precisely the cumulative distribution function $\int_{-\infty}^{x} f(t)\,dt$.
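This change-of-variables argument can be sanity-checked numerically: build $g$ as a numerical integral of a density $f$ (standard normal here, chosen just for illustration), push samples of $x$ through it, and the histogram of $y$ comes out flat. A sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x = rng.normal(size=n)

# Build g(x) = integral of f from -inf to x, numerically, for standard normal f.
grid = np.linspace(-6.0, 6.0, 4001)
f = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)
g = np.cumsum(f) * (grid[1] - grid[0])  # crude Riemann-sum CDF
y = np.interp(x, grid, g)               # y = g(x) for each sample

# The histogram of y should be approximately flat with density close to 1.
hist, _ = np.histogram(y, bins=20, range=(0.0, 1.0), density=True)
print(np.abs(hist - 1.0).max())
```

The grid width and sample size here are arbitrary choices; the deviation from a flat density shrinks as both grow.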

emanuele
  • 369
2

Here is an approach that does not use the quantile function whatsoever - the only property used is that independent copies of $X$ have zero probability of being equal. (The main ingredient in my argument is conditional expectation.)

Consider the cumulative distribution function of $X$, namely $$ F(t)=\mathbb P(X\leq t). $$ Your random variable - which I will suggestively call $U$ instead of $Y$ - can be described by starting with two independent and identically distributed random variables $X,Z$ and considering the conditional probability $$ U=\mathbb P(X\leq Z\mid Z). $$ Then, for all integers $n\geq 1$, we can represent $U^n$ as follows. Let $X_1,X_2,\ldots,X_n,Z$ be independent and identically distributed. By independence, $$ \mathbb P\bigl(X_1\leq Z,X_2\leq Z,\ldots, X_n\leq Z\bigm\vert Z\bigr)=U^n, $$ and thus by the tower property $$ \mathbb EU^n=\mathbb P(X_1\leq Z,X_2\leq Z,\ldots, X_n\leq Z)=\mathbb P\bigl(Z=\max(X_1,X_2,\ldots,X_n,Z)\bigr). $$ Since $X_1,\ldots,X_n,Z$ are iid, each of them is equally likely to be the maximum and therefore $$ \mathbb EU^n=\frac{1}{n+1}. $$ Thus $U$ has the same moments as a uniformly distributed random variable on $[0,1]$. Since $U$ is supported in $[0,1]$ as well, it follows (by the uniqueness of the Hausdorff moment problem) that $U$ is uniformly distributed, as desired.
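The moment identity $\mathbb{E}U^n = 1/(n+1)$ above is also easy to confirm by simulation. A sketch (assuming NumPy), taking $X, Z$ iid $\mathrm{Exp}(1)$ so that $U = \mathbb{P}(X\leq Z\mid Z) = F(Z) = 1-e^{-Z}$:

```python
import numpy as np

rng = np.random.default_rng(3)
m = 500_000

# With X, Z iid Exp(1): U = P(X <= Z | Z) = F(Z) = 1 - exp(-Z).
z = rng.exponential(1.0, m)
u = 1.0 - np.exp(-z)

for n in range(1, 6):
    print(n, (u**n).mean(), 1 / (n + 1))  # E[U^n] matches 1/(n+1)
```

These are exactly the moments of $\mathrm{U}[0,1]$, in line with the Hausdorff moment argument.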

pre-kidney
  • 30,884
1

For a proof of this problem when $F_X(x)$ is strictly increasing, refer to JimmyK4542's answer. Let's assume $F_X(x)$ is just non-decreasing (there are intervals such as $[a,b]$ where $F_X(x') = c$ for $x'\in[a,b]$). We define $G(y)$ similar to what Henry's comment suggests: $$ G(y)=\inf\{x:F_X(x)\gt y\}$$ Now substituting this expression in what Jimmy has written will give us: $$ F_Y(y) = \Pr[Y \le y] = \Pr[F_X(X) \le y] = \Pr[X \le G(y)] = F_X(G(y))= y \label{eq:I}\tag{I}$$

We need to show that:

  1. $F_X(x)\le y \rightarrow x \le G(y)$
  2. $F_X(G(y))=y$

The second argument is easier to prove. We have the following expression almost directly from the definitions (using the right-continuity of $F_X$): $$ F_X(G(y))= F_X(\inf\{t:F_X(t)\gt y\})= y$$ Now for the first argument, we can still use what $G(y)=\inf\{\cdots\}$ implies; if $F_X(x)\le y$, then $x\le \inf\{t:F_X(t)\gt y\}$; hence $x\le G(y)$.

With the two arguments proved and a substitution in \ref{eq:I}, we have proved the main argument.
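As a numerical illustration that strict monotonicity is not needed (only continuity), here is a sketch (assuming NumPy) with $X$ uniform on $[0,1]\cup[2,3]$, whose CDF is flat on $[1,2]$, yet $F(X)$ still comes out uniform:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# X uniform on [0,1] ∪ [2,3]: the CDF equals 1/2 on all of [1,2],
# so F is continuous but not strictly increasing.
half = rng.uniform(0.0, 1.0, n)
shift = rng.integers(0, 2, n) * 2  # 0 or 2 with equal probability
x = half + shift

def F(x):
    # CDF: x/2 on [0,1], constant 1/2 on [1,2], 1/2 + (x-2)/2 on [2,3].
    x = np.asarray(x, dtype=float)
    return np.clip(x, 0, 1) / 2 + np.clip(x - 2, 0, 1) / 2

y = F(x)
t = np.linspace(0.05, 0.95, 19)
ecdf = np.array([(y <= ti).mean() for ti in t])
print(np.abs(ecdf - t).max())  # small: F(X) is still U[0,1]
```

The flat stretch of $F$ corresponds to a gap in the support of $X$, which $X$ never visits, so it causes no atom in $F(X)$.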

Mahyar
  • 65
  • 1
    I personally believe this problem is a severe case for abuse of notation, and a bad professor's problem. – Mahyar Feb 19 '22 at 01:46
  • The first condition you claimed that needs to be shown is insufficient. You need to show implication in both directions - which is also true. @mahyar – G.Bar Aug 31 '24 at 08:40
1

Fix $y\in(0,1)$ and let $a:= \max\{b \mid F(b) = y\}$.

$F$ is continuous, so the pre-image of $y$ is non-empty and closed; it is also bounded (since $F(x)\to 0$ as $x\to-\infty$ and $F(x)\to 1$ as $x\to\infty$), so its $\max$ exists. Also, $F(a) = y$.

Note:

  • $F$ is non-decreasing, so $X\le a\Rightarrow F(X)\le F(a)$.

  • But since $a$ is the $\max$ of the pre-image of $F(a)$, we have $F(X)\le F(a)\Rightarrow X\le a$.

  • Thus, $X\le a\Leftrightarrow F(X)\le F(a)$.

So: $\mathbb P(Y\le y)=\mathbb P(F(X)\le y)=\mathbb P(F(X)\le F(a)) = \mathbb P(X\le a)=F(a)=y$

*That is for $y\in(0,1)$. The cases $y=0$ and $y=1$ are easy to investigate separately. (They are a separate problem, since the pre-image of $0$ or $1$ can be empty, e.g. for the standard normal c.d.f. $F$.)