
Postscript to the question below. In trying to learn from the answers, all of which I am grateful for, I read a historical article on the origins and legacy of Kolmogorov's Grundbegriffe. The article helped me understand what basic issues people were struggling with when this theory was developed: in particular, the long-term trend towards abstraction and foundations in terms of measure theory, and the early focus on the connection between the real world and the probabilistic model. I then re-read the answers and comments. I made a comment that started:

We can choose $Ω=\Re$ because the domain of the distribution function is $\Re$.

This is wrong because the domain of the distribution function is not necessarily mentioned in the declaration of the probability space. I made the convention that random variables $X: \Omega \rightarrow \Re$. So the domain of the distribution function is $\Re$ by my convention, but that doesn't have anything to do with the probability space. $\Omega$ is a kind of index set. Suppose we are reasoning about the saturation of the color red in grapes. In that case we are thinking about say a color level in $S=[0,255)$. Nowhere in the definition of a probability space $(\Omega,\mathcal A,P)$ to support reasoning about $S$ do we need to specify $S$. We do need to demonstrate that there is a 1-1 mapping between $\Omega$ and $S$, i.e. that $\Omega$ can enumerate $S$. Once we have "built" $(\Omega,\mathcal A,P)$, we can put it to work and re-use it for any $S$ which $\Omega$ can enumerate. The probability space $(\Omega,\mathcal A,P)$ is a kind of indexing structure. That for me is the key realization. The key cognitive error comes from labelling $\Omega$ as the sample space, and $\mathcal A$ as the event space. The common sense meaning of those terms implies a connection with the actual samples being reasoned about, when that does not have to be the case. A far less misleading terminology would be to label $\Omega$ as the sample index space or just index space, and $\mathcal A$ as the index set space. This kind of thing is clearly understood in programming languages, where if I have an array $A$, then $(i,j)$ is an index and I don't confuse $(i,j)$ with $A[i,j]$, and I don't confuse the purpose of arrays with the purpose of array indices, but in some contexts I can identify $A[i,j]$ with $(i,j)$.

Short version of the question: How do we formally and correctly define the probability space of the reals which supports the definition of the typical/usual univariate continuous probability distributions, such as uniform and exponential?

Short restatement of the core question that I have: I am hung up on p. 3, section 1.1B, of the KPS text. They start with an unspecified probability space $(\Omega,\mathcal A,P)$. Two distinct random variables, $X \sim U[a,b]$ and $Y \sim Exp(\lambda)$, are said to have distribution functions $F_V(x)=P_V((-\infty,x))=P(\{\omega \in \Omega: V(\omega)<x\})$ (writing $V$ for either). These are distinct and solved separately as $F_{U[a,b]}(x) = \mathcal H(x-a)\, \mathcal H(b-x)\, \frac{x-a}{b-a} + \mathcal H(x-b)$ and $F_{Exp(\lambda)}(x)=\mathcal H(x)\, (1-e^{-\lambda x})$, where $\mathcal H$ is the Heaviside step function: $\mathcal H(x) = 1$ for $x \geq 0$ and $\mathcal H(x)=0$ for $x<0$. My key question is:

  • What is a solution for the $P$ shared by $X$ and $Y$?

Note: Here are some similar questions on Math Stack Exchange

Comment: I was mistakenly assuming that the text above was taking $\Omega=\Re$, because I saw a similar statement somewhere to the effect of "for purposes of discussion let's say the sample space for continuous random variables is $\Re^d$". The cited answer to the 2nd question above starts that way but then gets to $[0,1]$. So: I now understand that $[0,1]$ is the "best fit" sample space, along with Lebesgue measure. The "right" probability space that I was looking for is the Steinhaus space $([0,1],\mathscr B([0,1]), \mu)$, where $\mu$ is the Lebesgue measure restricted to $[0,1]$. 99.999% of my confusion came from

  • Not recognizing that $[0,1]$ is a "big enough" space to enumerate the domain of a continuous map into $\Re$. So it's "as good as" $\Re$.
  • Assuming that the convention was, somehow, somewhere, to identify the sample space for $d$-dimensional continuous random variables with $\Re^d$, when the "best fit" answer is $[0,1]^d$.

Longer version of the question:

Following this text,

Let $\Omega$ be a nonempty set, the sample space.

Let a set $\mathcal F$ of subsets of $\Omega$ be a $\sigma$-algebra, so that

  • $\Omega \in \mathcal F$
  • $\Omega \setminus F \in \mathcal F$ if $F \in \mathcal F$
  • $\bigcup_{n=1}^{\infty} F_n \in \mathcal F$ if all $F_n \in \mathcal F$

Let $P: \mathcal F \rightarrow [0,1]$ be a probability measure so that

  • $P(\Omega) = 1$
  • $P(\Omega \setminus F) = 1-P(F)$
  • $P(\bigcup_{n=1}^{\infty} F_n) = \sum_{n=1}^\infty P(F_n)$ whenever the $F_n \in \mathcal F$ are pairwise disjoint

We call the triple $(\Omega, \mathcal F, P)$ a probability space.

Suppose $X:\Omega\rightarrow \Re$. We say $X$ is a random variable if $\{\omega \in \Omega : X(\omega) \leq a\}$ is in $\mathcal F$ for every $a \in \Re$.

Then the probability distribution function $F_X : \Re \rightarrow \Re$ is defined for all $x \in \Re$ as

$$F_X(x) = P(\{\omega \in \Omega : X(\omega) < x\})$$

Note that $P$ appears unsubscripted in the definition of $F_X$. $P$ does not depend on the particular random variable $X$ whose distribution we are defining. So in that sense it should be possible for the same probability space $(\Omega, \mathcal F, P)$ to underlie probability distribution function constructions for multiple distinct random variables $X$ and $Y$, $X \neq Y$.

For example, let

$$\Omega = \{0,1\}$$ $$\mathcal F = \{\emptyset, \{0\}, \{1\}, \{0,1\}\}$$ $$P = \begin{cases} \emptyset &\mapsto& 0 \\ \{0\} &\mapsto& \frac{1}{2} \\ \{1\} &\mapsto& \frac{1}{2} \\ \{0,1\} &\mapsto& 1 \end{cases}$$

Let $X,Y: \Omega\rightarrow \Re$ be random variables fully defined by

$$X = \begin{cases} 0 &\mapsto& 17 \\ 1 &\mapsto& 17 \end{cases}$$

$$Y = \begin{cases} 0 &\mapsto& 42 \\ 1 &\mapsto& 42 \end{cases}$$

Then the probability distributions of $X$ and $Y$ are

$$F_X(x) = P(\{\omega\in\Omega:X(\omega)<x\}) = \begin{cases} x \leq 17 &\mapsto& 0 \\ x > 17 &\mapsto& 1 \end{cases}$$

$$F_Y(x) = P(\{\omega\in\Omega:Y(\omega)<x\}) = \begin{cases} x \leq 42 &\mapsto& 0 \\ x > 42 &\mapsto& 1 \end{cases}$$

Clearly $X \neq Y$ and $F_X \neq F_Y$. In the above discrete example, if I understand the language correctly, there is a single probability space $(\Omega,\mathcal F,P)$ with a single probability measure $P$ which underlies or supports two distinct probability distributions $F_X$ and $F_Y$ for two distinct random variables $X$ and $Y$.
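This bookkeeping is easy to sanity-check by machine. Here is a minimal Python sketch (mine, not from any text) that encodes $\Omega$, $P$, $X$ and $Y$ literally and evaluates both distribution functions through the one shared measure $P$, keeping the strict $<$ used above:

```python
from fractions import Fraction

Omega = {0, 1}
P_atom = {0: Fraction(1, 2), 1: Fraction(1, 2)}  # P on singletons; P(A) = sum over atoms

X = {0: 17, 1: 17}  # X maps both outcomes to 17
Y = {0: 42, 1: 42}  # Y maps both outcomes to 42

def F(V, x):
    """F_V(x) = P({omega in Omega : V(omega) < x}), strict < as in the post."""
    return sum(P_atom[w] for w in Omega if V[w] < x)

assert F(X, 17) == 0 and F(X, 18) == 1   # jump at 17
assert F(Y, 42) == 0 and F(Y, 43) == 1   # jump at 42
```

The single dictionary `P_atom` plays the role of the one shared $P$; only the maps `X` and `Y` differ.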

Now let $(\Omega, \mathcal F, P)$ be a probability space underlying random variables $X$ and $Y$ where:

  • Random variable $X: \Omega \rightarrow \Re$ is such that $X$ has the uniform distribution $F_X: \Re \rightarrow [0,1]$ such that

$$F_X(x) = P(\{\omega\in\Omega:X(\omega)<x\}) = \begin{cases}0 &:& x < a \\ \frac{x-a}{b-a} &:& a \leq x \leq b \\ 1 &:& b < x \end{cases}$$

  • Random variable $Y: \Omega \rightarrow \Re$ is such that $Y$ has the exponential distribution $F_Y: \Re \rightarrow [0,1]$ such that

$$F_Y(x) = P(\{\omega\in\Omega:Y(\omega)<x\}) = \begin{cases}0 &:& x < 0 \\ 1-e^{-\lambda x} &:& x \geq 0 \end{cases}$$
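Anticipating the answers below, one concrete way to exhibit a single probability space carrying both variables is to take $\Omega=(0,1)$ with Lebesgue measure and let $X$ and $Y$ be the (generalized) inverses of $F_X$ and $F_Y$. A numerical sketch of this, where the parameters $a=0$, $b=2$, $\lambda=1$ are illustrative choices of mine:

```python
import math
import random

a, b, lam = 0.0, 2.0, 1.0  # illustrative parameters

def X(omega):
    """Quantile function of U[a,b]: pushes omega ~ U(0,1) to U[a,b]."""
    return a + (b - a) * omega

def Y(omega):
    """Quantile function of Exp(lam): pushes the same omega to Exp(lam)."""
    return -math.log(1.0 - omega) / lam

random.seed(0)
omegas = [random.random() for _ in range(100_000)]

# Empirical F_X(1) = P(X < 1) should be (1 - a)/(b - a) = 0.5
Fx1 = sum(X(w) < 1.0 for w in omegas) / len(omegas)
# Empirical F_Y(1) = P(Y < 1) should be 1 - e^{-1} ~ 0.632
Fy1 = sum(Y(w) < 1.0 for w in omegas) / len(omegas)

assert abs(Fx1 - 0.5) < 0.01
assert abs(Fy1 - (1 - math.exp(-1))) < 0.01
```

The same draws `omegas` feed both variables, which is the point: one space, one measure, two distributions.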

Also, per comment below, one distribution can be supported by multiple probability spaces. (The key understanding here for me is that probability space and probability distribution are separate constructions.)

My questions are (and some answers that I take from my reading of the solutions below):

Q1. Is $(\Omega, \mathcal F, P) = (\Re, \mathcal B(\Re), \mu)$, where $\mathcal B(\Re)$ is the Borel $\sigma$-algebra on the reals and $\mu$ is the Lebesgue measure, a probability space which underlies $X$ and $Y$? Answer: No, but the Steinhaus space $([0,1], \mathcal B([0,1]), \mu)$ is good.

Q2. Is it correct to call $(\Re, \mathcal B(\Re), \mu)$ the standard probability space of the reals? Is there some other standard notation or language for the probability space underlying the usual continuous probability distributions? Answer: No, but the Steinhaus space is a standard space in the Wikipedia sense.

Q3. Is it correct to say that the notion of probability space is independent of and complementary to the notion of probability distribution, and that the notion of probability distribution is always associated with a particular random variable $X$ presented with a supporting probability space $(\Omega, \mathcal F, P)$? Answer: Kind of. One distribution can be accompanied by many probability spaces. One probability space can be accompanied by many distributions. I'm using "accompanied" because the word "supported" may be overloaded in math. I'm looking for some compact synonym of "independent and complementary". The main thing is to demonstrate through examples that the relationship is many-to-many.

  • There can be all kinds of probability spaces for a given distribution. There is no such thing as the probability space underlying $X$. – Kavi Rama Murthy Jul 17 '20 at 23:20
  • In your first bullet, why do you say that $X:\mathbb R \to \mathbb R$? The domain of the random variable $X$ is $\Omega$, the sample space. Same question for $Y$ in bullet 2. – littleO Jul 19 '20 at 17:23
  • I'm talking about the usual continuous univariate distributions that you would see in SciPy.stats, for example. The basis of the question is that I am trying to think of a "category" or "signature" i.e. a framework that describes what functions come with all such distributions like pdf, cdf, ppf etc. However when I study "probability space" proper versus "probability distribution" I am finding these are really two topics. I will revise though to Omega. – Lars Ericson Jul 19 '20 at 17:34
  • Trying to actually think about this in category theory is way over my head: https://ncatlab.org/nlab/show/probability+theory. – Lars Ericson Jul 19 '20 at 17:40
  • The underlying probability space is typically regarded as irrelevant. Any facts which depend on the underlying space tend to be regarded as “non-probabilistic” in nature. – guy Jul 19 '20 at 18:34
  • @guy, please see my comment to tomasz below about the gap I am trying to fill. – Lars Ericson Jul 19 '20 at 19:47
  • The probability space can be arbitrary. In real applications related to physics or engineering it is generally a subset of a Euclidean space. This is why the probability distribution of a random variable is far more important than its domain. –  Jul 19 '20 at 22:06
  • To give a programming analogy, the probability space is like hardware, random variables are like software, and probability distributions are like APIs. That is, the probability space a random variable is defined on is an implementation detail and it is usually not necessary to know exactly what it is, because we just work with the probability distributions. – Zhen Lin Jul 20 '20 at 05:11
  • It may be "usually not necessary to know", but I asked the question. I wish people would answer the question I'm asking or just ignore the question and move on. It is unresponsive to tell me why it is uninteresting or best left unsaid, or to give a brief, abstract answer that implicitly demonstrates to other people with the same training that they know the answer without actually answering the question explicitly. It is a specific, simple, and really very concrete question: just construct a probability space for the usual continuous univariate real probability distributions. – Lars Ericson Jul 20 '20 at 14:48
  • The answer of tomasz addresses your questions. For example, if you have a continuous strictly increasing function $\mathbb{R} \to [0, 1]$ then the inverse of that function will be a random variable defined on $[0, 1]$ regarded as a probability space and whose distribution is the function you started with. Taking the product of many copies of the unit interval then lets you define as many independent random variables as you need. Anyway I refer you to this blogpost, particularly the remarks on extension. – Zhen Lin Jul 20 '20 at 16:00
  • Thank you @Zhen Lin, I will read Terry Tao's notes. – Lars Ericson Jul 21 '20 at 13:18
  • https://math.stackexchange.com/questions/3428516/in-probabilistic-questions-with-real-life-context-why-can-we-ignore-defining/3428956#3428956 –  Jul 23 '20 at 04:24
  • @LarsEricson: One can do most of theoretical probability with $(0,1)$, $\{0,1\}^{\mathbb{N}}$ or $\mathbb{R}$ as the $\Omega$. The measures would then be $\lambda$ (Lebesgue measure), the product $1/2$ measure, or $\frac{1}{\sqrt{2\pi}}e^{\frac{-x^2}{2}}\,dx$ as probability measures, respectively. In my answer I showed you how to generate most things starting with the unit interval $(0,1)$. By now there are many good textbooks where you can learn a decent amount of fundamentals. I recommend starting there along with the historical papers; you'll move faster that way. – Mittens Jul 26 '20 at 17:44
  • What are your favorite textbooks on this topic? – Lars Ericson Jul 26 '20 at 18:13
  • I have many favorites, but I think that one that can answer most of your questions is Oliver Knill's notes, the introduction. http://abel.math.harvard.edu/~knill/teaching/math144_1994/probability.pdf – Mittens Jul 26 '20 at 19:04

5 Answers


Regarding your first question, I am assuming you meant to use the space $[0,1]$ rather than the whole set of reals (otherwise, it would not be a probability space). Besides that, for the most part, it does not matter. More precisely, given any real-valued random variable $X$, you can find a random variable $X'\colon [0,1]\to \mathbf R$ with the same distribution.

The same is true for random variables with values in any standard Lebesgue space, and in particular, any separable metric space. This implies that given any sequence $(X_n)_n$ of random variables $\Omega\to \mathbf R$, you can find a sequence $(X_n')_n$ of random variables $[0,1]\to \mathbf R$ with the same joint distribution.

On the other hand, it is not hard to see that there is no sequence $(X_\alpha)_{\alpha<\mathfrak c^+}$ of nontrivially i.i.d. random variables $[0,1]\to \mathbf R$. It should probably not be too hard to argue that there is no such uncountable sequence, even much shorter than $\mathfrak c^+$. So restricting the domain of the random variables does restrict the things we can see.

Since the structure of the domain (as opposed to the joint distribution of the variables) is usually immaterial in probability theory, it is usually more convenient to leave the domain unspecified and implicit.

Regarding your second question, if there is a "the" standard probability space, then it would either be $[0,1]$ with the Lebesgue measure or $\{0,1\}^{\mathbf N}$ with the usual Haar/coin toss measure. Still, usually, you would speak of "a" standard probability space.

I'm not sure whether I understand your third question. The basic notion is that of a measurable space. Using this, we can define the notion of a measurable function (= random variable), a probability space (= a measurable space with a probability measure), and using those two, we can define the probability distribution (= the pushforward of the probability measure via the random variable). So I would not call these notions independent.

tomasz
  • The construction of a random variable $X'\colon[0,1]\longrightarrow\mathbf R$ having the same distribution as $X$ is called the "Skorokhod representation" of $X$ as far as I know. – Alex Ortiz Jul 19 '20 at 19:04
  • @tomasz, I am reading page 2 section 1 of Kloeden/Platen/Schurz "Numerical Soln of SDE thru Computer Experiments". They start out with the prob space $(\Omega,\mathcal A,P)$, and define prob measure, and then define random variable and then prob distribution and give uniform and exponential distribution examples on univariate reals. They do not go back to construct the probability space $(\Omega,\mathcal A,P)$ corresponding to the example. I am aware it is typical not to specify the space. I find it frustrating, so I'm trying to fill this little gap between page 2 and 3. – Lars Ericson Jul 19 '20 at 19:36
  • @LarsEricson: I don't really know what gap you have in mind. What is frustrating you, exactly? Not knowing what is the underlying probability space? As I have said, in both cases, you can assume it's simply $[0,1]$ if you really want to, but in the end, it does not matter. – tomasz Jul 19 '20 at 19:47
  • @tomasz I have two PDFs, uniform and exponential, both defined on the reals. What is the probability space of the uniform PDF? What is the probability space of the exponential PDF? Are they the same or different? It is a simple question. Pedagogically, there is no reason to elaborate the definition of a probability space if you're not going to use it. In this case, they defined it, gave example distributions in context, but then left out stating the space. There is no "Borel set" anywhere in the book. – Lars Ericson Jul 19 '20 at 19:53
  • @LarsEricson: I don't really know the book, so I can't speak for it, but judging by the title, it does not seem to be at all focused on the technical aspects of probability theory. If you want to learn more about it, you should probably pick up a measure theory/probability theory book. – tomasz Jul 19 '20 at 19:58
  • @LarsEricson: Re: PDFs --- you seem to be confusing PDFs and random variables. A PDF is just an integrable function (in this case, in the reals) such that the integral gives you the distribution you are trying to describe, and then you can look at random variables having this specific distribution, on some measure space. – tomasz Jul 19 '20 at 20:01
  • @tomasz By "PDF" I mean $\frac{d F_X}{dx}$ where $F_X$ is defined above. By "random variable" I mean a map $X:\Omega\rightarrow\Re$ as defined above. The pages of the book are here: https://www.google.com/books/edition/Numerical_Solution_of_SDE_Through_Comput/DOIRBwAAQBAJ?hl=en&gbpv=1&dq=kloeden+platen+schruz+probability+space&pg=PA1&printsec=frontcover – Lars Ericson Jul 19 '20 at 20:03
  • @LarsEricson: Again, it's better if you just stop worrying so much. There is an underlying space. It does not matter what it is, beyond the fact that it is a probability space. – tomasz Jul 19 '20 at 20:10
  • @tomasz, these people: http://www.math.uchicago.edu/~may/VIGRE/VIGRE2010/REUPapers/Lynn.pdf , https://arxiv.org/abs/1406.6030, https://golem.ph.utexas.edu/category/2018/09/a_categorical_look_at_random_v.html all spent a lot of time trying to be clear about what it means to be a probability space. I'm trying to connect that to the simple case of uniform and exponential distributions on the reals. If it turns out to be impossible to simply state the probability space of these distributions, I find that interesting. You don't. I think that's OK. – Lars Ericson Jul 19 '20 at 20:20
  • @LarsEricson: It is not impossible in any meaningful sense. It's just that there is no one choice of the space. But that should not be surprising. – tomasz Jul 19 '20 at 20:30

In applications of probability theory, the probability space is seldom specified; it sits there in the background. However, at least conceptually, one may still ask what key characteristics the underlying space must have, based on the kinds of things we are observing and the kinds of things we want to measure.

For theoretical purposes, one often needs to have a precise description of the underlying probability space in order to use known results, verify conditions, or further advance the theory (new theorems, concepts, etc).

It turns out that most theoretical results can be obtained by considering the Steinhaus space $$((0,1),\mathscr{B}(0,1),\lambda)$$ where $\mathscr{B}(0,1)$ is the Borel $\sigma$-algebra on $(0,1)$ and $\lambda$ is the Lebesgue measure (length measure) restricted to the interval $(0,1)$, as the underlying probability space (a canonical probability space of sorts). By that I mean that one can explicitly generate random samples with any prescribed distribution, as well as represent conditional expectation by randomization (generation of uniform distributions).

The problem of existence and generation of stochastic processes is more subtle; however, one may use copies of $((0,1),\mathscr{B}(0,1))$ with a consistent prescription of finite-dimensional distributions to explicitly define a stochastic process, on a product of copies of $((0,1),\mathscr{B}(0,1))$, with the prescribed finite-dimensional distributions.

Here is an attempt to give an overview of all this.


  1. Generation of i.i.d. Bernoulli random variables (tossing a fair coin):

First notice that on the Steinhaus space, the identity function $\theta(x)=x$ is obviously uniformly distributed, $\theta\sim U[0,1]$; that is, $\lambda[\theta\leq x] = x$ for all $0<x<1$.

Recall that every $x\in[0,1]$ has a unique binary expansion $$x=\sum_{n\geq1}r_n/2^n$$ where $r_n\in\{0,1\}$, and $\sum_{n\geq1}r_n=\infty$ for $x>0$. For each $n\in\mathbb{N}$, the $n$--th bit map $x\mapsto r_n(x)$ defines a measurable function from $([0,1],\mathscr{B}([0,1]))$ to $(\{0,1\},2^{\{0,1\}})$, where $2^{\{0,1\}}$ is the collection of all subsets of $\{0,1\}$.

Therefore, the map $\beta:[0,1]\rightarrow\{0,1\}^{\mathbb{N}}$ given by $x\mapsto(r_n(x))$ is measurable.

The next result is a mathematical formulation of tossing a fair coin.

Lemma 1: Suppose $\theta\sim U[0,1]$, and let $\{X_n=r_n\circ\theta\}$ be its sequence of binary digits. Then, $\{X_n\}$ is an i.i.d. Bernoulli sequence with rate $p=\tfrac12$. Conversely, if $(X_n)$ is an i.i.d. Bernoulli sequence with rate $p=\tfrac12$, then $\theta=\sum_{n\geq1}2^{-n}X_n\sim U[0,1]$.

Here is a short proof:

Suppose that $\theta\sim U(0,1)$. For any $N\in\mathbb{N}$ and $k_1,\ldots,k_N\in\{0,1\}$, $$\begin{align} \bigcap^N_{j=1}\{x\in(0,1]:r_j(x)=k_j\}&=&(\sum^N_{j=1}\tfrac{k_j}{2^j}, \sum^N_{j=1}\tfrac{k_j}{2^j}+\tfrac{1}{2^N}]\\ \{x\in(0,1]: r_N(x)=0\}&=&\bigcup^{2^{N-1}-1}_{j=0}(\tfrac{2j}{2^N},\tfrac{2j+1}{2^N}]\\ \{x\in(0,1]:r_N(x)=1\}&=&\bigcup^{2^{N-1}-1}_{j=0} (\tfrac{2j+1}{2^N},\tfrac{2(j+1)}{2^N}] \end{align} $$ It follows immediately that $ \mathbb{P}[\bigcap^N_{j=1}\{X_j=k_j\}]=\tfrac{1}{2^N}=\prod^N_{j=1}\mathbb{P}[X_j=k_j]$. Hence $\{X_n\}$ is a Bernoulli sequence with rate $\tfrac12$.

Conversely, suppose $\{X_n:n\geq1\}$ is a Bernoulli sequence with rate $\tfrac12$. If $\widetilde{\theta}\sim U(0,1)$, then the first part shows that the sequence of bits $\{\widetilde{X}_n\}\stackrel{law}{=}\{X_n\}$. Therefore, $$ \theta:=\sum_{n\geq1}2^{-n}X_n\stackrel{law}{=} \sum_{n\geq1}2^{-n}\widetilde{X}_n=\widetilde{\theta} $$ since $\theta$ is a measurable function of $\{X_n\}$.

All this shows that on the Steinhaus space one can generate explicitly Bernoulli sequences.
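As a numerical illustration (mine, not part of the answer), one can extract the first few binary digits of uniform draws and check the fair-coin behaviour that Lemma 1 asserts:

```python
import random

def bit(x, n):
    """n-th binary digit r_n(x) of x in (0,1)."""
    return int(x * 2**n) % 2

random.seed(1)
draws = [random.random() for _ in range(200_000)]

# Each bit map r_n(theta) should behave like a fair coin ...
for n in (1, 2, 3):
    freq = sum(bit(x, n) for x in draws) / len(draws)
    assert abs(freq - 0.5) < 0.01

# ... and distinct bits should be independent: P(r_1 = 1 and r_2 = 1) ~ 1/4
both = sum(bit(x, 1) & bit(x, 2) for x in draws) / len(draws)
assert abs(both - 0.25) < 0.01
```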


  2. Generation of i.i.d. sequences of uniform distributions:

Once we can generate i.i.d. sequences of Bernoulli random variables defined on the Steinhaus space, we can generate i.i.d. sequences of uniform random variables, also defined on the Steinhaus space.

Lemma 2: There exists a sequence $(f_n)$ of measurable functions on $[0,1]$ such that for any $\theta\sim U[0,1]$, $(f_n(\theta))$ is an i.i.d. sequence of random variables with $f_1(\theta)\sim U[0,1]$.

Here is a short proof:

Reorder the sequence $(r_m)$ of binary bit maps into a two--dimensional array $(h_{n,j}:n,j\in\mathbb{N})$, and define the function $f_n:=\sum_{j\geq1}\tfrac{h_{nj}}{2^j}$ on $[0,1]$ for each $n$. From the first Lemma, $\{X_{nj}=h_{nj}\circ\theta\}$ forms a Bernoulli sequence with rate $p=\tfrac12$. Thus, the collections $\sigma(X_{nj}:j\geq1)$, $n\in\mathbb{N}$, are independent. By the first Lemma, it follows that $(f_n(\theta))$ is an i.i.d. sequence of $U[0,1]$ random variables.


  3. Generation of any distribution on the real line:

For any probability space $(\Omega,\mathscr{F},\mathbb{P})$ and random variable $X:(\Omega,\mathscr{F})\rightarrow(\mathbb{R},\mathscr{B}(\mathbb{R}))$, the law or distribution of $X$ is the measure $\mu_X$ on $(\mathbb{R},\mathscr{B}(\mathbb{R}))$ defined by $$\mu_X(B)=\mathbb{P}[X\in B],\quad B\in\mathscr{B}(\mathbb{R})$$

One can generate a random variable $Q:((0,1),\mathscr{B}((0,1)),\lambda)\rightarrow(\mathbb{R},\mathscr{B}(\mathbb{R}))$ such that the law of $Q$ is $\mu_X$. This may be done by the "quantile function"

$$Q(t)=\inf\big\{x\in\mathbb{R}: \mathbb{P}[X\leq x]\geq t\big\},\quad 0<t<1$$ $Q$ is non-decreasing and left continuous. More importantly, $Q$ satisfies

$$ F(x):=\mathbb{P}[X\leq x]\geq t \quad\text{iff}\quad Q(t) \leq x $$

From this, it follows that $$\lambda[Q\leq x]:=\lambda\big(\{t\in(0,1): Q(t)\leq x\}\big)=\lambda\big(\{t\in(0,1): t\leq F(x)\}\big)=F(x)$$ and so $Q$ has the same distribution function as $X$.

Particular examples are:

  • $\Phi(x)=\frac{1}{\sqrt{2\pi}}\int^x_{-\infty}e^{-t^2/2}\,dt$. $\Phi$ is continuous and strictly monotone increasing, so it has a continuous and strictly increasing inverse. Then $Q(t)=\Phi^{-1}(t)$, $0<t<1$, is a random variable defined on the Steinhaus space that has the normal distribution.

  • $F(x)=1-e^{-x}$ is strictly monotone increasing and has inverse $F^{-1}(t)=-\log(1-t)$. Then $Q(t)=F^{-1}(t)$ is a random variable defined on the Steinhaus space and has exponential distribution.
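A quick numerical check of both bullets (my sketch, not the answerer's; `statistics.NormalDist` supplies $\Phi^{-1}$, and the exponential uses rate $\lambda=1$):

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(3)
ts = [random.random() for _ in range(100_000)]   # theta ~ U(0,1) draws

nd = NormalDist()                                # standard normal, Phi
normals = [nd.inv_cdf(t) for t in ts]            # Q(t) = Phi^{-1}(t)
exps = [-math.log(1.0 - t) for t in ts]          # Q(t) = -log(1 - t)

assert abs(fmean(normals)) < 0.02        # N(0,1) has mean 0
assert abs(fmean(exps) - 1.0) < 0.02     # Exp(1) has mean 1
```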


  4. Generation of independent sequences of random variables with any prescribed distribution:

Using (2) and (3) we can generate random variables with any distribution on $(\mathbb{R},\mathscr{B}(\mathbb{R}))$.

Corollary 3. Suppose that $(S_n,\mathscr{S}_n,\mu_n):=(\mathbb{R},\mathscr{B}(\mathbb{R}),\mu_n)$, $n\in\mathbb{N}$, are Borel probability spaces. Then, there is a map $F:((0,1),\mathscr{B}((0,1)),\lambda)\rightarrow (\prod_nS_n,\bigotimes_n\mathscr{S}_n)$ such that the projections $p_n:\mathbf{s}\mapsto s_n$ form an independent sequence of random variables on $\big(\prod_nS_n,\bigotimes_n\mathscr{S}_n,\mu\big)$, $\mu=\lambda\circ F^{-1}$, with $p_n\stackrel{d}{=}\mu_n$.

Here is a short proof:

Lemma 2 provides a $U[0,1]$--distributed i.i.d. sequence $(f_n)$ of random variables defined on the Steinhaus space. Part 3 shows that for each $n$, there is a map $Q_n:(0,1)\rightarrow \mathbb{R}$ such that $\lambda\circ Q^{-1}_n=\mu_n$. The map $F$ given by $x\mapsto(Q_n(f_n(x)))$ has the stated properties.
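The construction in Corollary 3 can be sketched concretely. The two-row bit split and the two target laws Exp(1) and U[0,2] below are illustrative choices of mine, not part of the corollary:

```python
import math
import random

def bit(x, n):
    """n-th binary digit r_n(x) of x in (0,1)."""
    return int(x * 2**n) % 2

def f(n, x, depth=25):
    """Row n (n = 1 or 2) of a two-row split of the bits of x:
    row 1 rebuilds a uniform from the odd-indexed bits, row 2 from the even."""
    return sum(bit(x, 2 * j - (2 - n)) / 2**j for j in range(1, depth + 1))

def Q1(t):      # quantile function of Exp(1), illustrative target law
    return -math.log(1.0 - t)

def Q2(t):      # quantile function of U[0, 2], illustrative target law
    return 2.0 * t

def F_map(x):
    """The map F of Corollary 3: x -> (Q_1(f_1(x)), Q_2(f_2(x)))."""
    return Q1(f(1, x)), Q2(f(2, x))

random.seed(4)
pairs = [F_map(random.random()) for _ in range(50_000)]
m1 = sum(p[0] for p in pairs) / len(pairs)
m2 = sum(p[1] for p in pairs) / len(pairs)
assert abs(m1 - 1.0) < 0.03   # Exp(1) has mean 1
assert abs(m2 - 1.0) < 0.02   # U[0,2] has mean 1
```

A single Steinhaus draw `x` thus yields a pair of independent variables with the two prescribed laws.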


(1) through (4) illustrate that all the basic tools of probability theory (sampling, the law of large numbers for i.i.d. sequences, the central limit theorem for i.i.d. sequences, among others) can be developed using the Steinhaus space as canonical space.

The next part of the presentation is more subtle and I will skip details by adding references. On one end we illustrate how conditional expectation can be performed by randomization; on the other end, we show how stochastic processes can be constructed.


  5. There is a deep result in measure theory that states that Borel subsets of complete separable metric spaces are measurably isomorphic to $((0,1),\mathscr{B}(0,1))$ (if uncountable) or to a countable subset of $((0,1),\mathscr{B}(0,1))$. This provides another justification for the use of $((0,1),\mathscr{B}(0,1))$ as a canonical measurable space. Spaces that are measurably isomorphic to a Borel subset of $(0,1)$ are called Borel spaces.

In particular, in part (4) we can substitute $(\mathbb{R},\mathscr{B}(\mathbb{R}),\mu_n)$ by Borel probability spaces, for example $(S_n,\mathscr{B}(S_n),\mu_n)$, where $S_n$ is a complete separable metric space (Polish space) equipped with its Borel $\sigma$-algebra, and $\mu_n$ is a probability measure on $(S_n,\mathscr{B}(S_n))$.


  6. Regular conditional expectation:

Another deep result in probability is the fact that if $(\Omega,\mathscr{F},\mathbb{P})$ is a probability space, $(S,\mathscr{B}(S))$ is a Polish measurable space ($S$ is a Polish space equipped with the Borel $\sigma$-algebra), and $\mathscr{A}$ is a sub-$\sigma$-algebra of $\mathscr{F}$, then there is a stochastic kernel $\nu:\Omega\times\mathscr{B}(S)\rightarrow[0,1]$ from $(\Omega,\mathscr{A})$ to $(S,\mathscr{B}(S))$ such that $$\nu(\omega,A)=\mathbb{P}[X\in A|\mathscr{A}]\qquad \mathbb{P}-\text{a.s.}$$ for all $A\in\mathscr{B}(S)$. Here, the map $\omega\mapsto\nu(\omega,A)$ is $\mathscr{A}$--measurable for any fixed $A$.

This allows for a disintegration formula:

Suppose $(S,\mathscr{S})$ is a Polish measurable space and $(T,\mathscr{T})$ is any measurable space. Let $\mathscr{A}\subset\mathscr{F}$ be a sub--$\sigma$--algebra. Let $X:(\Omega,\mathscr{F})\rightarrow(S,\mathscr{S})$ be a random variable in $S$ (the observation above guarantees that $\mathbb{P}[X\in\cdot|\mathscr{A}]$ has a regular version $\nu$). If $Y:(\Omega,\mathscr{A})\rightarrow(T,\mathscr{T})$ and $f:(S\times T,\mathscr{S}\otimes\mathscr{T})\rightarrow\mathbb{C}$ are functions such that $\mathbb{E}[|f(X,Y)|]<\infty$ then, $$\begin{align} \mathbb{E}[f(X,Y)|\mathscr{A}](\cdot) &=\int_S f(x,Y(\cdot))\nu(\cdot,dx)\qquad \text{$\mathbb{P}$--a.s.}\\ \mathbb{E}[f(X,Y)]&=\int_\Omega\Big(\int_S f(x,Y(\omega))\nu(\omega,dx)\Big)\mathbb{P}(d\omega) \end{align} $$ If $\mathscr{A}=\sigma(Y)$ and $\mathbb{P}[X\in dx|\sigma(Y)]=\mu(Y(\omega),dx)$ for some stochastic kernel $\mu$ from $(T,\mathscr{T})$ to $(S,\mathscr{S})$ then, $$\begin{align} \mathbb{E}[f(X,Y)|\sigma(Y)](\cdot) &= \int_S f(x,Y(\cdot))\mu(Y(\cdot),dx) \qquad\text{$\mathbb{P}$--a.s.}\\ \mathbb{E}[f(X,Y)] &=\int_\Omega\Big(\int_S f(x,Y(\omega))\mu(Y(\omega),dx)\Big)\mathbb{P}(d\omega) \end{align} $$ If $X$ and $Y$ are independent then, $\mathbb{P}[X\in dx|\sigma(Y)]=\mathbb{P}[X\in dx]$ $\mathbb{P}$--a.s.


  7. Randomization:

Stochastic kernels $\nu$ from any measure space $(T,\mathscr{T})$ to a Borel space $(S,\mathscr{S})$ can also be generated on the Steinhaus space.

Lemma 4. Let $\nu$ be a stochastic kernel from a measure space $(T,\mathscr{T})$ to a Borel space $(S,\mathscr{S})$. There is a function $f:T\times[0,1]\rightarrow S$ such that if $\theta\sim U[0,1]$, then the law of $f(t,\theta)$ is $\nu(t,\cdot)$.

Here is a short proof:

By part (5) it suffices to assume $(S,\mathscr{S})$ is $((0,1),\mathscr{B}(0,1))$, for there is a bijection $\phi:((0,1),\mathscr{B}((0,1)))\longrightarrow(S,\mathscr{S})$ such that $\phi$ and $\phi^{-1}$ are measurable, in which case we replace $\nu$ by $\eta(t,B):=\nu(t,\phi(B))$. Let $g:T\times (0,1)\rightarrow \mathbb{R}$ be defined as the quantile transformation $$g(t,s)=\inf\{x\in(0,1): \nu(t,(-\infty,x])\geq s\}$$ Since $g(t,s)\leq x$ iff $\nu(t,(-\infty,x])\geq s$, the measurability of the map $t\mapsto\nu(t,(-\infty,x])$ implies that $g$ is $\mathscr{T}\otimes\mathscr{B}\big((0,1)\big)$ measurable. If $\theta\sim U[0,1]$ (for example, the identity function $\theta(s)=s$ on the Steinhaus space), then $$ \Pr[g(t,\theta)\leq x]=\Pr[\theta\leq\nu(t,(-\infty,x])]=\nu(t,(-\infty,x]) $$ This shows that $g(t,\theta)\sim \nu(t,dx)$. Therefore, for $f:=\phi\circ g$, $f(t,\theta)\sim\nu(t,ds)$.
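Concretely, the function $g$ is parametrized inverse-CDF sampling. A sketch (mine) with the illustrative kernel $\nu(t,\cdot)=\text{Exp}(\text{rate } t)$, whose quantile transform has a closed form:

```python
import math
import random

def g(t, s):
    """Quantile transform of the illustrative kernel nu(t, .) = Exp(rate t):
    g(t, s) = inf{x : nu(t, (-inf, x]) >= s} = -log(1 - s) / t."""
    return -math.log(1.0 - s) / t

random.seed(5)
# For each fixed parameter t, g(t, theta) with theta ~ U(0,1) has law Exp(t):
for t in (0.5, 2.0):
    xs = [g(t, random.random()) for _ in range(100_000)]
    m = sum(xs) / len(xs)
    assert abs(m - 1.0 / t) < 0.05 / t   # Exp(t) has mean 1/t
```

One uniform source `theta` thus samples the whole family of measures $\nu(t,\cdot)$ at once, which is the point of the randomization lemma.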


  8. Existence of stochastic processes:

Suppose $\{(S_t,\mathscr{S}_t):t\in\mathcal{T}\}$ is a collection of Borel spaces. For each $\mathcal{I}\subset\mathcal{T}$, denote by $(S_\mathcal{I},\mathscr{S}_\mathcal{I})=\big(\prod_{t\in\mathcal{I}}S_t, \bigotimes_{t\in\mathcal{I}}\mathscr{S}_t\big)$ and let $p_{\mathcal{I}}:S_\mathcal{T}\longrightarrow S_{\mathcal{I}}$ be the projection $(s_t:t\in\mathcal{T})\mapsto(s_t:t\in\mathcal{I})$. A family of probability measures $\{\mu_\mathcal{J}:\mathcal{J}\subset\mathcal{T},\,\text{$\mathcal{J}$ finite or countable}\}$ on $\mathscr{S}_\mathcal{J}$ is projective if $$ \mu_{\mathcal{J}}\big(\cdot\times S_{\mathcal{J}\setminus\mathcal{I}}\big) =\mu_{\mathcal{I}}\big(\cdot\big),\qquad \mathcal{I}\subset\mathcal{J} $$ for any finite or countable $\mathcal{J}\subset\mathcal{T}$.

A deep theorem due to Kolmogorov establishes the existence of stochastic processes:

Theorem 5. Suppose $\{(S_t,\mathscr{S}_t):t\in\mathcal{T}\}$ is a family of Borel spaces. If $\{\mu_\mathcal{I}:\mathcal{I}\subset\mathcal{T},\,\text{$\mathcal{I}$ finite}\}$ is a projective family of probability measures on $\mathscr{S}_\mathcal{I}$, then there exists a unique probability measure $\mu$ on $\mathscr{S}_\mathcal{T}$ such that $$ \mu\circ p^{-1}_\mathcal{I}=\mu_\mathcal{I} $$ for any finite $\mathcal{I}\subset\mathcal{T}$.

By part (5), all the spaces $(S_t,\mathscr{S}_t)$ can be identified with copies of a Borel subset of $(0,1)$ or of $\mathbb{R}$. In that case, the canonical space for a stochastic process $\{X_t:t\in\mathcal{T}\}$ can be chosen as $\big((0,1)^\mathcal{T},\mathscr{B}^{\otimes\mathcal{T}}(0,1)\big)$ or $\big(\mathbb{R}^\mathcal{T},\mathscr{B}^{\otimes\mathcal{T}}(\mathbb{R})\big)$.


References:

  1. Kallenberg's Foundations of Modern Probability covers the probabilistic aspects of points 1 to 8. His proofs can be considered probabilistic (as opposed to purely measure-theoretic); in particular, his proof of Kolmogorov's extension theorem relies on purely probabilistic constructions.
  2. Parthasarathy's Probability Measures on Metric Spaces is a good reference for the measurable isomorphism theorem that, in essence, reduces any nice probability space to the measurable space $((0,1),\mathscr{B}((0,1)))$.
  3. Leo Breiman's classic Probability also beautifully covers Kolmogorov's extension theorem and many aspects of the points discussed above.
Mittens

First of all, a note on terminology: the (cumulative) distribution function of a random variable $X$ is usually defined as $$F_X(x) = P(\{\omega\in\Omega: X(\omega)\leq x\}).$$ Note here the $\leq$ instead of $<$.

Now let's get to your questions.

Q1: $(\mathfrak{R}, \mathfrak{B}(\mathfrak{R}), \mu)$ is not a probability space, because $\mu(\mathfrak{R}) = \infty.$ Instead, what we usually take is $$([0, 1], \mathfrak{B}([0, 1]), \mu),$$ where $\mu$ is Lebesgue measure restricted to $[0, 1]$. This space can underlie any probability distribution on $\mathfrak{R}.$ Note first of all that the identity function $\omega\mapsto \omega$ is itself a real-valued random variable and that it has a uniform distribution on $[0, 1].$ If we now have two distribution functions $F_X$ and $F_Y,$ then $$X = F^{-1}_X(\omega), \quad Y = F^{-1}_Y(\omega)$$ have distribution functions $F_X$ and $F_Y$ respectively. $F^{-1}_X$ here denotes the generalized inverse of $F_X.$ To see that this is true, see here. This means that this space indeed underlies $X$ and $Y$.
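A quick sketch of this construction, taking $F_X$ to be the CDF of a unit-rate exponential as an illustrative choice (any valid $F$ with a computable generalized inverse would do):

```python
import math

# Sketch of the Q1 construction on ([0,1], B([0,1]), Lebesgue):
# compose the uniform identity variable with a generalized inverse CDF.
def F_X(x):
    # CDF of a unit-rate exponential (illustrative choice of F_X)
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def F_X_inv(u):
    # generalized inverse F^{-1}(u) = inf{x : F(x) >= u}; closed-form here
    return -math.log(1.0 - u)

# X(omega) = F^{-1}(omega) has CDF F_X, because the event
# {omega : F^{-1}(omega) <= x} = [0, F_X(x)] has Lebesgue measure F_X(x).
for u in (0.1, 0.5, 0.9):
    assert abs(F_X(F_X_inv(u)) - u) < 1e-12
```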

Q2: This space does not satisfy the definition of a standard probability space that you mention, since it is not complete. However, $(\mathfrak{R}, \mathfrak{B}(\mathfrak{R}), P_X)$ can be called a canonical space for the random variable $X$ in the context of stochastic processes. Here, $P_X$ is the distribution of $X$ (which is a measure on $\mathfrak{R}$). That is, $P_X((-\infty, a]) = F_X(a),$ which is enough to define $P_X$ on $\mathfrak{B}(\mathfrak{R}).$ Then the identity $\omega \mapsto \omega$ has distribution $F_X$ on this space. More generally, if you have a sequence of random variables $X_1, ..., X_n,$ the canonical probability space is $(\mathfrak{R}^n, \mathfrak{B}(\mathfrak{R}^n), P_X),$ where $P_X$ is the distribution of the vector $(X_1, ..., X_n),$ defined by $$P_X((-\infty, a_1]\times ... \times (-\infty, a_n]) = P(X_1\leq a_1, ..., X_n\leq a_n).$$ Again, the identity then has the same distribution as the vector $(X_1, ..., X_n).$ So you can generalize this idea to a space for multiple random variables.

Q3: probability spaces and distributions are not independent, because as you note, we require probability spaces to be able to define distributions. That is, theoretically, we first construct a probability space $(\Omega, \mathcal{F}, P).$ Then we define a random variable $X: \Omega\to \mathfrak{R}$ and we can consider its distribution function $F_X(x) = P(\{\omega\in\Omega: X(\omega)\leq x\})$. That is, a distribution requires the existence of a probability space with a random variable. However, in practice it often suffices to consider only the distribution and forget about the underlying probability space; this is not always safe, though, especially when you start getting into stochastic processes and need to be a bit more careful about measurability concerns. Furthermore, note that a distribution is not tied to a particular probability space and random variable; it just requires that one exists.

In practice, we usually forget about the fact that such a probability space needs to exist, because it turns out that for any potential distribution function $F:\mathfrak{R}\to [0,1]$ that is non-decreasing, right-continuous with $\lim_{x\to-\infty}F(x) = 0, \lim_{x\to\infty}F(x)=1$, there exists a probability space with a random variable such that it has cumulative distribution function $F.$ We have actually already seen this: the construction in Q1 works for any such $F.$ Hence, we can just dream up a function satisfying these requirements and we can be certain that there exists some probability space with a random variable with that function as its distribution function.

Dasherman
  • If I understand your Q3 answer then it is Yes, I have a space $(\Omega,\mathcal F,P)$ and the probability measure $P$ is independent of and used to construct the distribution functions $F_X$ or $F_Y$ given distinct random variables $X$ and $Y$, but both $X$ and $Y$ share the same space and probability measure $P$. Is that correct? – Lars Ericson Jul 21 '20 at 13:22
  • Regarding Q1, I have the exponential distribution $F_e$ which has domain $\Re$, not $[0,1]$. What is the probability space underlying $F_e$ for domain $\Re$, in particular what is the probability measure $P$, assuming $\Omega=\Re$? Is that probability measure $P$ the same for uniform distribution $F_{U[a,b]}$? – Lars Ericson Jul 21 '20 at 13:26
  • For Q3: yes, in that sense it is independent. You don't need any distributions to construct a probability measure. For Q1: this is answered in Q2, basically. Take the $P_X$ for $X$ exponential. Then the identity is a random variable with an exponential distribution. This is a different measure then the one for the uniform distribution. Still, you can also define a random variable on this space that has a uniform distribution, but it's less direct. – Dasherman Jul 21 '20 at 14:50

Since Q1 and Q2 are well answered by other answers, I would like to add some more details about Q3. I hope I have correctly grasped the point of your question.


Although the meaning of distribution slightly varies across the literature and is sometimes misused, we can give a satisfactory definition that works in any abstract setting.

Let $X : \Omega \to \mathcal{S}$ be an $\mathcal{S}$-valued random variable from the probability space $(\Omega, \mathcal{F}, P)$ to a measurable space $(\mathcal{S}, \Sigma)$. In other words, it is a measurable function from $(\Omega, \mathcal{F})$ to $(\mathcal{S}, \Sigma)$.1) Then $X$ induces a probability measure $\mu$ on $(\mathcal{S}, \Sigma)$ via2)

$$ \forall E \in \Sigma \ : \quad \mu(E) = P(X \in E) = P(X^{-1}(E)) = P(\{\omega\in\Omega : X(\omega) \in E\}). $$

Then this $\mu$ is called the distribution of $X$.

Example 1. Let $\Omega = \{-1, 0, 1, 2\}$ be equipped with the power-set $\sigma$-algebra $\mathcal{F}=2^{\Omega}$ and the normalized counting measure $P(E) = \frac{1}{4}\#E$. Then

  • $X_1 : \Omega \to \mathbb{R}$ defined by $X_1(\omega) = \omega$ has the distribution $\mu_1$ on $\mathbb{R}$ given by $$ \mu_1(E) = \frac{1}{4} \mathbf{1}_{\{-1 \in E\}} + \frac{1}{4} \mathbf{1}_{\{0 \in E\}} + \frac{1}{4} \mathbf{1}_{\{1 \in E\}} + \frac{1}{4} \mathbf{1}_{\{2 \in E\}} $$ for any Borel subset $E$ of $\mathbb{R}$.

  • $X_2 : \Omega \to \mathbb{R}$ defined by $X_2(\omega) = \omega^2$ has the distribution $\mu_2$ on $\mathbb{R}$ given by $$ \mu_2(E) = \frac{1}{4} \mathbf{1}_{\{0 \in E\}} + \frac{1}{2} \mathbf{1}_{\{1 \in E\}} + \frac{1}{4} \mathbf{1}_{\{4 \in E\}} $$ for any Borel subset $E$ of $\mathbb{R}$.

  • $X_3 : \Omega \to \{0,1,4\}$ defined by $X_3(\omega) = \omega^2$ has the distribution $\mu_3$ on $\mathcal{S}=\{0,1,4\}$ given by $$ \mu_3(E) = \frac{1}{4} \mathbf{1}_{\{0 \in E\}} + \frac{1}{2} \mathbf{1}_{\{1 \in E\}} + \frac{1}{4} \mathbf{1}_{\{4 \in E\}} $$ for any subset $E$ of $\mathcal{S}$.3)
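The pushforward computation in Example 1 can be carried out mechanically; here is a sketch that recomputes $\mu_2$ directly from $P$ and $X_2$ (exact rationals are used so the point masses come out exactly):

```python
from fractions import Fraction

# Recomputing mu_2 from Example 1: Omega = {-1, 0, 1, 2} with the
# normalized counting measure P, and X_2(omega) = omega^2.
Omega = [-1, 0, 1, 2]
P = {w: Fraction(1, 4) for w in Omega}

def distribution(X):
    # pushforward measure mu(E) = P(X^{-1}(E)), stored point mass by point mass
    mu = {}
    for w, mass in P.items():
        mu[X(w)] = mu.get(X(w), Fraction(0)) + mass
    return mu

mu_2 = distribution(lambda w: w * w)
assert mu_2 == {0: Fraction(1, 4), 1: Fraction(1, 2), 4: Fraction(1, 4)}
```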

Example 2. Let $\Omega=[0,1]^2$ be equipped with the probability measure $P$ which is the Lebesgue measure restricted onto $[0, 1]^2$. Then

  • $X_4 : \Omega \to \mathbb{R}$ defined by $$ X_4(\omega_1, \omega_2) = \begin{cases} 0, & \text{if } \omega_1 \in [0, \frac{1}{4}); \\ 1, & \text{if } \omega_1 \in [\frac{1}{4}, \frac{3}{4}); \\ 4, & \text{if } \omega_1 \in [\frac{3}{4}, 1); \\ 2020, & \text{if } \omega_1 = 1; \end{cases} $$ has the same distribution as $X_2$.

  • $X_5, X_6 : \Omega \to \mathbb{R}$ defined by $$ X_5(\omega_1, \omega_2) = \begin{cases} -\log \omega_1, & \text{if } \omega_1 \in (0, 1]; \\ 42, & \text{if } \omega_1 = 0; \end{cases} \qquad X_6(\omega_1, \omega_2) = \begin{cases} -\log (1-\omega_2), & \text{if } \omega_2 \in [0, 1); \\ 1, & \text{if } \omega_2 = 1; \end{cases} $$ have the same distribution, which is the exponential distribution of unit rate. In other words, they induce the same probability measure $\mu_{5}$ on $\mathbb{R}$ defined by $$\mu_{5}(E) = \int_{E} e^{-x} \mathbf{1}_{(0,\infty)}(x) \, \mathrm{d}x $$ for any Borel subset $E$ of $\mathbb{R}$.

    The information about $\mu_5$ may be encoded in a different way using the cumulative distribution function (CDF). The CDF $F_{X_5}$ of $X_5$ is given by $$ F_{X_5}(x) = P(X_5 \leq x) = \mu_5((-\infty, x]) = \begin{cases} 0, & \text{if } x < 0; \\ 1 - e^{-x}, & \text{if } x \geq 0. \end{cases} $$ Of course, we have $F_{X_5} = F_{X_6}$ in this example.

  • Define $X_7 : \Omega \to \mathbb{R}^2$ by $X_7(\omega) = (X_5(\omega), X_6(\omega))$. Then its distribution $\mu_7$ is given by $$ \mu_7(E) = \iint_{E} e^{-x-y}\mathbf{1}_{(0,\infty)^2}(x,y) \, \mathrm{d}x\mathrm{d}y $$ for any Borel subset $E$ of $\mathbb{R}^2$. It turns out that $\mu_7 = \mu_5 \otimes \mu_5$ is the product of two copies of $\mu_5$, and its probabilistic implication is that $X_5$ and $X_6$ are independent.
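As a numerical sanity check of $\mu_7=\mu_5\otimes\mu_5$, one can compare a midpoint Riemann sum of the joint density $e^{-x-y}$ over a rectangle against the product of the marginal masses (the rectangle and grid size below are arbitrary illustrative choices):

```python
import math

# Midpoint Riemann sum of the joint density e^{-x-y} over (0,a] x (0,b];
# if mu_7 = mu_5 (x) mu_5, this should match the product of marginal masses
# mu_5((0,a]) * mu_5((0,b]) = (1 - e^{-a}) * (1 - e^{-b}).
a, b, n = 1.0, 2.0, 400
h_x, h_y = a / n, b / n
riemann = sum(
    math.exp(-(i + 0.5) * h_x - (j + 0.5) * h_y) * h_x * h_y
    for i in range(n)
    for j in range(n)
)
product = (1.0 - math.exp(-a)) * (1.0 - math.exp(-b))
assert abs(riemann - product) < 1e-4
```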

Example 3. Let $\mu$ be any probability distribution on $\mathbb{R}$, and let $(\Omega, \mathcal{F}, P) = (\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu)$. Also define $X_8(\omega) = \omega$. Then $X_8$ has the distribution $\mu$. For this reason, we often consider the notion of distribution without explicit reference to a random variable. For example, the standard normal distribution is the probability measure on $\mathbb{R}$ defined by

$$ E \mapsto \int_{E} \frac{1}{\sqrt{2\pi}}e^{-x^2/2} \, \mathrm{d}x $$

for any Borel subset $E$ of $\mathbb{R}$. In this regard, we may as well say that the word distribution also stands for the honorable title given to a well-studied probability measure on a familiar space.

This construction also tells us that, as long as we are only interested in dealing with a single random variable, the abstract notion of probability spaces is rather redundant and we can stick to this particular realization on $\mathbb{R}$. However, the abstract notion provides great flexibility in developing various concepts under a unified framework and allows us to deal with them systematically.


1) If the term 'measurable space' is not familiar to you, you may regard $(\mathcal{S}, \Sigma)$ as the Euclidean space $(\mathbb{R}^d,\mathcal{B}(\mathbb{R}^d))$ equipped with the Borel $\sigma$-algebra. Also, you need not worry too much about what it means to be a measurable map at this point.

2) For this reason, $\mu$ is sometimes called the pushforward of $P$ by $X$ and denoted by $\mu = P \circ X^{-1}$.

3) Technically speaking, $\mu_2$ and $\mu_3$ are different distributions. However, they convey the same amount of information, and so, such difference will never affect any conclusion about the 'randomness' of $X_2$ or $X_3$. My personal impression is that the choice $X_3$ seems to be preferred in elementary probability textbooks for its simplicity, whereas $X_2$ is a more common choice in the literature because it allows us to compare different distributions systematically.

Sangchul Lee
  • There is some circularity at the beginning of your answer. We start by saying $(\Omega,\mathcal F,P)$ is probability space. So by definition, $P$ is a probability measure. But then you defined induced probability measure $\mu$ which is dependent on a particular random variable $X$, so $\mu_X$. Suppose $X$ has exponential distribution and $Y$ has uniform $U[a,b]$ distribution, and $(\Omega,\mathcal F)=(\Re, \mathcal B(\Re))$. Is the probability space of $X$ then $(\Re, \mathcal B(\Re), \mu_X)$ or is it $(\Re, \mathcal B(\Re), P)$? If it is the latter, then what is $P$? – Lars Ericson Jul 21 '20 at 13:32
  • Reading page 7 of http://faculty.bard.edu/belk/math461/Probability.pdf, $(\Re,\mathcal B(\Re), \mu)$ is the probability space of continuous real distributions, where $\mu$ is Lebesgue measure. Example: $X\in\mathcal N(0,1)$ is an Gaussian random variable and $f_{\mathcal N(0,1)}:\Re\rightarrow[0,\infty]$ is the probability density function. Then $X$ has the probability distribution function $P_{\mathcal N(0,1)}(S) = \int_{t \in T} f_{\mathcal N(0,1)}(t) dm$. The confusion I'm having is whether to call $\mu$ or $P_{\mathcal N(0,1)}$ the probability measure of the probability space. – Lars Ericson Jul 21 '20 at 18:51
  • @LarsEricson, For your first comment, I am not sure what you mean by 'the probability space of $X$'. When a random variable $X : \Omega \to \mathbb{R}$ is involved, we can have two different probability spaces. One is the domain $(\Omega,\mathcal{F},P)$ of $X$, which is already a probability space. However, the codomain $(\mathbb{R},\mathcal{B}(\mathbb{R}))$ can be also equipped which the probability measure $\mu_X$ which is induced by $P$. And this makes sense as long as $(\Omega,\mathcal{F},P)$ can be any probability space that is rich enough to host such a random variable $X$. – Sangchul Lee Jul 21 '20 at 18:56
  • @LarsEricson, Continuing from the previous comment, the induced probability measure $\mu_X$ is now called the distribution of $X$. So there is no circular argument going here. For instance, let $\Omega=[0,1)^2$, $\mathcal{F}=\mathcal{B}(\Omega)$, and $P$ is the Lebesgue measure restricted to $\Omega$. Then we may realize (I used this word because there can be other realizations as well) independent $X\sim\text{Exp}(1)$ and $Y\sim U[0,1]$ distribution by letting $$X(\omega_1,\omega_2)=-\log(1-\omega_1),\qquad Y(\omega_1,\omega_2)=\omega_2.$$ – Sangchul Lee Jul 21 '20 at 19:10
  • @LarsEricson In Page 7 of that note, note that $(\mathbb{R},\mathcal{B}(\mathbb{R}),m)$ is not a probability space, but a measure space (because the Lebesgue measure $m$ does not satisfy $m(\mathbb{R})=1$). This space is neither the domain nor the codomain of $X$, and is simply used to represent the induced probability measure $\mu_X$ (or $P_X$, using the notation of the note) via the relation $$P_X(E)=\int_E f(x)\,m(\mathrm{d}x).$$ Also note that this representation is not true for any random variable $X$. Indeed, we call $X$ continuous if it induces a distribution $P_X$ of the above form. – Sangchul Lee Jul 21 '20 at 19:22
  • @LarsEricson In this regard, it might be worth to mention the other extreme. If $S$ is an at most countable subset of $\Bbb{R}$ and $P(X\in S)=1$, then the induced probability measure $P_X$ admits a representation $$P_X(E)=\sum_{x\in E\cap S}P_X(\{x\}).$$ This case may be viewed as an analogue of the case of continuous distribution. Indeed, if we write $p(x):=P_X(\{x\})$ for the PMF of $X$ and $\#(E):=\sum_{x\in S}\mathbf{1}\{x\in E\}$ for the counting measure supported on $S$, then we have $$P_X(E)=\int_E p(x)\,\#(\mathrm{d}x).$$ (But please disregard this comment if it confuses you.) – Sangchul Lee Jul 21 '20 at 19:28
  • I'm sorry: I'm still hung up on p. 3 section 1.1B of KPS text cited above. KPS starts with probability space $(\Omega,\mathcal A,P)=(\Re,\mathcal B(\Re),P)$. Two distinct random variables $V$, $V=X_{Exp(\lambda)}$ and $V=X_{U[a,b]}$, are said to have distribution functions $F_V=P_V((-\infty,x))=P(\{\omega \in \Omega: V(\omega)<x\})$. These are distinct and solved separately as $F_{U[a,b]}(x) = \mathcal H(x-a) \mathcal H(b-x) \frac{x-a}{b-a} + \mathcal H(x-b)$ and $F_{Exp(\lambda)}=\mathcal H(x) (1-e^{-\lambda x})$. My question is what is the $P$ shared by $X$ and $Y$, or is the text wrong. – Lars Ericson Jul 22 '20 at 16:08
  • @LarsEricson, I checked the first five pages of the textbook. However, it seems to me that they never mentioned about choosing $(\Omega,\mathcal{A},P)$ as $(\mathbb{R},\mathcal{B}(\mathbb{R}),P)$, but rather their convention aligns with the other answers as well as the usual practice; they did not bother to specify what $(\Omega,\mathcal{A},P)$ really is in most of the cases. So, please allow me to ask back. Where did you find that claim? – Sangchul Lee Jul 22 '20 at 17:05
  • We can choose $\Omega=\Re$ because the domain of the distribution function is $\Re$. We can choose $\mathcal F = \mathcal B(\Re)$ because the Borel set is a $\sigma$-algebra for $\Re$. I don't know what to construct for $P$. The "flow" of the text implies that we can choose a single $P$ for both random variables. My "claim" is just that the flow of the text implies that there is a single $P$. If not, I wish everybody here would just say "No, there are two $P$'s, one for $Exp(\lambda)$ and one for $U[a,b]$, and the text is poorly written." If so, I wish someone would construct $P$. – Lars Ericson Jul 23 '20 at 12:00
  • @LarsEricson, As I (and many other answers) pointed out, a single probability space $(\Omega,\mathcal{F},P)$ can host multiple RVs. For instance, check my example in the above comment, where a single probability space hosts two RVs $X\sim\text{Exp}(1)$ and $Y\sim U[0,1]$. Moreover, Kolmogorov extension theorem allows that, for any compatible family of finite-dimensional distributions, we can always construct a single probability space and countable many RVs on it that realize those distributions. I truly hope you read everyone's answer as well. – Sangchul Lee Jul 23 '20 at 12:09
  • Thank you. I will carefully read and re-read the many thoughtful answers. In this case, though, can you construct $P$, for the two given RVs? I just re-read your answer and I see some $P$s but not a $P$ for the particular case of $X \sim Exp(\lambda)$ and $Y \sim U[a,b]$, where $(\Omega,\mathcal F)=(\Re,\mathcal B(\Re))$. I'm very sorry if I don't have the capacity to understand, but I don't see that construction anywhere in any of the offered answers. Lots and lots of stuff, but not that. If it's there in one of the answers, can you point it out for me? – Lars Ericson Jul 23 '20 at 12:16
  • Since $(\Omega,\mathcal{F})=(\mathbb{R},\mathcal{B}(\mathbb{R}))$ is an example of a standard Borel space, it is absolutely possible. However, it will be less transparent as to how we realize such a pair of RVs on $\mathbb{R}^2$. Anyway, here are possible realizations, assuming you do not care about their joint distributions: Let $P(E)=\text{Leb}(E\cap[0,1])$ so that $P$ is the Lebesgue measure restricted to $[0,1]$. (1st realization.) Let $$X(\omega)=\begin{cases}-\lambda^{-1}\log\omega,&\omega\in(0,1]\\0,&\text{o/w}\end{cases},\qquad Y(\omega)=a+(b-a)\omega.$$ – Sangchul Lee Jul 23 '20 at 12:23
  • In this realization, we have both $P_X=\text{Exp}(\lambda)$ and $P_Y=U[a,b]$, although $X$ and $Y$ are not independent. (Realization 2) We first define $$Z_n(\omega)=\begin{cases}\lfloor2^n\omega\rfloor-2\lfloor2^{n-1}\omega\rfloor,&\omega\in[0,1]\\0,&\text{o/w}\end{cases}$$ I.e., $Z_n(\omega)$ is the $n$-th binary digit in the binary expansion of $\omega$. Then $Z_n$'s are all independent RVs under $P$, and $$X_1(\omega)=\sum_{k\geq1}\frac{Z_{2k-1}}{2^k},\qquad X_2(\omega)=\sum_{k\geq1}\frac{Z_{2k}}{2^k}$$ are independent $U[0,1]$ RVs. Now let $Y=-\lambda^{-1}\log X_1$ and $X=a+(b-a)X_2$. – Sangchul Lee Jul 23 '20 at 12:29
  • In this second realization, $P_Y=\text{Exp}(\lambda)$ and $P_X=U[a,b]$, and on top of that, both $X$ and $Y$ are independent. Of course, there are infinitely many different ways to construct $P$ and $X, Y$ on $(\mathbb{R},\mathcal{B}(\mathbb{R}))$ such that $P_X=U[a,b]$ and $P_Y=\text{Exp}(\lambda)$. The above examples are just two of them. – Sangchul Lee Jul 23 '20 at 12:34
  • This seems to be an answer for $\Omega=[0,1]$. I am specifically asking about $\Omega=\Re$. Other people gave answers for domain of $X$ restricted to $[0,1]$. I am asking for domain of $X$ equal to $\Re$. This seems to be hard because the Lebesgue measure $\mu$ is not a probability measure on $\Re$. Only when you restrict to $[0,1]$ is it a probability measure and then $(\Omega,\mathcal F,P)=([0,1],\mathcal B([0,1]),\mu)$ is a probability space. Does your answer work for $\Omega=\Re$? I don't see it. – Lars Ericson Jul 23 '20 at 12:38
  • My $P$ is defined on $\mathbb{R}$ (although it assigns zero probability outside $[0,1]$). I guess you are looking for an example where $P$ is supported on all of $\mathbb{R}$, but such example is very easy to obtain by slightly tweaking the above example. Let $P$ be the probability measure on $\mathbb{R}$ defined by $$P(E)=\int_{E}\frac{\mathrm{d}x}{\pi(1+x^2)}.$$ In particular, $P([a,b])=\frac{1}{\pi}(\arctan b-\arctan a)$. Then define $$Y(\omega)=-\lambda\log\left(\frac{1}{2}+\frac{1}{\pi}\arctan \omega\right),\qquad X(\omega)=a+(b-a)\left(\frac{1}{2}+\frac{1}{\pi}\arctan \omega\right).$$ – Sangchul Lee Jul 23 '20 at 12:43
  • Actually I see now after reviewing all answers that $[0,1]$ is the right space, because it is as good as $\Omega$ for enumerating the continuous outcomes into the real line. Thank you for making it work for $\Re$ though. My insistence on $\Omega=\Re$ was based on a misunderstanding of something I read about the "usual convention", but I can't find where I read that. – Lars Ericson Jul 23 '20 at 16:51
  • You say "$X_4$ has the same distribution as $X_2$". Are you saying that $\mu_2 =_{a.s.} \mu_4$, for some definition of $=_{a.s.}$ on distributions? Saying $X_2 =_{a.s.} X_4$ doesn't quite work if the definition is $P(\{\omega \in \Omega : X_2(\omega) = X_4(\omega)\})$, because the domain of $X_2$ is $\Omega_2=\{-1,0,1,2\}$ and the domain of $X_4$ is $\Omega_4=[0,1]^2$. Neither domain is a subset of the other. – Lars Ericson Aug 03 '20 at 02:37
  • @LarsEricson, You seem to be confused by the distinction between random variable and its distribution again. Two real-valued random variables $X_1$ on $(\Omega_1,\mathcal{F}_1,P_1)$ and $X_2$ on $(\Omega_2,\mathcal{F}_2,P_2)$ are said to have the same distribution if they induce the same distribution (i.e., pushforward probability measure), which means that $$P_1(X_1\in B)=P_2(X_2\in B)$$ for any Borel subset $B$ of $\mathbb{R}$, or equivalently, $$F_{X_1}(x)=P_1(X_1\leq x)=P_2(X_2\leq x)=F_{X_2}(x)$$ for all $x\in\mathbb{R}$. In my answer, I used $\mu_i(\cdot)$ to denote $P(X_i\in\cdot)$. – Sangchul Lee Aug 03 '20 at 03:51
  • So it makes little sense to say "$\mu_2=\mu_4$ a.s.", since they are not random variables. As for distinguishing different concepts in probability theory, I always like to give the following analogy: If you are a police measuring the speed of cars using a speed gun, then $(\Omega,\mathcal{F},P)$ encodes all the information and randomness about cars passing by, $X:\Omega\to\mathbb{R}$ is the speed gun itself (associating each observation to a number), and the distribution of $X$ is the histogram of speed records. – Sangchul Lee Aug 03 '20 at 03:57
  • So $P_1(\{X_1(\omega_1) \leq x : \omega_1 \in \{-1,0,1,2\}\}) = P_2(\{X_2(\omega_1,\omega_2) \leq x: (\omega_1,\omega_2) \in [0,1]^2\})$. This depends on $P_2(\{2020\}) = 0$ or something like it, which is why I was looking for some way to express the idea that it is OK to throw 2020 into the definition of $X_4$, in particular $X_4(1,\omega)=2020$, because presumably $\{(1,\omega): \omega \in [0,1]\}$ is a set of measure 0 or something like that. I'm getting stuck on the mechanics of showing that throwing in that 2020 case doesn't change $P_1 = P_2$. – Lars Ericson Aug 03 '20 at 15:24
  • This is a minor point but a statement like $P_2(X_2 \in B)=P_4(X_4\in B)$ is a little tricky for me because it is not saying that $P_2=P_4$ because the domains of $X_2: \Omega_2 \to \Re$ and $X_4: \Omega_4 \to \Re$ are different, so $P_2: \mathcal F_2 \to [0,1]$ and $P_4: \mathcal F_4 \to [0,1]$ are not directly comparable. We can say that if $(\forall x \in \Re) F_{X_2}(x) = F_{X_4}(x)$, then $F_{X_2} = F_{X_4}$, so in a sense that's easier to reason about. – Lars Ericson Aug 03 '20 at 19:58
  • For the example above we have

    $$F_{X_2}(x) = \begin{cases} 0 & x \in (-\infty,0) \\ \frac{1}{4} & x \in [0,1) \\ \frac{3}{4} & x \in [1,4) \\ 1 & x \in [4,\infty) \end{cases}$$

    and

    $$F_{X_4}(x) = \begin{cases} 0 & x \in (-\infty,0) \\ \frac{1}{4} & x \in [0,1) \\ \frac{3}{4} & x \in [1,4) \\ 1 & x \in [4,2020) \\ 1 & x \in [2020,\infty) \end{cases}$$

    So they are equal because $P_4(\{(2020,\omega_2) : \omega_2 \in \Re\}) = 0$.

    – Lars Ericson Aug 03 '20 at 22:08
  • The last step being to show that the Lebesgue measure of a single point is 0, so in the product $[0,1]^2$, 0 times the line in ${(2020,\omega_2):\omega_2\in\Re}$ is still 0. https://math.stackexchange.com/questions/2602556/showing-that-the-lebesgue-measure-of-a-single-point-is-zero The proof of this seems actually very tricky and not at all obvious. It's not just a matter of quoting some axiom: http://math.uchicago.edu/~may/REU2013/MeasureZero.pdf and https://math.stackexchange.com/questions/618340/sub-dimensional-linear-subspaces-of-mathbbrn-have-measure-zero – Lars Ericson Aug 04 '20 at 14:05

Some concepts/definitions that might help:

A probability measure on $\left(\mathbf{R}^d, \mathcal{B}(\mathbf{R}^d) \right)$ is called a distribution. The triplet obtained can be called a distribution space to distinguish it from a general probability space.

Typical distributions are built from the Lebesgue measure $\mu$ on $\mathbf{R}^d$ and $\mathcal{B}(\mathbf{R}^d)$-measurable functions $h:\mathbf{R}^d\rightarrow [0,\infty) $ with $$ \int_{\mathbf{R}^d} h(x) \mu(dx) =1$$ by $$ P_h(B) = \int_B h(x) \mu(dx) $$ for all $B\in \mathcal{B}(\mathbf{R}^d)$.

An example of a distribution that cannot be built this way is Dirac's distribution concentrated at some point $x_0 \in \mathbf{R}^d$:

$$ \delta_{x_0} (B) = 1_{x_0\in B}$$ for all $B\in \mathcal{B}(\mathbf{R}^d)$.

Also, given probability space $\left(\Omega, \mathcal{F}, P\right)$ and $X:\Omega\rightarrow \mathbf{R}^d$ which is $\mathcal{F}/\mathcal{B}(\mathbf{R}^d)$-measurable, one can build a distribution $P_X$ as follows:

$$ P_X = P \circ X^{-1}, $$

usually called the distribution of $X$ (or law of $X$), which suggests that now one can focus only on the distribution space $\left(\mathbf{R}^d, \mathcal{B}(\mathbf{R}^d), P_X \right)$.

Note: If $\Omega = \mathbf{R}^d, \mathcal{F} = \mathcal{B}(\mathbf{R}^d)$ and $P$ is a distribution, then taking $X$ to be the identity function, $id$, we have:

$$ P_{X} = P.$$

Note 2: Two random variables, possibly defined on different spaces, can have the same distribution (law).

If $X$ is defined on an abstract space $\left(\Omega, \mathcal{F}, P\right)$ as above, it induces distribution $ P_X$.

Then random variable $id$ defined on $\left(\mathbf{R}^d, \mathcal{B}(\mathbf{R}^d), P_X \right)$ has the same distribution.

Many models rely on knowing the distribution of a random variable $X$ rather than its explicit form and the probability space on which it is defined.

Note 3: To answer Q3, I guess, we have the following facts:

  1. A distribution space is just a particular case of probability space.

  2. Yes, for a distribution, be it $P_h$ or Dirac type, there is always a random variable on a 'supporting' probability space that induces the same distribution: we take the probability space to be the starting distribution space itself and the random variable to be the identity function.

  3. (Complementing Note 2) If $A,B\in \mathcal{F}$ are different events such that $P(A)=P(B)$, then $$1_A \not= 1_B,$$ but they are random variables with the same distribution, that is

$$ P_{1_A} = P_{1_B}.$$

  4. If $\alpha: \left(\mathbf{R}^d, \mathcal{B}(\mathbf{R}^d)\right) \rightarrow \left(\mathbf{R}^f, \mathcal{B}(\mathbf{R}^f) \right) $ is measurable, then

$$ P_{\alpha \circ X} = P_X \circ \alpha^{-1}. $$
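A discrete sketch of this composition rule (the particular law $P_X$ and the map $\alpha$ below are illustrative, not from the notes above):

```python
from fractions import Fraction

# Sketch of P_{alpha o X} = P_X o alpha^{-1}: transporting each point
# mass of the law P_X through a measurable map alpha gives the law of
# alpha(X).  P_X and alpha here are illustrative choices.
P_X = {-1: Fraction(1, 4), 0: Fraction(1, 4), 1: Fraction(1, 2)}

def pushforward(mu, f):
    # mu o f^{-1}: collect the mass of mu on each preimage f^{-1}({y})
    out = {}
    for x, mass in mu.items():
        out[f(x)] = out.get(f(x), Fraction(0)) + mass
    return out

# With alpha(x) = x^2, the points -1 and 1 collapse onto 1:
assert pushforward(P_X, lambda x: x * x) == {1: Fraction(3, 4), 0: Fraction(1, 4)}
```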

Note 4: I finally realized that you are focusing on the distribution function.

A function $F:\mathbf{R}\rightarrow \mathbf{R}$ which is non-decreasing, bounded, left-continuous and for which $$\lim_{x\rightarrow -\infty} F(x) = 0$$ is called a distribution function. This definition stands on its own (no mention of measures).

The following facts can be proven.

Fact: Let $F$ be a distribution function such that $$\lim_{x\rightarrow \infty} F(x) = 1.$$ Let also $m$ be a measure on $\left((0,1), \mathcal{B}((0,1))\right)$ such that $$ m((0,x))=x $$ for all $x\in (0,1]$ (its existence can be proven). Then there is a non-decreasing function $f:(0,1) \rightarrow \mathbf{R}$ such that measure $m\circ f^{-1}$ has $F$ as distribution function, that is

$$ (m\circ f^{-1})((-\infty,x)) = F(x)$$

for all $x\in \mathbf{R}$.
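A tiny illustration of this Fact with a two-point target law (the choice of $f$ is mine), using the left-continuous convention $F(x)=(m\circ f^{-1})((-\infty,x))$ from above:

```python
# Two-point illustration of the Fact: take the non-decreasing
# f : (0,1) -> R with f(u) = 0 for u <= 1/2 and f(u) = 1 otherwise,
# and m = "length" on (0,1).  Then m o f^{-1} has the left-continuous
# distribution function F(x) = (m o f^{-1})((-inf, x)) computed below.
def F(x):
    if x <= 0:
        return 0.0   # {u : f(u) < x} is empty
    elif x <= 1:
        return 0.5   # {u : f(u) < x} = (0, 1/2], of length 1/2
    else:
        return 1.0   # {u : f(u) < x} = (0, 1)

# F is non-decreasing, left-continuous, with limits 0 and 1 at -/+ infinity:
assert F(0.0) == 0.0 and F(1.0) == 0.5 and F(1.0 + 1e-9) == 1.0
```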

Fact 2: A measure $\mu$ on $(\mathbf{R}, \mathcal{B}(\mathbf{R}))$ is perfectly determined by its distribution function $F_\mu$ defined as $$ F_\mu(x) = \mu ((-\infty,x)) $$ for all $x\in \mathbf{R}$. That is, if two measures on $(\mathbf{R}, \mathcal{B}(\mathbf{R}))$ have the same distribution function, they coincide.

This suggests that specifying the triplet

$$\left(\mathbf{R}, \mathcal{B}(\mathbf{R}), m\circ f^{-1}\right)$$

for some non-decreasing $f$, or rather a distribution function $F$ (with $\lim_{x\rightarrow \infty} F(x) = 1$, for which we know such an $f$ exists), is the essential step in setting up any distribution space.

For a random variable on an abstract probability space, $X:(\Omega, \mathcal{F}, P) \rightarrow (\mathbf{R}, \mathcal{B}(\mathbf{R}))$, as soon as we get $P_X$, the associated distribution, and $F_X$, its distribution function as defined in the book, we are done (we can forget about $X$, in some sense, and basically replace it with the $id$ introduced in Note 2, as it has the same distribution). Note that:

$$ F_X = F_{P_X} $$

with the second term defined above (in Fact 2).

ir7
  • If I read this right, you're saying that the probability space $(\Re^d,\mathcal B(\Re^d),P_X)$ contains the distribution in the measure $P_X$. This answers my Q3 in the negative. To answer Q3 as "yes", then the probability space would only be $(\Re^d,\mathcal B(\Re^d),P)$, which supports distributions such as $F_X$ and $F_Y$ defined by $F_z(x) = P({\omega\in\Omega:z(\omega)<x})$ for $z=X$ or $z=Y$. – Lars Ericson Jul 19 '20 at 19:41
  • $P$ is a function on $\mathcal{F}$, while $P_X$ is a function on $\mathcal{B}(\mathbb{R^d})$. A space triplet can't include 'hidden' measures like your triplet $(\mathbf{R}^d, \mathcal{B}(\mathbb{R^d}),P) $ does. – ir7 Jul 19 '20 at 19:47
  • If the supporting space is already a distribution space, one can take $X$ to be the identity function. – ir7 Jul 19 '20 at 19:57
  • I am confused by your comments. In general the space is $(\Omega,\mathcal F,P)$ so in this case we are saying $(\Omega,\mathcal F,P)=(\Re^d,\mathcal B(\Re^d),P)$ so $\mathcal F = \mathcal B(\Re^d)$ so saying $P_X : \mathcal B(\Re^d)\rightarrow[0,1]$ is the same as saying $P_X : \mathcal F\rightarrow[0,1]$ – Lars Ericson Jul 19 '20 at 20:01
  • I added a note to explain what I meant. – ir7 Jul 19 '20 at 20:06
  • to clarify are you still saying that the probability space of the distribution $F_X$ associated with random variable $X$ is $(\Omega,\mathcal F,P_X)$, and if $X \neq Y$, then $Y$ will have a different probability space $(\Omega,\mathcal F,P_Y)$? I.e. is the answer to question 3 "No"? In which case, what do you call $(\Omega,\mathcal F, P)$, and where does $P$ come from? Is it just a measure space and not a probability space, except where $X$ is the identity function? Not trying to be difficult, just confused. – Lars Ericson Jul 19 '20 at 20:09
  • Added a bit more to explain that the distribution matters more than the space on which the r.v. is defined. (I'm not sure why you put $P_X$, which is a distribution, in the same triplet with the abstract $\Omega$.) – ir7 Jul 19 '20 at 20:30
  • Added a third note too, which might answer Q3. – ir7 Jul 19 '20 at 21:02
  • in your answer you are putting $P_X$ in the triplet, several times, which is why I did that in my note. In my question, RV $X: \Omega\rightarrow \Re$ and $P: \mathcal F \rightarrow [0,1]$. So $X^{-1}: \Re\rightarrow\Omega$. You could make an RV $Y:\Omega\rightarrow[0,1]$ by mapping $Y(x)=e^{X(x)}$, if that simplifies matters. Composing $P \circ Y^{-1}$ is still not well-formed, because $Y^{-1}: [0,1] \rightarrow \Omega$ but $P: \mathcal F \rightarrow [0,1]$. The types $\Omega$ and $\mathcal F$ don't match: $\mathcal F$ is a set of subsets of $\Omega$. So I'm still confused. – Lars Ericson Jul 19 '20 at 21:38
  • I see. $Y= \exp \circ X$, so $Y^{-1}$ takes a subset of $[0,1]$ and returns a subset of $\Omega$, $Y^{-1} (S)=X^{-1}(\exp^{-1}(S))$. – ir7 Jul 19 '20 at 21:51
  • That is $P_{\exp \circ X} = P_X \circ \exp^{-1} $. – ir7 Jul 19 '20 at 22:00
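The identity in the last two comments, $Y^{-1}(S)=X^{-1}(\exp^{-1}(S))$ and hence $P_{\exp \circ X} = P_X \circ \exp^{-1}$, can be checked concretely. Below is a minimal, hedged sketch on a finite space (the weights and the values of $X$ are illustrative choices, not from the thread); the key point is that inverse images act on sets, not points, which resolves the type mismatch raised above.

```python
from fractions import Fraction
from math import exp

# Illustrative finite space; point masses and X are hypothetical choices.
Omega = [0, 1, 2]
P_point = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
X = {0: -1.0, 1: 0.0, 2: 1.0}        # X: Omega -> R, as a lookup table
Y = {w: exp(X[w]) for w in Omega}    # Y = exp ∘ X

def P(A):
    """P evaluated on a subset A of Omega."""
    return sum(P_point[w] for w in A)

def preimage(f, domain, S):
    """f^{-1}(S): inverse images act on *sets*, not points."""
    return {w for w in domain if f[w] in S}

S = {Y[0], Y[1]}                     # a subset of Y's range
lhs = P(preimage(Y, Omega, S))       # P_Y(S) = P(Y^{-1}(S))
exp_inv_S = {x for x in X.values() if exp(x) in S}   # exp^{-1}(S)
rhs = P(preimage(X, Omega, exp_inv_S))               # (P_X ∘ exp^{-1})(S)
print(lhs, rhs)                      # both 5/6: the two routes agree
```

Both routes push $P$ forward to the same measure on $Y$'s range, which is exactly $P_{\exp\circ X} = P_X \circ \exp^{-1}$.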
  • I added a few missing definitions at the top of my question from the original text, to clarify my notation. Maybe the extra details will clarify what I'm starting with if that helps. – Lars Ericson Jul 19 '20 at 22:02
  • I saw it. I think we have the same direction. What I said above is that $P_Y=P\circ Y^{-1}$ is well-formed, if you follow the composition of inverses. The codomain of $P$ has nothing to do with the codomain of $\exp$; they just happen to be the same, if that's your worry. – ir7 Jul 19 '20 at 22:06
  • I'm still not quite getting your explanation. I will study it more. I've added a discrete example to the question above to clarify my understanding or misunderstanding of the language in the KPS book I cite at the beginning of the question. You're way ahead of me on some advanced argot that I don't quite get. I'm trying to bring it back to the definitions in KPS to get a simple answer about how to set things up. – Lars Ericson Jul 20 '20 at 03:40
  • No worry. You'll figure it out. I added one more fact on distribution functions that might also help. – ir7 Jul 20 '20 at 18:33
  • Please avoid making several edits. – Aloizio Macedo Jul 21 '20 at 13:40
  • @LarsEricson I think that it is pretty clear that the authors define their variables by directly specifying the distribution functions (for uniform they actually specify the density, while for exponential they go straight to the distribution function). – ir7 Jul 23 '20 at 05:06
  • (continuation) And that is perfectly fine based on my Fact 2. Imagining the experiments (coin toss, dice throw) behind the variables (or rather their abstract representation) is useful too (mathematical rigour, processes etc.), but for distribution specification all that matters is the distribution function. – ir7 Jul 23 '20 at 05:12
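The point in the last comment, that specifying the distribution function alone is enough, can be illustrated by the canonical construction hinted at earlier in the thread (take the identity, or a quantile function, on a distribution space). The sketch below is hedged: the exponential $F(x)=1-e^{-x}$ and the grid check are my own illustrative choices. On $\Omega=(0,1)$ with the uniform measure, $X=F^{-1}$ has distribution function $F$, with no "experiment" behind it at all.

```python
import math

def F(x):
    """Target distribution function: exponential, F(x) = 1 - e^{-x}."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def F_inv(u):
    """Quantile function F^{-1}: (0,1) -> (0, inf)."""
    return -math.log(1.0 - u)

# On ((0,1), B, Lebesgue), P(X < x) = Leb({u : F_inv(u) < x})
# = Leb((0, F(x))) = F(x), since F_inv(u) < x iff u < F(x).
# Check numerically on a fine grid of u's:
n = 100000
for x in (0.5, 1.0, 2.0):
    mass = sum(1 for k in range(1, n) if F_inv(k / n) < x) / n
    print(round(mass, 3), round(F(x), 3))   # should agree to ~3 decimals
```

Whatever concrete sample space one imagines behind the variable, any construction yielding this $F$ gives the same distribution, which is ir7's Fact 2 in action.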