
I have learned that if one has two random variables, say $X$ and $Y$, with $Y=g(X)$, then the density of the r.v. $Y$ is:

$$f_Y(y) = f_X(g^{-1}(y))\left| \frac{d(g^{-1}(y))}{dx}\right|$$

This result is obtained by considering two cases, where $g(x)$ is monotonically decreasing and monotonically increasing, and differentiating w.r.t. $x$ in both cases. For example, in the monotonically increasing case:

$$F_Y(y) = P(Y \leq y) = P(X\leq x) = \int_{-\infty}^x f_X(\hat{x})\,d\hat{x} = \int_{-\infty}^{g^{-1}(y)}f_X(\hat{x})\, d\hat{x}$$

Now differentiating the above w.r.t. $x$ and using the fundamental theorem of calculus, one obtains the required result in the first line.
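
To make the statement concrete, here is a small numerical sanity check I did (just a sketch; the choices $X\sim N(0,1)$ and $g(x)=e^x$ are mine for illustration and are not part of the problem). A histogram of samples of $Y=g(X)$ should match $f_X(\ln y)\cdot\frac{1}{y}$, since here $g^{-1}(y)=\ln y$:

```python
# A quick numerical sanity check of the density formula (a sketch; the choices
# X ~ N(0,1) and g(x) = exp(x) are illustrative, so g^{-1}(y) = log(y) and the
# derivative of g^{-1} has absolute value 1/y).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x_samples = rng.standard_normal(1_000_000)      # samples of X
y_samples = np.exp(x_samples)                   # samples of Y = g(X)

# Density of Y predicted by the change-of-variables formula
y_grid = np.linspace(0.2, 4.0, 40)
f_Y_formula = stats.norm.pdf(np.log(y_grid)) / y_grid

# Empirical density of Y estimated from a histogram of the samples
hist, edges = np.histogram(y_samples, bins=400, range=(0.0, 5.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
f_Y_empirical = np.interp(y_grid, centers, hist)

# The two should agree up to Monte Carlo / binning error
print(np.max(np.abs(f_Y_formula - f_Y_empirical)))
```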

My question is the following: I have seen my lecturer use the notation

$$f_X(x)|dx| = f_Y(g(x))|dg(x)|$$

Is this an equivalent statement? And can one simply integrate both sides to obtain the cumulative distribution functions? (The second question is really: how does one treat the absolute values in order to obtain the cumulative distribution function(s)?) Thanks!

naz
    Excuse me for editing my answer so many times, but it is sometimes hard to write precisely what can be easily said. – Slowpoke May 21 '16 at 12:17
  • That is fine :) I will consider your answer a bit later. Thank you! – naz May 21 '16 at 12:26

2 Answers


The true roots of this formula lie in the change of variables with the absolute value of the Jacobian in calculus. Recall that in the 2D case, if you transform a small square area $dS=dx\,dy$ into a small area $dU=du\,dv$, then $$\int f(x,y)\,dx\,dy = \int h(u,v)\cdot \big|J(u,v)\big|\,du\,dv, \ \ \ \big|J(u,v)\big|=\bigg|\frac{\partial(x,y)}{\partial(u,v)}\bigg|$$ where $h(u,v) = f(x(u,v),y(u,v))$. Here, in 1D, $u = g(x)$, so $x(u) = g^{-1}(u)$ and the Jacobian is just $\big|\frac{dg^{-1}(u)}{du}\big|$. We have

$$\int f_X(x)\,dx = \int f_X(g^{-1}(u))\cdot \big|\frac{dg^{-1}(u)}{du}\big|\,du = \int f_U(u)\,du. $$ In the last equality we can already drop the integral sign. Now let's turn to your lecturer's notation. First, this is the inverse transition, i.e. $y = g(x)$. We stay in $\mathbb{R}$, so $dS=dU=dx$. Thus we replace $f_X(x)$ with $f_Y(g(x)) \leftrightarrow h(u)$, and the Jacobian is $\big|\frac{dg(x)}{dx}\big|$:

$$\int f_X(x)dx = \int f_Y(g(x))\cdot \big|\frac{dg(x)}{dx}\big|dx. $$

What's left is omitting the integrals (which is fine, since we are working with infinitesimally small intervals and can forget about the limits) and separating the differentials. I don't really know why you need that last step (so I may be making a mistake); this is indeed a first-order differential equation with separated variables $dx$ and $dy = d(g(x))$. I think you can indeed integrate both sides to get the probability of some event $A$, using the fact that the Lebesgue measure is always positive, so $|dx|=dx$:

$$\int_A f_Y(g(x))\cdot |d(g(x))| = \int_A f_Y(g(x))\cdot |g'(x)|\cdot |dx|=\int_A f_Y(g(x))\cdot |g'(x)|\cdot dx. $$
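
For what it's worth, here is a small numerical check of the last display (a sketch only; the choices $X\sim N(0,1)$ and $g(x)=e^x$, so that $Y$ is lognormal, are mine for illustration): integrating $f_Y(g(x))\cdot|g'(x)|$ over $A$ should reproduce $P(X\in A)$.

```python
# Numerical check of the last display (a sketch; X ~ N(0,1) and g(x) = exp(x)
# are illustrative choices, so Y = g(X) is lognormal with shape parameter 1).
import numpy as np
from scipy import stats, integrate

g = np.exp                        # y = g(x), increasing
g_prime = np.exp                  # g'(x)
f_Y = stats.lognorm(s=1.0).pdf    # density of Y = exp(X) for X ~ N(0,1)

a, b = 0.0, 1.0                   # the event A = {a <= X <= b}

lhs, _ = integrate.quad(lambda x: f_Y(g(x)) * abs(g_prime(x)), a, b)
rhs = stats.norm.cdf(b) - stats.norm.cdf(a)   # P(X in A) computed directly

print(lhs, rhs)                   # both should equal about 0.3413
```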

Hope this helps you somehow.

Slowpoke
  • A question: how does the absolute value come about in the second and third displayed equations (the ones starting $\int f_X(x)\,dx=\ldots$)? There is no absolute value in the definition of the Jacobian, so the absolute value must be accounted for separately. I believe the absolute value can be understood as originating from the asymmetry in the definition of the cumulative distribution function: probability is accumulated to the left, not the right. When $g$ is decreasing, so that it reverses left and right, the Jacobian is accompanied by an extra $-1$ factor that accounts for this reversal. – Will Orrick May 22 '16 at 19:34
  • @WillOrrick You're right that I haven't explained it, because I always used the Jacobian in higher dimensions, where it comes with the absolute value since it is the shrinkage factor for volumes. In higher dimensions we take integrals over non-oriented areas, but by flipping limits in the 1D case we make our interval oriented. I think it would be simpler to just say that, as long as we calculate a measure and don't care about the orientation in the 1D case, we always put the limits from lesser to greater, thus using the absolute value of the Jacobian just like in 2D. – Slowpoke May 23 '16 at 03:03
  • Thanks for the explanation. – Will Orrick May 23 '16 at 09:54
  • @WillOrrick I have found here (p. 3) and here some explanations that correspond more closely to my intuition, but they are actually a bit confusing and overloaded. The way I see it, the Jacobian is simply a shrinkage factor for small volumes. As long as we calculate a measure (area, volume, etc.), we do not care about the orientation of the interval (I had even forgotten in my answer about orientation and that the Jacobian can be used without the absolute value). – Slowpoke May 23 '16 at 14:58
  • Yes, your picture definitely makes sense. I was just trying to clarify that the absolute value isn't part of the definition of the Jacobian determinant, although it does appear that in many applications, the absolute value is taken. – Will Orrick May 23 '16 at 15:42

I believe you should be differentiating with respect to $y$, not $x$, both in the first displayed equation, and in the subsequent derivation. You should also be stating the assumption that $g$ is a monotonic function. If it isn't, you need to be summing over different monotonic pieces of $g$ in your first displayed equation.

You haven't shown the derivation in the case $g$ monotonically decreasing, but the issue there is that, for $y=g(x)$, we have $$ F_Y(y)=P(Y\le y)=P(X\ge x)=1-F_X(x)=1-F_X(g^{-1}(y)). $$ Differentiating both sides with respect to $y$ gives the same formula as in the case of $g$ monotonically increasing, except for a minus sign. This explains the origin of the absolute value.
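
As a numerical illustration of the absolute value taking care of the decreasing case (a sketch; the choices $X\sim\mathrm{Exp}(1)$ and $g(x)=e^{-x}$ are mine for illustration, and for them $Y=e^{-X}$ is Uniform$(0,1)$):

```python
# Check that the formula with the absolute value handles a decreasing g
# (a sketch; X ~ Exp(1) and g(x) = exp(-x) are illustrative choices).
import numpy as np
from scipy import stats

y = np.linspace(0.05, 0.95, 19)

# f_Y(y) = f_X(g^{-1}(y)) * |d g^{-1}(y)/dy|  with  g^{-1}(y) = -log(y)
f_Y = stats.expon.pdf(-np.log(y)) * np.abs(-1.0 / y)

# For these choices Y = exp(-X) is Uniform(0,1), so f_Y should be identically 1
print(np.allclose(f_Y, 1.0))    # True
```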

In practice, to compute the cumulative distribution function $F_Y(y)$ by integrating with respect to $x$, you have to know whether $g$ is increasing or decreasing. If it is the former, then you integrate $f_X$ from $-\infty$ to $g^{-1}(y)$; if the latter, you integrate from $g^{-1}(y)$ to $+\infty$. For a nonmonotonic function, you might have to integrate over several intervals and add the results. So in practice, "treating the absolute value to obtain the cumulative distribution function" means knowing enough about the shape of $g$ to figure out which intervals to integrate over.
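
For example (a sketch with the same illustrative choices $X\sim\mathrm{Exp}(1)$ and $g(x)=e^{-x}$ as in the sketch above, for which $F_Y(y)=y$ on $(0,1)$), the decreasing case means integrating $f_X$ from $g^{-1}(y)$ to $+\infty$:

```python
# Choosing the integration interval for a decreasing g (a sketch; X ~ Exp(1)
# and g(x) = exp(-x) are illustrative choices, so Y = exp(-X) is Uniform(0,1)).
import numpy as np
from scipy import stats, integrate

def g_inv(y):
    return -np.log(y)            # inverse of g(x) = exp(-x)

def F_Y(y):
    # g is decreasing, so integrate f_X from g^{-1}(y) to +infinity
    val, _ = integrate.quad(stats.expon.pdf, g_inv(y), np.inf)
    return val

# F_Y(y) should equal y, the Uniform(0,1) cumulative distribution function
for y in (0.1, 0.5, 0.9):
    print(y, F_Y(y))
```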

(Added) To answer your questions more directly:

  1. The equation $$ f_Y(y)=f_X(g^{-1}(y))\left\lvert\frac{dg^{-1}(y)}{dy}\right\rvert $$ is equivalent to $f_Y(y)\,\lvert dy\rvert=f_X(g^{-1}(y))\,\lvert dg^{-1}(y)\rvert$, which is equivalent to $f_Y(g(x))\,\lvert dg(x)\rvert=f_X(x)\,\lvert dx\rvert$.
  2. Yes, you can simply integrate both sides to obtain the cumulative distribution function $F_X(x)$; doing so will tell you how $F_X(x)$ and $F_Y(g(x))$ are related. For example, suppose that $y=g(x)=-x$. Then $$ F_X(x)=\int_{-\infty}^x f_X(\hat x)\,d\hat x=\int_{-\infty}^x f_Y(-\hat x)\,d\hat x=\int_{\infty}^{-x} f_Y(\hat y)\,(-d\hat y)=\int_{-x}^{\infty} f_Y(\hat y)\,d\hat y=1-F_Y(-x). $$ In this calculation, the first equality is the definition of $F_X(x)$; the second equality follows from $f_Y(g(\hat x))=f_Y(-\hat x)$ and $\lvert dg(\hat x)\rvert=\lvert -d\hat x\rvert=d\hat x$, since $d\hat x$ represents a small positive interval when integrating from $-\infty$ to $x$; the third equality follows from the change of variable $\hat y=-\hat x$; the fourth equality is obtained by reversing the limits of integration; the final equality follows from the definition $F_Y(-x)=\int_{-\infty}^{-x}f_Y(\hat y)\,d\hat y$ and the property that $\int_{-\infty}^{\infty}f_Y(\hat y)\,d\hat y=1$.
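
A quick numerical check of the relation $F_X(x)=1-F_Y(-x)$ in item 2 (a sketch; the choice $X\sim N(1,1)$, so that $Y=-X\sim N(-1,1)$, is mine for illustration):

```python
# Numerical check of F_X(x) = 1 - F_Y(-x) for y = g(x) = -x (a sketch; the
# choice X ~ N(1, 1), hence Y = -X ~ N(-1, 1), is illustrative).
import numpy as np
from scipy import stats

F_X = stats.norm(loc=1.0, scale=1.0).cdf     # CDF of X
F_Y = stats.norm(loc=-1.0, scale=1.0).cdf    # CDF of Y = -X

x = np.linspace(-3.0, 5.0, 17)
print(np.allclose(F_X(x), 1.0 - F_Y(-x)))    # True
```
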
Will Orrick