How to derive the density formula for $Y=h(X)$ when $h$ is not injective?

Question

Let $X$ be a real-valued random variable with density $f_X$, and let $Y = h(X)$, where $h: \mathbb{R} \to \mathbb{R}$ is a continuously differentiable function. Assume that for every $y \in \mathbb{R}$, the preimage $h^{-1}(y)$ is at most countable.

Under these assumptions, the density of $Y$ is often stated as:

$$ f_Y(y) = \sum_{x : h(x) = y} \frac{f_X(x)}{|h'(x)|}. $$

How can one rigorously prove this formula?

I tried to compute $\mathbb{P}(Y \leq y)$ by partitioning it into a countable collection of events, where on each one $h$ is injective. However, I’m not sure such a decomposition is always possible. It seems to work if we assume that the set $\{x : h'(x) = 0\}$ is countable and has no accumulation points, but I’m not sure how to prove the formula even under that assumption.

Can anyone help clarify the correct conditions under which this formula holds, and how to prove it rigorously?

Interesting claim: it is easy to see that it is true for simple examples, like $h(x)=e^{-x^2}$ which should yield $f_Y(y)=\frac{f_X( -\sqrt{-\ln(y)})+f_X(\sqrt{-\ln(y)})}{2y\sqrt{-\ln(y)}}, y \in (0,1)$. However, I have never seen it. Would you share the sources you have seen it in? — Snoop, Jun 03 '25 at 19:56
Nevermind: found the claim (with a due assumption) in Casella-Berger Th.2.3 p. 51. — Snoop, Jun 03 '25 at 20:10
I have checked some editions and no, the proof seems omitted. However, given the Casella-Berger assumptions, it should be straightforward since the problem is greatly simplified. If these assumptions would satisfy your request, then you can edit them in and an answer with a proof may be provided by the community. — Snoop, Jun 03 '25 at 21:57
In general, the density of a function of a random variable with known density can be derived using the Dirac delta function as $f_Y(y) = \int dx\, \delta(y-y(x)) f_X(x)$, and your formula is a consequence of the properties of the delta function. The intuition behind the Dirac formulation is clear: you "select" the set of values of $x$ for which $y$ takes a specific value, since the delta function is zero except when $y = y(x)$. — nicola, Jun 09 '25 at 12:20

Just a user · Accepted Answer · 2025-06-12T08:05:45.183

This is false without further conditions. Let $f_X$ be the indicator function of $[0, 1]$ (i.e. $X\sim U[0, 1]$), let $C\subset[0,1]$ be a Cantor set of positive measure, and define $h(x):=\int_0^x d(t, C)\,dt$, so that $h'(x) = d(x, C)$. Since $h'\ge 0$ everywhere and $\{x: h'(x)\not=0\}$ is dense, $h$ is strictly increasing, and hence satisfies all the required conditions.

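Therefore, $f_Y(y)=\sum_{h(x)=y}\frac{f_X(x)}{|h'(x)|}$ is $\infty$ on $h(C)$, because $h'=0$ on $C$. Moreover, $h(C)$ has Lebesgue measure $\int_C h'(t)\,dt=0$. So if $f_Y$ were a density function of $Y$, the probability of $Y\in h(C)$ would have to be $0$, but it is actually $\Bbb P(X\in C)=|C|>0$.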

In other words, we need at least the further assumption that $h'(X)=0$ has probability $0$. This turns out to be sufficient (and we don't need the extra condition that $h^{-1}(y)$ is at most countable). There should be a version of the change of variable formula, or some related result about push-forward measures, that covers this, but I couldn't find one. However, we can just follow the idea of decomposition.

Let's state our result more precisely:

If $h\in C^1(\Bbb R)$, then

$h^{-1}(y)$ is at most countable if $y$ is not a critical value of $h$ (i.e. $h(x)=y\Rightarrow h'(x)\not=0$).

If further $\Bbb P(h'(X)=0)=0$, then

$f_Y(y)=\sum_{h(x)=y}\frac{f_X(x)}{|h'(x)|}$ is a density function of $Y=h(X)$.

By continuity of $h'$, $\{x: h'(x)\not=0\}$ is open and hence a disjoint union of at most countably many open intervals $(a,b)$ (with $a$ possibly being $-\infty$, and $b$ possibly $\infty$). On each $(a,b)$, since $h'(x)\not=0$, $h(x)$ must be strictly monotone, hence for each $y$, $h^{-1}(y)\cap (a,b)$ has at most one element. Thus $h^{-1}(y)$ is at most countable if $y$ is not a critical value.
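
To make the decomposition concrete, here is the same statement in symbols. The names $I_n$ and $g_n$ are notation introduced only for this sketch: the $I_n$ are the open intervals making up $\{x:h'(x)\not=0\}$, and $g_n:=(h|_{I_n})^{-1}$ is the $C^1$ inverse of $h$ on $I_n$. For $y$ that is not a critical value, every solution of $h(x)=y$ lies in some $I_n$, so

$$\sum_{x:h(x)=y}\frac{f_X(x)}{|h'(x)|}=\sum_n \mathbf 1_{h(I_n)}(y)\,\frac{f_X(g_n(y))}{|h'(g_n(y))|}=\sum_n \mathbf 1_{h(I_n)}(y)\,f_X(g_n(y))\,|g_n'(y)|,$$

where the last equality uses $g_n'(y)=1/h'(g_n(y))$.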

Let $B$ be an open subset of $\Bbb R$. We want to show $$\Bbb P(Y\in B)=\int_{h(x)\in B} f_X(x)dx = \int_{y\in B}\sum_{h(x)=y}\frac{f_X(x)}{|h'(x)|}dy$$

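If we replace $B$ by $B\setminus\{h(x):h'(x)=0\}$, the second integral won't change, since $\{h(x): h'(x)=0\}$ has Lebesgue measure $0$ by Sard's lemma. The first integral won't change either: $\Bbb P(h(X)\in B)=\Bbb P(h(X)\in B \wedge h'(X)\not=0)$ by assumption, and the points $x$ with $h'(x)\not=0$ that $h$ maps to a critical value form a Lebesgue null set, because on each interval where $h'\not=0$ the inverse of $h$ is $C^1$ and so maps null sets to null sets. Hence we may assume $B\cap\{h(x):h'(x)=0\}=\emptyset$.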

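Then $h^{-1}(B)$ is contained in $\{x: h'(x)\not=0\}$, which is a disjoint union of at most countably many open intervals. On each such interval $h$ is a $C^1$-diffeomorphism onto its image, hence we can apply the change of variable formula there. The result follows by adding these identities together over the intervals (the sum and the integral can be interchanged because all terms are nonnegative).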
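
As a numerical sanity check of the formula (not part of the proof), here is a minimal Python sketch using the example from the comments, $h(x)=e^{-x^2}$. It assumes $X$ is standard normal, which is a choice made only for this illustration; all function names are hypothetical helpers. It compares $\int_a^b f_Y(y)\,dy$, with $f_Y$ computed from the preimage sum, against a Monte Carlo estimate of $\Bbb P(a\le Y\le b)$.

```python
import math
import random


def std_normal_pdf(x):
    """Density of X, assumed standard normal for this illustration."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def h(x):
    return math.exp(-x * x)


def h_prime(x):
    return -2.0 * x * math.exp(-x * x)


def preimages(y):
    """All solutions of h(x) = y for 0 < y < 1."""
    r = math.sqrt(-math.log(y))
    return [-r, r]


def f_Y(y):
    """Candidate density: sum of f_X(x) / |h'(x)| over the preimages of y."""
    if not 0.0 < y < 1.0:
        return 0.0
    return sum(std_normal_pdf(x) / abs(h_prime(x)) for x in preimages(y))


def integrate(f, a, b, n=2000):
    """Midpoint rule on [a, b] with n subintervals."""
    w = (b - a) / n
    return w * sum(f(a + (i + 0.5) * w) for i in range(n))


def monte_carlo_prob(a, b, n=200_000, seed=0):
    """Estimate P(a <= h(X) <= b) by sampling X."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if a <= h(rng.gauss(0.0, 1.0)) <= b)
    return hits / n


if __name__ == "__main__":
    a, b = 0.2, 0.8
    print("integral of f_Y:", integrate(f_Y, a, b))
    print("Monte Carlo    :", monte_carlo_prob(a, b))
```

The interval $[0.2, 0.8]$ stays away from the critical value $y=1$ (where $h'(0)=0$) and from $y=0$, so $f_Y$ is bounded there and a plain midpoint rule is adequate.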
