4

Well, if I've understood correctly, this proof involves Rolle's theorem and/or the Mean Value Theorem. But I don't even have an idea of how to start. We already know that $$f\text{ concave}\Rightarrow f''\le 0$$ (at least when $f$ is twice differentiable), but then...

Marcelo
  • 1,462

4 Answers

17

Since $f$ is concave, applying the definition of concavity to the convex combination $a = \frac{b}{a+b}\cdot 0 + \frac{a}{a+b}(a+b)$ gives

$$ \frac{b}{a+b}f(0) + \frac{a}{a+b}f(a+b) \leq f(a) $$

Since $f(0) \ge 0$, we have

$$ \frac{a}{a+b}f(a+b) \leq f(a). $$ Similarly, by interchanging the roles of $a$ and $b$, we have

$$ \frac{b}{a+b}f(a+b) \leq f(b) $$

Now, adding the last two inequalities gives $$\frac{a}{a+b}f(a+b)+\frac{b}{a+b}f(a+b)=f(a+b)\leq f(a)+f(b),$$ which is the result.
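As a quick sanity check, here is a short Python sketch that tests the final inequality on random inputs for one concave choice of $f$ with $f(0)\ge 0$; the function $f(t)=\log(1+t)+1$ is just an illustrative assumption, and any concave $f$ with $f(0)\ge0$ would do.

```python
import math
import random

def f(t):
    # An arbitrary concave function on [0, inf) with f(0) = 1 >= 0.
    return math.log(1 + t) + 1

random.seed(0)
for _ in range(10_000):
    a, b = random.uniform(0, 100), random.uniform(0, 100)
    # Subadditivity derived in the answer above.
    assert f(a + b) <= f(a) + f(b) + 1e-12
print("f(a+b) <= f(a) + f(b) held on all sampled (a, b)")
```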

Red shoes
  • 7,184
3

The answer of Red Shoes is correct and succinct, but I find it worthwhile to share the following more geometric proof.

Fix $0\leq x\leq y$ and let $g$ be the affine function that interpolates $f$ at $x,y$ if they are distinct; otherwise let $g$ be a tangent (supporting) line to $f$ at $x=y$ (such a line exists by concavity alone, even if $f$ is not differentiable, though it need not be unique). By concavity of $f$, $g\geq f$ on $[0,x] \cup [y,\infty)$, so in particular $g(0)\geq f(0)\geq0$ and $g(x+y) \geq f(x+y)$. As a result, $$ f(x+y) \leq g(x+y) = g(x) + g(y) - g(0) \leq g(x)+g(y) = f(x)+f(y). $$
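To see the geometry concretely, here is a small Python sketch of this chord argument for one concave choice, $f(t)=\sqrt{t}+\tfrac12$ (an illustrative assumption), checking that the interpolating line $g$ dominates $f$ at $0$ and at $x+y$:

```python
import math
import random

def f(t):
    # An arbitrary concave function on [0, inf) with f(0) = 0.5 >= 0.
    return math.sqrt(t) + 0.5

random.seed(1)
for _ in range(10_000):
    x, y = sorted(random.uniform(0, 50) for _ in range(2))
    if y - x < 1e-9:
        continue  # skip the tangent case in this quick check

    # g is the affine function interpolating f at x and y.
    slope = (f(y) - f(x)) / (y - x)
    def g(t):
        return f(x) + slope * (t - x)

    assert g(0) >= f(0) - 1e-9          # g >= f to the left of [x, y]
    assert g(x + y) >= f(x + y) - 1e-9  # g >= f to the right of [x, y]
    # The resulting chain of inequalities.
    assert f(x + y) <= f(x) + f(y) + 1e-9
print("chord argument verified on all sampled (x, y)")
```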

One of the merits of such an approach is that the statement only needs to hold for affine functions $f\colon\mathbb R_+ \to\mathbb R_+$ in order to apply to all concave functions $f\colon\mathbb R_+ \to\mathbb R_+$. For instance, you could use this argument to show that if $f\colon\mathbb R_+ \to\mathbb R_+$ is concave, then $g\colon t\in\mathbb R_+ \longmapsto \sqrt{t f(t)}$ is subadditive, which might not be as immediate otherwise.

We can make this claim formal with the proof below.

We can first show that the statement holds for all such affine functions $f$, i.e., that for all $a,b,x,y\geq0$, $$ \sqrt{(x+y)(a(x+y)+b)} \leq \sqrt{x(ax+b)} + \sqrt{y(ay+b)}. $$ Squaring both sides and rearranging, the above is equivalent to $$ a\sqrt{xy} \leq \sqrt{(ax+b)(ay+b)}, $$ which evidently holds. Secondly, we use this fact and the previous kind of argument. As before, fix $0\leq x\leq y$ and let $h$ be the function that interpolates $f$ at $x,y$ if they are distinct, otherwise let $h$ be a tangent to $f$ at $x=y$. Again, by concavity of $f$, $h\geq f$ on $[0,x] \cup [y,\infty)$, thus in particular, $h(0)\geq f(0)\geq0$ and $h(x+y) \geq f(x+y)$. Note also that $h(t)=at+b$ with $a\geq0$ (a nonnegative concave function on $\mathbb R_+$ is nondecreasing, so its chords and tangents have nonnegative slope) and $b=h(0)\geq0$, so the affine case applies to $h$. As a result, $$ \sqrt{(x+y)f(x+y)} \leq \sqrt{(x+y)h(x+y)} \leq \sqrt{xh(x)} + \sqrt{yh(y)} = \sqrt{xf(x)} + \sqrt{yf(y)}. $$
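For reassurance, here is a brief Python sketch checking both the affine-case inequality and the resulting subadditivity of $t\mapsto\sqrt{t f(t)}$ for one concave choice, $f(t)=\log(1+t)$ (an illustrative assumption):

```python
import math
import random

def f(t):
    # An arbitrary concave function from [0, inf) to [0, inf).
    return math.log(1 + t)

def g(t):
    # The function claimed to be subadditive.
    return math.sqrt(t * f(t))

random.seed(2)
for _ in range(10_000):
    a, b = random.uniform(0, 10), random.uniform(0, 10)
    x, y = random.uniform(0, 10), random.uniform(0, 10)
    # Affine case: f(t) = a*t + b with a, b >= 0.
    lhs = math.sqrt((x + y) * (a * (x + y) + b))
    rhs = math.sqrt(x * (a * x + b)) + math.sqrt(y * (a * y + b))
    assert lhs <= rhs + 1e-9
    # General concave case, via the interpolation argument.
    assert g(x + y) <= g(x) + g(y) + 1e-9
print("affine case and sqrt(t*f(t)) subadditivity held on all samples")
```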

Cryme
  • 652
  • 1
    nice answer, but could you explain the point/example of your last paragraph a little more? I just didn't quite catch your meaning – D.R. Mar 01 '25 at 05:23
  • @D.R. Most certainly, I edited my answer, please have a look. – Cryme Mar 03 '25 at 22:18
  • 1
    thank you! now I'm wondering if $f$ concave forces $t\mapsto \sqrt {t f(t)}$ to be concave... $f$ is concave $\iff$ for every $t_0$ there is a line $l$ with $f(t_0)=l(t_0)$ ("touching at $t_0$") and $f(t)\leq l(t)$; but then $\sqrt{t f(t)} \leq \sqrt{t l(t)}$ touches at $t_0$, and $\sqrt{t l(t)}$ is always concave (by taking the 2nd derivative), so there's another line $L(t) \geq \sqrt{tl(t)} \geq \sqrt{t f(t)}$, all touching at $t_0$. This holds at all $t_0$, hence $\sqrt{tf(t)}$ is concave, and hence subadditive as you've shown. But yes, interesting idea/method to prove subadditivity! – D.R. Mar 03 '25 at 22:44
  • @D.R. Sounds right to me! – Cryme Mar 06 '25 at 20:21
2

We may suppose $0<a<b$: if $a=0$ the claim is immediate from $f(0)\geq0$, and if $a=b$ then $a=\frac12\cdot 0+\frac12(a+b)$ gives $f(a)\geq\frac12 f(0)+\frac12 f(a+b)\geq \frac12 f(a+b)$, i.e. $f(a+b)\leq 2f(a)=f(a)+f(b)$. First note that $a=\frac a b\, b+\left(1-\frac a b\right) 0$, so concavity together with $f(0)\geq0$ gives $f(a) \geq \frac a b f(b)$. Next write $b$ as $\alpha a+(1-\alpha)(a+b)$ with $\alpha=\frac a b$ and apply the definition of concavity. We get $f(a+b) \leq \frac b {b-a} f(b) - \frac a {b-a} f(a) \leq f(a)+f(b)$, because the last inequality is equivalent to $\frac a {b-a} f(b) \leq \frac b {b-a} f(a)$, i.e. to $f(a)\geq\frac a b f(b)$, which was shown above.
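A short Python sketch of the two concavity estimates used here, again with an illustrative assumption for the concave function, $f(t)=\sqrt{t+1}$:

```python
import math
import random

def f(t):
    # An arbitrary concave function on [0, inf) with f(0) = 1 >= 0.
    return math.sqrt(t + 1)

random.seed(3)
for _ in range(10_000):
    a, b = sorted(random.uniform(0.0, 20.0) for _ in range(2))
    if b - a < 1e-6:
        continue  # the a == b case is handled separately in the answer
    # Step 1: a = (a/b)*b + (1 - a/b)*0 together with f(0) >= 0.
    assert f(a) >= (a / b) * f(b) - 1e-7
    # Step 2: b = alpha*a + (1 - alpha)*(a + b) with alpha = a/b.
    bound = (b / (b - a)) * f(b) - (a / (b - a)) * f(a)
    assert f(a + b) <= bound + 1e-7
    assert bound <= f(a) + f(b) + 1e-7
print("both concavity estimates held on all sampled (a, b)")
```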

Mittens
  • 46,352
1

@Cryme's answer is very nice, and I wanted to situate those ideas into a larger context.

There's a general principle that convex functions are smaller "than expected" in the "middle", or equivalently that concave functions are bigger "than expected" in the "middle". To understand this better, I will explain what I mean by "middle" (middle of what?) and by "smaller/bigger than expected".

The inequality that @Cryme proves is: for $f$ a concave function, $$f(0)+f(x+y) \leq f(x)+f(y).$$ Or in other words, if $Z$ is the random variable which takes values $0, (x+y)$ with probability $\frac 12$ each, and $A$ is a random variable which takes values $x,y$ with probability $\frac 12$ each (note $\mathbb EA = \frac{x+y}2= \mathbb E Z$), then $f$ concave $\implies \mathbb E f(A) \geq \mathbb Ef(Z)$.

Intuitively, if I have two random variables $A,Z$ with the same mean $\mu$, and $A$ is distributed "more tightly" around $\mu$, then $f$ concave $\implies \mathbb E f(A) \geq \mathbb Ef(Z)$.

I.e. because concave $f$ "bulges up in the middle" and $A$ is more tightly clustered around $\mu$, then we expect $f$ to be larger "on $A$" than "on $Z$" (this is what I mean by "concave functions are bigger in the middle").


In the special case where $Z$ takes values $a,b$ with probabilities $p, 1-p$ (so mean $\mu=pa+(1-p)b$), $A$ takes values in $[a,b]$ with mean $\mu$, and $f$ is concave, we have $$\mathbb Ef(Z) \leq \mathbb E f(A),$$ because letting $l(x)$ be the line passing through the points $(a,f(a)), (b,f(b))$, concavity implies $l(x)\leq f(x)$ on $[a,b]$, and so $$\mathbb Ef(Z) = \mathbb E l(Z) = l(\mathbb EZ) = l(\mu) = l(\mathbb E A)= \mathbb E l(A) \leq \mathbb E f(A),$$ where we crucially use the linearity of expectation to commute $\mathbb E$ with the affine function $l$.

(The above case covers @Cryme's inequality, and hence proves the inequality in the title of this MSE question)
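To make this concrete, here is a short Python sketch of the instance used above ($Z$ uniform on $\{0,x+y\}$, $A$ uniform on $\{x,y\}$); the concave function $f(t)=\sqrt{t}$ is just an illustrative assumption:

```python
import math
import random

def f(t):
    # An arbitrary concave function on [0, inf).
    return math.sqrt(t)

random.seed(4)
for _ in range(10_000):
    x, y = random.uniform(0, 10), random.uniform(0, 10)
    # Z takes values 0 and x+y with probability 1/2 each,
    # A takes values x and y with probability 1/2 each; E[A] = E[Z].
    Ef_Z = 0.5 * f(0) + 0.5 * f(x + y)
    Ef_A = 0.5 * f(x) + 0.5 * f(y)
    assert Ef_Z <= Ef_A + 1e-12
print("E f(Z) <= E f(A) held on all samples")
```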


Another special case is when $A$ is the constant random variable $A \equiv \mu$. In this case, we get precisely Jensen's inequality!


In general, this is the theory of convex ordering for random variables.

Definition: the convex ordering relation between (real-valued) random variables is $X \leq_\text{cx} Y :\iff \mathbb E\phi(X) \leq \mathbb E\phi(Y)$ for every convex function $\phi$ (defined at least on the ranges of $X,Y$). (Note that in this case the means must necessarily coincide, $\mathbb EX = \mathbb EY$, as one sees by taking $\phi(x)=x$ and $\phi(x)=-x$.)
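With the same illustrative setup as before ($A$ uniform on $\{x,y\}$, $Z$ uniform on $\{0,x+y\}$), here is a quick Python sketch testing the definition $A \leq_\text{cx} Z$ against a handful of convex test functions (the particular $\phi$'s are arbitrary choices made just for the check):

```python
import math
import random

# A few convex test functions phi on [0, inf).
convex_tests = [
    lambda t: t * t,
    lambda t: abs(t - 5.0),
    lambda t: max(t, 3.0),
    lambda t: math.exp(0.3 * t),
]

random.seed(5)
for _ in range(1_000):
    x, y = random.uniform(0, 10), random.uniform(0, 10)
    for phi in convex_tests:
        # A is uniform on {x, y}; Z is uniform on {0, x + y}; E[A] = E[Z].
        E_phi_A = 0.5 * phi(x) + 0.5 * phi(y)
        E_phi_Z = 0.5 * phi(0.0) + 0.5 * phi(x + y)
        assert E_phi_A <= E_phi_Z + 1e-9
print("E phi(A) <= E phi(Z) held for all convex test functions and samples")
```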

See e.g. this paper (CONVEX ORDERING PROPERTIES AND APPLICATIONS), which references the book Stochastic Orders (Shaked and Shanthikumar). Theorem 3.A.1 or 3.A.2 in that book (or Thm. 5 in the paper) says

Theorem: For random variables $X,Y$ with $\mathbb EX = \mathbb EY$, TFAE

  • $X\leq_\text{cx} Y$
  • same as convex order but restricted to non-decreasing (i.e. weakly increasing) convex functions instead of general convex functions
  • $\mathbb E\max\{X,c\}\leq \mathbb E\max\{Y,c\}$ for every $c\in \mathbb R$.
  • $\int_c^\infty \mathbb P(X>t) dt \leq \int_c^\infty \mathbb P(Y>t) dt$ for all $c\in \mathbb R$
  • $\mathbb E |X-c| \leq \mathbb E |Y-c|$ for all $c\in \mathbb R$
  • (Thm 3.A.3.) if $X,Y$ have mean $0$, then $X \leq_\text{cx} Y$ if, and only if, for a standard Brownian motion started from $0$, $\{B(t), t \geq 0\}$, there exist two stopping times $T_1$ and $T_2$ such that $T_1\leq T_2$ almost surely, $X$ has the same distribution as $B(T_1)$, and $Y$ has the same distribution as $B(T_2)$. (This, I think, captures the intuition that $Y$ should indeed be "more spread out/diffused" than $X$.)

There aren't too many results when I search "convex order" online, but here's one other paper that shows up (Convex Ordering of Random Variables and its Applications in Econometrics and Actuarial Science). I also felt a connection to optimal transport, and found this and this.

This blogpost mentions a lemma of Ohlin (1969) (also discussed here; just search "Ohlin convex order" for more) providing a sufficient condition:

Lemma (Ohlin 1969): for random variables $X,Y$ with common mean $\mu$, if there exists $x_0 \in \mathbb R$ s.t. $$x <x_0 \implies F_X(x)\leq F_Y(x)\quad \text{ and } \quad x>x_0 \implies F_X(x)\geq F_Y(x)$$ (i.e. "the CDFs of $X,Y$ cross exactly once" --- equality of the means guarantees at least one crossing, though be warned that $x_0$ need not equal $\mu$), then $X \leq_\text{cx} Y$.

Or equivalently, this is saying that for every interval $I$ around $x_0$, $$\mathbb P(X\in I) \geq \mathbb P(Y\in I), \tag{1}$$ i.e. "$X$ is more uniformly concentrated around $x_0$ than $Y$" ("uniformly" to reflect "for every interval $I$").
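A small Python sketch of Ohlin's criterion, using two equal-mean discrete distributions whose CDFs cross once (the particular distributions and convex test functions are illustrative assumptions):

```python
# Ohlin's criterion, illustrated: X uniform on {2, 3, 4}, Y uniform on {0, 3, 6}.
# Both have mean 3, and F_X <= F_Y to the left of x0 = 3 while F_X >= F_Y to the
# right, so the lemma predicts E[phi(X)] <= E[phi(Y)] for every convex phi.
X = [2.0, 3.0, 4.0]
Y = [0.0, 3.0, 6.0]

convex_tests = [
    lambda t: t * t,
    lambda t: abs(t - 2.5),
    lambda t: max(t - 3.5, 0.0),
]

for phi in convex_tests:
    E_phi_X = sum(phi(v) for v in X) / len(X)
    E_phi_Y = sum(phi(v) for v in Y) / len(Y)
    assert E_phi_X <= E_phi_Y + 1e-12
print("E phi(X) <= E phi(Y) for all convex test functions")
```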

The proof is as follows: because $\mathbb El(X) = l(\mathbb EX) = l(\mu) = l(\mathbb EY) = \mathbb El(Y)$ for any affine function $l$ (as I used above), proving the inequality $\mathbb E f(Y) \geq \mathbb E f(X)$ for convex $f$ is equivalent to proving it with $f$ replaced by $f-l$ for any fixed affine $l$. In particular, we choose a line $l$ supporting the convex function $f$ at $x_0$, to get $g(t):=f(t)-l(t)\geq0$, which (weakly) decreases down to $0$ at $x_0$ and then (weakly) increases.

The layer-cake formula says $\mathbb Eg(X) = \int_0^\infty \mathbb P(g(X)>\lambda )\,d\lambda$, so $$\mathbb E[ g(Y)-g(X)] = \int_0^\infty \mathbb P(g(Y)>\lambda) - \mathbb P(g(X)>\lambda)\, d\lambda = \int_0^\infty \mathbb P(g(X)\leq \lambda) - \mathbb P(g(Y)\leq \lambda)\, d \lambda.$$ Now $\mathbb P(g(X)\leq \lambda) = \mathbb P(X\in I)$, where $I:= g^{-1}([0,\lambda])$ is an interval containing $x_0$. Using the hypothesis $(1)$, the integrand in the last integral is $\geq 0$, so the integral is $\geq 0$, as desired.
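The layer-cake formula itself is easy to test numerically; here is a minimal Python sketch for a discrete nonnegative random variable (the distribution is an arbitrary illustrative choice):

```python
# Layer-cake check: E[W] should equal the integral of P(W > lambda) over lambda >= 0.
values = [0.0, 1.5, 2.0, 4.5]   # support of a discrete nonnegative W
probs = [0.1, 0.4, 0.3, 0.2]    # probabilities (sum to 1)

expectation = sum(v * p for v, p in zip(values, probs))

# Numerically integrate P(W > lambda) with a fine Riemann sum.
d_lambda = 1e-4
upper = max(values)
integral = 0.0
lam = 0.0
while lam < upper:
    tail = sum(p for v, p in zip(values, probs) if v > lam)
    integral += tail * d_lambda
    lam += d_lambda

assert abs(expectation - integral) < 1e-2
print(f"E[W] = {expectation:.4f}, layer-cake integral = {integral:.4f}")
```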


Ohlin's sufficient condition is enough to prove both @Cryme's inequality and Jensen's inequality.

D.R.
  • 10,556