
Hölder's inequality and Minkowski's inequality are two basic inequalities about $L^p$ spaces, stating that $\|fg\|_1 \leq \|f\|_p\|g\|_q$ when $p, q$ are conjugate exponents, and that $\|f+g\|_p \leq \|f\|_p + \|g\|_p.$

Hölder's inequality is closely related to the fact that $L^p$ and $L^q$ are each other's duals for conjugate $p, q$ (with $1 < p, q < \infty$). However, the only proofs I know of these inequalities are very 'algebra heavy', and I find it hard to identify the 'ideas' of the proofs, or to compress them in my head beyond (for Hölder) 'do funky algebra and apply Young's inequality' and (for Minkowski) 'do funky algebra and apply Hölder's.'

Is there some more intuitive/conceptual proof of either inequality? Say, some abstract way of seeing that the dual of $L^p$ is $L^q$ which lets you deduce Hölder, or some other proof that doesn't use any 'clever' algebraic manipulations (or, failing that, some motivation for these clever manipulations)?

RobPratt
user951252

2 Answers


Here is my attempt at expositing these inequalities.

From the concavity of $\ln(x)=:\log(x)$ on $(0,\infty)$, we get the following inequality (the analytic big brother to AM-GM --- as opposed to the more algebraic big brothers of AM-GM, the Newton inequalities):

Additive convex combinations $\geq$ multiplicative convex combinations (also known as weighted AM-GM): For $\tfrac 1p+\tfrac 1q=1$ and real numbers $\square, \bigstar >0$, $$\log(\tfrac 1p \square + \tfrac 1q \bigstar)\geq \tfrac 1p \log(\square) + \tfrac 1q \log(\bigstar) = \log(\square^{1/p}\bigstar^{1/q}),$$ i.e. $$\tfrac 1p \square + \tfrac 1q \bigstar \geq \square^{1/p}\bigstar^{1/q}$$ (the additive convex combination, $a=\frac 1p$ additive copies of $\square$ plus $b=\frac 1q$ additive copies of $\bigstar$, is bigger than the same phrase with "multiplicative" in the place of "additive").
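As a quick numerical sanity check (my addition, not part of the argument --- the variable names `s`, `t` standing in for $\square$, $\bigstar$ and the choice $p=3$ are arbitrary), one can test the weighted AM-GM inequality on random inputs:

```python
import random

# Check (1/p)*s + (1/q)*t >= s**(1/p) * t**(1/q) for conjugate p, q.
random.seed(0)
p = 3.0
q = p / (p - 1)  # conjugate exponent: 1/p + 1/q = 1

for _ in range(1000):
    s = random.uniform(0.01, 100)  # plays the role of the square
    t = random.uniform(0.01, 100)  # plays the role of the star
    lhs = s / p + t / q            # additive convex combination
    rhs = s ** (1 / p) * t ** (1 / q)  # multiplicative convex combination
    assert lhs >= rhs - 1e-12

# Equality case: s == t makes the two sides agree.
assert abs((5.0 / p + 5.0 / q) - 5.0) < 1e-9
```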

Because this comes from the concavity of $\log(x)$, there is equality iff $\square = \bigstar$.

Actually, using Jensen’s inequality, one can prove the even more general weighted QM-AM-GM-HM style inequalities, following what I wrote in this MSE answer here.

Applying this to $\square := |a|^p$ and $\bigstar := |b|^q$, we arrive at

Young's inequality (the analytic big brother to Cauchy-Schwarz "cross correlation bounded by self correlations"): $|ab| \leq \frac 1p |a|^p + \frac 1q|b|^q$ (where we inherit the above equality case: equality iff $|a|^p=|b|^q$).


Applying (i) Young's inequality to functions $f,g$ pointwise and then (ii) integrating, we arrive at $$\|fg\|_1 \leq \frac 1p \|f\|_p^p + \frac 1q \|g\|_q^q. \tag{1}$$ (Because [$0\leq f \leq g$ but $f\neq g$ on a set of positive measure] $\implies \int f < \int g$, the equality case continues to carry over: equality iff $|f|^p = |g|^q$ almost everywhere.)

But notice that Young's inequality (applied to $a=\|f\|_p$, $b=\|g\|_q$) produces the same RHS in the bound $\|f\|_p \|g\|_q \leq \frac 1p \|f\|_p^p + \frac 1q \|g\|_q^q$. How do the two LHS's compare?

The key idea is to arbitrage (or "spend") the symmetry $(f,g) \leadsto (\lambda f, \frac 1\lambda g)$ in the inequality (1) (a symmetry for the LHS, but not the RHS! Hence "arbitrage").

By choosing $\lambda$ s.t. $\|\lambda f\|_p^p =\| \frac 1 \lambda g\|_q^q$ (which one can always do, assuming both $f,g \not \equiv 0$ --- otherwise Hölder's inequality is trivial anyway), we can "amplify"/"optimize" the RHS of (1) to enter the equality case of Young's inequality, and arrive at Hölder's inequality $$\|fg\|_1 \leq \|f\|_p \|g\|_q.$$
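Here is a hypothetical numerical illustration of this arbitrage step, using arrays under counting measure as stand-ins for $f, g$ (the variable names and parameters are my own; this only checks, it does not prove):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 3.0
q = p / (p - 1)
f = rng.uniform(0.1, 1.0, size=50)
g = rng.uniform(0.1, 1.0, size=50)

norm_p = np.sum(f ** p) ** (1 / p)
norm_q = np.sum(g ** q) ** (1 / q)

# Choose lam so that ||lam*f||_p^p == ||g/lam||_q^q:
# lam^p * norm_p^p = lam^(-q) * norm_q^q  =>  lam = (norm_q^q / norm_p^p)^(1/(p+q))
lam = (norm_q ** q / norm_p ** p) ** (1 / (p + q))

# Inequality (1) applied to (lam*f, g/lam); its LHS is unchanged, = ||fg||_1.
lhs = np.sum(f * g)
rhs = np.sum((lam * f) ** p) / p + np.sum((g / lam) ** q) / q
assert lhs <= rhs + 1e-12

# After optimizing lam, the RHS collapses to ||f||_p * ||g||_q (Holder).
assert abs(rhs - norm_p * norm_q) < 1e-9
```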

(Note that applying this "arbitrage, amplify" trick (so named by Tao) to Young's inequality directly, we do indeed get a very sharp bound, $|ab|\leq |ab|$, which sadly is so sharp that it's useless. It's only because in (1) we had one extra step in between, "(ii) integrate", that we gained something nontrivial.)


To get to Minkowski's $L^p$ triangle inequality, we spend the symmetry of scaling (i.e. homogeneity).

First, we prove the

weak (off by a factor of $2$) $L^p$ triangle inequality: we have pointwise $|f+g|\leq 2 \max\{|f|,|g|\} \implies |f+g|^p \leq 2^p \max\{|f|^p,|g|^p\} \leq 2^p (|f|^p + |g|^p)$, and so integrating produces $\|f+g\|_p^p \leq 2^p (\|f\|_p^p + \|g\|_p^p)$. Taking $1/p$-th roots (the map $t\mapsto t^{1/p}$ is concave with value $0$ at $0$, hence subadditive --- an example of the phenomenon of convex ordering of random variables I explain here), we arrive at $\|f+g\|_p \leq 2(\|f\|_p+\|g\|_p)$.
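A hypothetical numerical check of both steps of this weak bound, again with arrays under counting measure standing in for functions (the parameters are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 2.5
norm = lambda h: np.sum(np.abs(h) ** p) ** (1 / p)

for _ in range(100):
    f = rng.normal(size=30)
    g = rng.normal(size=30)
    # intermediate bound, before taking 1/p-th roots:
    assert norm(f + g) ** p <= 2 ** p * (norm(f) ** p + norm(g) ** p) + 1e-9
    # weak triangle inequality, after subadditivity of t -> t^(1/p):
    assert norm(f + g) <= 2 * (norm(f) + norm(g)) + 1e-9
```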

Motivated by this, we try to find the best possible constant (in place of $2$). One issue is that our strategy, which consists entirely of pointwise bounds, then integrating, then taking $1/p$-th powers, is better suited to proving bounds of the form $\|f+g\|_p^p \leq C_p (\|f\|_p^p + \|g\|_p^p)$, of which we must then take $1/p$-th powers (which is extremely wasteful: the equality case is when both $f,g$ are identically $0$).

Observe that for a function $h$ with $\|h\|_p=1$, we don't need this final $1/p$-th power step, because $1^p=1$. Let us try to use the ability to pull constants out of norms to take advantage of this $1^p=1=1^{1/p}$ fact: define $\tilde f := \frac{f}{\|f\|_p}$ and similarly $\tilde g$, so that $f = \|f\|_p \tilde f$ where $\tilde f$ has $L^p$-norm $1$; this will help with the RHS norms. For the LHS, we will spend the symmetry of homogeneity by scaling $f,g$ by the same constant so that the RHS sum $\|f\|_p+\|g\|_p$ becomes $=1$. With these normalizations, the $L^p$ triangle inequality would follow immediately from integrating the pointwise bound (so indeed we have erased the need for the last $1/p$-th power step!) $$|f+g|^p \stackrel{?}{\leq} \|f\|_p |\tilde f|^p + \|g\|_p |\tilde g|^p \iff \Big| \|f\|_p \tilde f + \|g\|_p \tilde g \Big|^p \stackrel{?}{\leq} \|f\|_p |\tilde f|^p + \|g\|_p |\tilde g|^p.$$ But the latter inequality follows immediately from the convexity of $|\bullet|^p$ for $p\geq 1$, since the weights $\|f\|_p, \|g\|_p$ are nonnegative and sum to $1$.
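The normalization argument above can be traced numerically; the following hypothetical check (arrays as functions, counting measure, my own parameter choices) verifies the pointwise convexity bound and the resulting normalized Minkowski inequality:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 3.0
norm = lambda h: np.sum(np.abs(h) ** p) ** (1 / p)

f = rng.uniform(0.1, 1.0, size=40)
g = rng.uniform(0.1, 1.0, size=40)

# Spend homogeneity: rescale so that ||f||_p + ||g||_p == 1.
c = norm(f) + norm(g)
f, g = f / c, g / c

a, b = norm(f), norm(g)   # weights, now a + b == 1
ft, gt = f / a, g / b     # unit-norm versions, f = a*ft, g = b*gt

# pointwise convexity bound |a*ft + b*gt|^p <= a*|ft|^p + b*|gt|^p
assert np.all(np.abs(a * ft + b * gt) ** p
              <= a * np.abs(ft) ** p + b * np.abs(gt) ** p + 1e-12)

# integrating gives Minkowski in the normalized setting: ||f+g||_p <= 1
assert norm(f + g) <= 1 + 1e-12
```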


The above is an attempt at expositing Tao's argument, which also appears at this MSE question.

Sadly, I still think it is "too magical". I think the best way of teaching it is to discuss the connection to convexity first, instead of bringing it in as a magic hammer at the very end. For example, this would be a good introduction:

The triangle inequality (in general metric spaces) is at its heart a geometric statement: given a ball $B(x_0, r_0)$, and any ball of radius $r_1$ centered at a point $x_1\in B(x_0,r_0)$, both balls lie entirely within $B(x_0, r_0+r_1)$.

In the context of normed spaces (e.g. in functional analysis, in which convexity does appear naturally), it turns out there is an equivalent geometric characterization (that I didn't realize until now!): the triangle inequality is equivalent to the convexity of balls (or by scaling, to the convexity of the unit ball): $$\|x+y\|\leq \|x\|+\|y\| \iff \left\|\frac{x+y}{2}\right\| = \frac 12 \|x+y\| \leq \frac{\|x\|+\|y\|}2.$$ This is part of a general pattern that general sums can be given geometric (or probabilistic) meaning/intuition if they can be massaged into convex combinations/weighted averages.

It should be no surprise then, that the $L^p$-balls are convex since we've been using convexity non-stop in this entire post (in fact it seems like the only tool we have for these $L^p$ inequalities). Another benefit of convexity arguments is that the equality case pops out very easily (especially if there's strict convexity).

I provide more details:

Theorem: for a real vector space $V$ with an absolutely homogeneous $\|\bullet\| : V \to [0,\infty)$ (i.e. $\|cx\|=|c|\,\|x\|$) and $\bar B_1:= \{x\in V: \|x\|\leq 1\}$, TFAE:

  1. Triangle inequality: $\forall x,y$, $\|x+y\|\leq \|x\|+\|y\|$.
  2. Convexity of unit ball: $x,y\in \bar B_1 \implies \lambda x + (1-\lambda)y \in \bar B_1$ for every $\lambda\in [0,1]$.
  3. Ball contains chords: $\|x\|=\|y\|=1 \implies \lambda x + (1-\lambda)y \in\bar B_1$ for every $\lambda\in [0,1]$.

(For the equality case, one can show the following are equivalent: strict triangle inequality for LI (linearly independent) vectors; strict convexity of the unit ball, i.e. $x\neq y \implies$ the open convex combinations lie in $B_1 \subsetneq \bar B_1$; and the ball strictly containing its chords.)

Clearly $(1)\implies (2) \implies (3)$. To show $(3) \implies (1)$, we have the following picture: given arbitrary $x,y\in V$, I can use the parallelogram rule to draw $x+y\in V$ (orange below). $(3)$ tells us that all points on the blue line (chord between $\tilde x := \frac{x}{\|x\|}, \tilde y := \frac{y}{\|y\|}$) have norm $\leq 1$:

[Figure: triangle inequality chord]

How do we use the norm information about blue points to deduce something about the norm of the orange $x+y$ point? Well, drawing the green line below, we see $x+y = C \cdot (\lambda\tilde x + (1-\lambda) \tilde y)$ for some $\lambda \in [0,1]$ and $C\in [0,\infty)$:

[Figure: triangle inequality convex combination]

Intuitively, we can solve for $\lambda, C$ uniquely because we may assume $x,y$ are linearly independent (if not, the triangle inequality is trivial), and then uniqueness of representation of linear combinations of LI vectors gives unique $\lambda, C$ (this is not necessary, but I point it out anyways). Solving for $\lambda, C$ we arrive at $$\lambda = \frac{\|x\|}{\|x\|+\|y\|}, \quad C = \|x\|+\|y\|,$$ and so the blue information, passed on to the orange point $x+y$ via the green line, becomes exactly the triangle inequality.
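One can verify the claimed values by direct substitution (using only the definitions $\tilde x = x/\|x\|$, $\tilde y = y/\|y\|$), and then read off the triangle inequality from the chord information:

```latex
C\left(\lambda \tilde x + (1-\lambda)\tilde y\right)
  = (\|x\|+\|y\|)\left(\frac{\|x\|}{\|x\|+\|y\|}\,\frac{x}{\|x\|}
    + \frac{\|y\|}{\|x\|+\|y\|}\,\frac{y}{\|y\|}\right)
  = x + y,
```

so by homogeneity and $(3)$,

```latex
\|x+y\| = C\,\bigl\|\lambda\tilde x + (1-\lambda)\tilde y\bigr\|
  \leq C \cdot 1 = \|x\| + \|y\|.
```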

Finally, showing $(3)$ for $\|f\|_p$ (no more $1/p$ power!) is easy by pointwise inequalities and convexity of $|x|^p$, as I did above.


Anyways, the way Minkowski's inequality is proved in this aforementioned MSE question can be easily rephrased into the standard proof using Hölder's inequality, which in turn generalizes straightforwardly to the Hölder-inequality proof of Minkowski's integral inequality.


TL;DR: all the $L^p$ inequalities follow fairly easily from convexity (+ ideas like arbitraging scaling symmetry).

D.R.

Since seminorms satisfy the triangle inequality, the Minkowski inequality follows from the general statement:

If $A$ is an absolutely convex, absorbing subset of a real or complex vector space $E$, then a seminorm on $E$ is defined by $$||x||=\inf\{t>0\mid \frac{x}{t}\in A\}\quad(x\in E).$$

$E:=L^p$ is a vector space, as can be easily proved by the maximum trick as in the "weak triangle inequality" part of @D.R.'s answer. Moreover, convexity of the function $[0,\infty)\to\mathbb{R}, t\mapsto t^p$ implies that $A=\{x\in L^p\mid \int |x|^p \le 1\}$ is convex, and in turn absolutely convex and absorbing. Hence, the statement above can be applied and yields that $$||x||=\inf\{t>0\mid \int |x/t|^p\le 1\} =\inf\{t>0\mid t^{-p}\int |x|^p\le 1\} =\left(\int |x|^p\right)^{1/p}$$ is a seminorm (in fact, a norm) on $L^p$.
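A hypothetical numerical illustration of the gauge formula (on $\mathbb{R}^n$ with counting measure, approximating the infimum by bisection; the test vector and exponent are arbitrary choices of mine):

```python
import numpy as np

p = 2.5
x = np.array([0.3, -1.2, 0.8, 2.0])
pnorm = np.sum(np.abs(x) ** p) ** (1 / p)

def in_A(y):
    # membership in A = {y : integral |y|^p <= 1} (here: a finite sum)
    return np.sum(np.abs(y) ** p) <= 1.0

# gauge(x) = inf{t > 0 : x/t in A}, approximated by bisection;
# invariant: x/hi is in A, x/lo is not.
lo, hi = 1e-9, 1e9
for _ in range(200):
    mid = (lo + hi) / 2
    if in_A(x / mid):
        hi = mid
    else:
        lo = mid

# The Minkowski gauge of A recovers the p-norm.
assert abs(hi - pnorm) < 1e-6
```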

tj_