
The Wikipedia page mentions that the log-sum-exp function, $f(x)=\log \sum_{i=1}^n\exp(x_i)$, becomes strictly convex when we add $x_0=0$, i.e., $g(x)=\log (1+\sum_{i=1}^n\exp(x_i))$. I want to verify this claim, but I'm not sure how to do so.

I can show that $f$ is convex by showing its Hessian is positive semi-definite, but a positive definite Hessian is not necessary for strict convexity, so this test alone does not settle whether $g$ is strictly convex. Indeed, the Hessian of $f$ is not positive definite: taking $b^\top Hb$, where $H$ is the Hessian and $b$ has the same value in every entry, gives $b^\top H b=0$.

I understand that $f$ is affine along the "45-degree line", as explained in this post. But I'm not sure how to prove that $g$ is indeed strictly convex once we add the $0$, i.e., how this eliminates the degenerate direction in which all coordinates move together ($x_i = x$ for all $i$).
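One way to probe the claim numerically before proving it is a small NumPy sketch (the functions `f` and `g` below are just the definitions above; the sample sizes and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # f(x) = log(sum_i exp(x_i))
    return np.log(np.sum(np.exp(x)))

def g(x):
    # g(x) = log(1 + sum_i exp(x_i)): f with an extra coordinate fixed at 0
    return np.log1p(np.sum(np.exp(x)))

# f is affine along the all-ones direction: f(x + t*1) = f(x) + t,
# so f cannot be strictly convex ...
x = rng.normal(size=4)
t = 0.7
affine_gap = f(x + t) - (f(x) + t)        # should be ~ 0

# ... while g appears to satisfy the strict midpoint inequality for x != y
worst_slack = np.inf
for _ in range(1000):
    x, y = rng.normal(size=4), rng.normal(size=4)
    worst_slack = min(worst_slack, (g(x) + g(y)) / 2 - g((x + y) / 2))
```

Of course this only gathers evidence; it does not prove strict convexity.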

keepfrog

The Hessian of $g$ is positive definite everywhere except at the origin. It follows that $g$ is strictly convex. – copper.hat Sep 19 '22 at 02:55

2 Answers


For this answer, I'll be thinking of $f \colon \mathbf{R}^{n+1} \rightarrow \mathbf{R}$ and $g \colon \mathbf{R}^{n} \rightarrow \mathbf{R}$, so that we can consider $g$ to be $f$ with the first argument fixed at $0$, i.e., $g(x) = f(0, x)$.

So the thing to note here is that the statement "$f$ is convex" is almost exactly the Hölder inequality for sequence spaces. In particular, if $\lambda \in (0, 1)$, then we can take $p = 1/\lambda, q = 1/(1 - \lambda)$ and see that, for $x, y \in \mathbf{R}^{n+1}$, \begin{align*} f(\lambda x + (1 - \lambda) y) &= \log\Bigl[\sum_{j=0}^n e^{\lambda x_j} \cdot e^{(1 - \lambda)y_j}\Bigr] \\ &\leq \log\Bigl[\Bigl(\sum_{j=0}^n e^{\lambda p x_j}\Bigr)^{1/p} \cdot \Bigl(\sum_{j=0}^n e^{(1 - \lambda)qy_j}\Bigr)^{1/q}\Bigr] \\ &= \log\Bigl[\Bigl(\sum_{j=0}^n e^{x_j}\Bigr)^{\lambda} \cdot \Bigl(\sum_{j=0}^n e^{y_j}\Bigr)^{1 - \lambda}\Bigr] \\ &= \lambda f(x) + (1 - \lambda) f(y). \end{align*}

Moreover, we know from facts about the Hölder inequality that equality holds if and only if $(e^{x_i})_i$ and $(e^{y_i})_i$ are linearly dependent. That is, there is an $A > 0$ such that, for all $i$, \begin{align*} e^{x_i} &= A e^{y_i}, \\ x_i - y_i &= \log A. \end{align*}

This is exactly the "45 degree line" along which $f$ is affine. But note that in the case of $g$, $x_0 = y_0$, which would require $A = 1$ and so that $x = y$. Hence, for any $x', y' \in \mathbf{R}^n$ with $x' \neq y'$, the corresponding equality never occurs, and we must have that $$ g(\lambda x' + (1 - \lambda) y') < \lambda g(x') + (1 - \lambda)g(y'), $$ rendering $g$ strictly convex.
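The equality case can be illustrated numerically (a sketch; `lse` is plain log-sum-exp, and shifting $y = x + \log 2$ puts the pair exactly on the Hölder equality case for $f$; the specific numbers are arbitrary):

```python
import numpy as np

def lse(v):
    # plain log-sum-exp
    return np.log(np.sum(np.exp(v)))

x = np.array([0.3, -1.2, 0.5])
y = x + np.log(2.0)      # x - y is a constant vector: the Holder equality case

# f on R^{n+1} with no constraint: the midpoint inequality is tight here
f_gap = (lse(x) + lse(y)) / 2 - lse((x + y) / 2)               # ~ 0

# g fixes the 0th coordinate at 0, so the shifted pair no longer differs
# by a constant vector; the inequality becomes strict
gx = lse(np.append(0.0, x))
gy = lse(np.append(0.0, y))
g_gap = (gx + gy) / 2 - lse(np.append(0.0, (x + y) / 2))       # > 0
```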


If $x_0$ is assumed to be a constant, then @copper.hat's comment is correct: the Hessian of $g$ is in fact positive definite, which implies that fixing the extra coordinate $x_0$ at a constant makes the log-sum-exp function strictly convex.

Suppose $x_0 = c$. The gradient of $g$ is

$$ \frac{\partial}{\partial x}g(x) = \begin{bmatrix} \frac{e^{x_1}}{e^c + \sum_i e^{x_i}} & \cdots & \frac{e^{x_n}}{e^c+\sum_i e^{x_i}} \end{bmatrix} $$

Since $x_0 = c$ is a constant, note that the gradient has only $n$ entries, corresponding to $x_1,\ldots,x_n$. Also, observe that the entries no longer sum to $1$: $\vec{1}^\intercal \frac{\partial g}{\partial x} = \frac{\sum_i e^{x_i}}{e^c + \sum_i e^{x_i}} < 1$, unlike the gradient of $f$, whose entries sum to exactly $1$.

For brevity, and to match notation from other sources (e.g., Boyd and Vandenberghe, p. 74), define the column vector $z = \begin{bmatrix}e^{x_1} & \cdots & e^{x_n} \end{bmatrix}^\intercal$. The gradient can then be rewritten as

$$ \frac{\partial}{\partial x}g(x) = \left(\frac{1}{e^c + \vec{1}^\intercal z}\right) z. $$
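This gradient formula is easy to check against central finite differences (a sketch with $c = 0$; the test point and step size $h$ are arbitrary choices):

```python
import numpy as np

c = 0.0                                    # the constant value of x_0

def g(x):
    return np.log(np.exp(c) + np.sum(np.exp(x)))

def grad_g(x):
    # gradient formula: z / (e^c + 1^T z), with z_i = e^{x_i}
    z = np.exp(x)
    return z / (np.exp(c) + z.sum())

x = np.array([0.1, -0.4, 0.8])
h = 1e-6
numeric = np.array([(g(x + h * e) - g(x - h * e)) / (2 * h)
                    for e in np.eye(3)])   # central differences
grad_err = np.abs(grad_g(x) - numeric).max()
grad_sum = grad_g(x).sum()                 # strictly less than 1, since e^c > 0
```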

With a bit of derivation, the Hessian of $g$ is then

$$ \begin{align} \frac{\partial^2}{\partial x^2}g(x) &= \frac{1}{\left(e^c + \vec{1}^\intercal z \right)^2}\left(\left(e^c + \vec{1}^\intercal z \right)\textbf{diag}(z) - zz^\intercal \right) \\ &= \frac{1}{\left(e^c + \vec{1}^\intercal z \right)^2}\left(e^c \textbf{diag}(z) + \left(\vec{1}^\intercal z\ \textbf{diag}(z) - zz^\intercal \right) \right) \end{align} $$

Taking a closer look, we see these properties:

  • The matrix $\vec{1}^\intercal z\ \textbf{diag}(z) - zz^\intercal$ is symmetric and positive semidefinite. This can be seen by comparing with the Hessian of the original log-sum-exp function: $\frac{\partial^2}{\partial x^2} f(x) = \frac{1}{\left(\vec{1}^\intercal z\right)^2}\left( \vec{1}^\intercal z\ \textbf{diag}(z) - zz^\intercal \right)$.
  • The matrix $e^c \textbf{diag}(z)$ is a diagonal matrix with strictly positive entries, and is therefore positive definite.

The sum of a positive definite matrix and a positive semidefinite matrix is positive definite, and therefore the Hessian of $g$ is positive definite, so $g$ is strictly convex.
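As a sanity check on the formula, the Hessian above can be built directly and its smallest eigenvalue inspected at random points (a sketch; `hessian_g` is just the final expression with $z_i = e^{x_i}$, and $c = 0$, dimension, and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
c = 0.0                        # x_0 = c; c = 0 matches g in the question

def hessian_g(x):
    # (1/S^2) * (S * diag(z) - z z^T), with z_i = e^{x_i}, S = e^c + 1^T z
    z = np.exp(x)
    S = np.exp(c) + z.sum()
    return (S * np.diag(z) - np.outer(z, z)) / S**2

# positive definiteness: the smallest eigenvalue stays strictly positive
min_eig = np.inf
for _ in range(200):
    H = hessian_g(rng.normal(size=5))
    min_eig = min(min_eig, np.linalg.eigvalsh(H).min())
```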