Let $K \in \mathbb{N}$ and let $f:(0,1)^K \to \mathbb{R}$ be the function $x \mapsto - 2 \sum_{k=1}^K \sqrt{x_k}$ (which, apart from constants, is the $1/2$-Tsallis entropy). I'm trying to find the largest $\mu>0$ such that $f$ is $\mu$-strongly convex with respect to $\|\cdot\|_1$ (where $\|x\|_1 = \sum_{k=1}^K |x_k|$) on the unit simplex $D := \{ x\in (0,1) ^K \mid \sum_{k=1}^K x_k = 1\}$. Specifically, I'm trying to determine the correct dependence of $\mu := \mu(K)$ on the dimension $K$ in the relation \begin{equation} \forall x,y \in D, \qquad f(y)\ge f(x)+\mathbb{d}f(x)(y-x)+\frac{\mu}{2} \|y-x\|_1^2. \end{equation}

So far, I have noticed that $\mathbb{d}^2f \succeq \frac{1}{2}I$ on $(0,1)^K$ (i.e., $\mathbb{d}^2f(x)-\frac{1}{2} I$ is positive semidefinite for each $x \in (0,1)^K$, which is easily seen by computing the Hessian of $f$) and that $\sqrt{K}\|\cdot\|_2 \ge \|\cdot\|_1$ (where $\|x\|_2 = \sqrt{\sum_{k=1}^K x_k^2}$), which implies (see question 3) that \begin{equation} \forall x,y \in (0,1)^K, \qquad f(y)\ge f(x)+\mathbb{d}f(x)(y-x)+\frac{1/2}{2} \|y-x\|_2^2 \ge f(x)+\mathbb{d}f(x)(y-x)+\frac{1/(2K)}{2} \|y-x\|_1^2 . \end{equation} So, in particular, we can take $\mu = \frac{1}{2K}$.

However, this argument passes through several (probably non-sharp) inequalities, and the result holds on an even bigger set than $D$, so I strongly suspect that the dependence on the dimension $K$ is not sharp.

Does anyone else have any better ideas?

Bob
  • 5,995

1 Answer

I just worked through this (although I am 3 years late to this post), so I thought I would share what I found. TL;DR: yes, you can do much better: $\mu=1/2$ works, independently of $K$. Also note that you defined $D$ not as the unit simplex but as its interior, which is fine for the discussion below.

The key idea is to avoid the $\ell_2$ norm and work directly with the $\ell_1$ norm. By Taylor's theorem (with remainder), it suffices to find the largest $\mu$ such that

$$ \inf_{u \in D} x^T\nabla^2 f(u) x \geq \mu \|x\|^2_1, \ \forall x \in \mathbb{R}^K. $$

Both sides are homogeneous of degree 2 in $x$, so without loss of generality we may normalize $\|x\|_1 = 1$ (see also Theorem 3 in this note); that is, we want to find the largest $\mu$ such that

$$ \inf_{u \in D, \|x\|_1=1} x^T\nabla^2 f(u) x \geq \mu. $$

Using the fact that the Hessian of $f$ at any point $u \in D$ is a diagonal matrix with diagonal entries $[\nabla^2 f(u)]_{ii} = \frac{1}{2} u_i^{-3/2}$, we have

$$ \begin{align} \inf_{u \in D, \|x\|_1=1} x^T\nabla^2 f(u) x &= \frac{1}{2} \inf_{u \in D, \|x\|_1=1} \sum_{i=1}^K x_i^2 u_i^{-3/2} \\ &= \frac{1}{2} \inf_{u \in D} \min_{x \geq 0} \big\{ \sum_{i=1}^K x_i^2 u_i^{-3/2} : \sum_{i=1}^K x_i = 1\big\} \\ &= \frac{1}{2} \inf_{u \in D} \big( \sum_{i=1}^K u_i^{3/2} \big)^{-1} \\ &= \frac{1}{2} \big( \sup_{u \in D} \sum_{i=1}^K u_i^{3/2} \big)^{-1} \\ &\geq \frac{1}{2} \big( \max_{u \in \Delta_K} \sum_{i=1}^K u_i^{3/2} \big)^{-1} \\ &= \frac{1}{2}, \end{align} $$ where the second line uses the fact that the sign of $x_i$ is irrelevant, since only $x_i^2$ appears in the objective; the third line uses the KKT conditions to derive the optimal solution $x_i^* = u_i^{3/2}/(\sum_{j=1}^K u_j^{3/2})$; the inequality holds because $D \subseteq \Delta_K$, where $\Delta_K$ denotes the (closed) probability simplex; and the last line follows by observing that a convex function attains its maximum over $\Delta_K$ at an extreme point (this can be deduced by applying Jensen's inequality, since any non-extreme point is a convex combination of extreme points), where $\sum_{i=1}^K u_i^{3/2} = 1$.
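
For readers who want the third line spelled out, here is a sketch of the KKT computation with $u \in D$ fixed. The multiplier $\lambda$ and the shorthand $S := \sum_{j=1}^K u_j^{3/2}$ are notation introduced here for illustration, not part of the argument above. Ignoring the constraint $x \geq 0$ for the moment, stationarity of the Lagrangian $\sum_{i} x_i^2 u_i^{-3/2} - \lambda \big(\sum_{i} x_i - 1\big)$ gives

$$ \begin{align} 2 x_i u_i^{-3/2} = \lambda \ &\Longrightarrow \ x_i = \tfrac{\lambda}{2} u_i^{3/2}, \\ \sum_{i=1}^K x_i = 1 \ &\Longrightarrow \ \tfrac{\lambda}{2} = \frac{1}{S}, \qquad x_i^* = \frac{u_i^{3/2}}{S}, \\ \sum_{i=1}^K (x_i^*)^2 u_i^{-3/2} &= \frac{1}{S^2} \sum_{i=1}^K u_i^{3} u_i^{-3/2} = \frac{1}{S}. \end{align} $$

Since $x^* \geq 0$ and the problem is convex, the ignored constraint is inactive and $x^*$ is indeed the minimizer.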

Thus, we have shown that $\mu=1/2$ is a valid modulus of strong convexity with respect to the $\ell_1$ norm, independently of $K$.
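
As a quick numerical sanity check (not a proof), the strong convexity inequality from the question can be tested on random pairs of points in $D$. The following is a minimal sketch in plain Python; the helper names (`random_simplex_point`, `strong_convexity_gap`, `min_gap`), the sampling scheme, and the trial counts are all assumptions made for illustration.

```python
import math
import random


def f(x):
    """f(x) = -2 * sum_k sqrt(x_k), the function from the question."""
    return -2.0 * sum(math.sqrt(v) for v in x)


def grad_f(x):
    """Gradient of f: the k-th partial derivative is -1 / sqrt(x_k)."""
    return [-1.0 / math.sqrt(v) for v in x]


def random_simplex_point(K, rng):
    """Sample a point of the open simplex by normalizing exponential draws.

    The tiny offset guards against an exact zero draw (illustrative choice).
    """
    e = [rng.expovariate(1.0) + 1e-12 for _ in range(K)]
    s = sum(e)
    return [v / s for v in e]


def strong_convexity_gap(x, y, mu):
    """Return f(y) - f(x) - <grad f(x), y - x> - (mu / 2) * ||y - x||_1^2.

    The inequality in the question holds at (x, y) iff this is >= 0.
    """
    g = grad_f(x)
    lin = sum(gi * (yi - xi) for gi, xi, yi in zip(g, x, y))
    l1 = sum(abs(yi - xi) for xi, yi in zip(x, y))
    return f(y) - f(x) - lin - 0.5 * mu * l1 ** 2


def min_gap(K, mu, trials=2000, seed=0):
    """Smallest gap observed over `trials` random pairs in the simplex."""
    rng = random.Random(seed)
    return min(
        strong_convexity_gap(
            random_simplex_point(K, rng), random_simplex_point(K, rng), mu
        )
        for _ in range(trials)
    )


if __name__ == "__main__":
    for K in (2, 5, 50):
        print(K, min_gap(K, mu=0.5))
```

A nonnegative minimum gap (up to floating-point error) is consistent with $\mu = 1/2$ being valid for every tested $K$. Random sampling cannot certify the bound, and it says nothing about whether $1/2$ is the largest possible constant on $D$.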

Klub
  • 26