5

Definition : A function is said to be $\beta$-strongly convex if,

$f(\theta w + (1-\theta) w') \le \theta f(w) + (1-\theta) f(w') - \frac{\beta}{2}\theta(1-\theta)(w-w')^2$

What is the motivation behind the $\frac{\beta}{2}\theta(1-\theta)(w-w')^2$ term?

shyamupa
  • 421

2 Answers2

9

With this definition, $f(w)$ is $\beta$-strongly convex if and only if $g(w)=f(w) - \frac{\beta}{2}\|w\|^2$ is convex. That is, if you expand out the terms you'll see the following expressions are equivalent: $$ \begin{aligned} f(\theta w + (1-\theta) w') &\le \theta f(w) + (1-\theta) f(w') - \frac{\beta}{2}\theta(1-\theta)\|w-w'\|^2 \\ \Leftrightarrow g(\theta w + (1-\theta) w') &\le \theta g(w) + (1-\theta) g(w') \end{aligned} $$ This means $f$ can be written as the sum of convex function $g$ and a quadratic term: $f(w)=g(w)+ \frac{\beta}{2}\|w\|^2$.

Also, a strongly convex function has a unique minimizer. Suppose $w$ and $w'$ both minimize $f$ and $w\ne w'$, then the strong convexity condition implies $f((w+w')/2) < (f(w)+f(w'))/2$, a contradiction. So, in fact we can express $f(w)=h(w)+ \frac{\beta}{2}\|w - w^*\|^2 + f(w^*)$, where $h$ is a nonnegative convex function, and $w^*$ is the unique minimizer of $f$.

p.s.
  • 6,724
  • I liked the explanation in terms of sum of convex function and a quadratic term, but uniqueness of convex function is also guaranteed. Is is that strong convexity is need only to restrict the class of functions as @par pointed out? – shyamupa Dec 18 '13 at 00:22
  • What uniqueness are you referring to? A convex function in general does not have a unique minimizer. For example, a constant function is convex, and every point is a minimizer of it. – p.s. Dec 18 '13 at 05:24
  • Oh, Sorry I mixed up the property that a local minima is a global minima with that of uniqueness. – shyamupa Dec 18 '13 at 07:31
1

First, note that any strongly convex function with $\beta>0$ is convex. You can note this by considering $\theta\in\left(0,1\right)$ and $w\neq w^{\prime}$to get that $$ \frac{\beta}{2}\theta\left(1-\theta\right)\left(w-w^{\prime}\right)>0 $$ and hence \begin{align*} f\left(\theta w+\left(1-\theta\right)w^{\prime}\right) & \leq\theta f\left(w\right)+\left(1-\theta\right)f\left(w^{\prime}\right)-\frac{\beta}{2}\theta\left(1-\theta\right)\left(w-w^{\prime}\right)^{2}\\ & <\theta f\left(w\right)+\left(1-\theta\right)f\left(w^{\prime}\right)-0\\ & =\theta f\left(w\right)+\left(1-\theta\right)f\left(w^{\prime}\right). \end{align*} However, the converse is not necessarily true. Strong convexity is thus a stronger (hence the name) requirement than strict convexity. Whenever you restrict the space of objects you are looking at, you make your life easier, but you will not be able to say something as general if you had not restricted your space. From Wikipedia, ``strongly convex functions are in general easier to work with than convex or strictly convex functions, since they are a smaller class.''

Here, strong convexity is trying to make your life easier by telling you just how strictly convex a function is. The larger the $\beta$, the more ``strictly'' convex your function is.

parsiad
  • 25,738