
I came across the following interesting and important result:

Let $f$ be a proper convex function and $\bar{x}$ be an interior point of ${\rm dom} f$. Denote the sublevel set $\{x:f(x)\leq f(\bar{x})\}$ by $C$ and the normal cone to $C$ at $\bar{x}$ by $N_C(\bar{x})$. Moreover, assume that $f(\bar{x})>\inf f(x)$. Then, we have $N_C(\bar{x})={\rm cone}\, \partial f(\bar{x})$.

Many textbooks state this result as a theorem, but most of them prove it by citing many other results. The basic idea is to show $(T_C(\bar{x}))^{\circ}={\rm cone}\,\partial f(\bar{x})$, where $T_C(\bar{x})$ is the tangent cone to $C$ at $\bar{x}$.

I tried to show this result by a naive approach, i.e., by showing $N_C(\bar{x})\supset{\rm cone}\,\partial f(\bar{x})$ and $N_C(\bar{x})\subset{\rm cone}\,\partial f(\bar{x})$. Indeed, it is trivial to show $N_C(\bar{x})\supset{\rm cone}\,\partial f(\bar{x})$. The difficulty comes from the other direction.

I am wondering if there is any simple/insightful way to show this result.

Thanks.

mining
  • Out of curiosity, where did you come across this? – littleO Sep 15 '14 at 07:06
  • This is indeed an exercise in Borwein's book Convex Analysis and Nonlinear Optimization. Although it is an exercise, I think it is very useful for tying many results together. As I mentioned, it appears in many other textbooks as a theorem in a slightly different form. – mining Sep 15 '14 at 08:47

3 Answers


Proof of $R_+ \partial f(\bar{x})\supset N_C(\bar{x})$ (by contradiction):

(1). $\bar{x} \in \operatorname{core}(\operatorname{dom} f)$ $\Rightarrow$ $\partial f(\bar{x})$ is nonempty and bounded.

(2). $f(\bar{x}) > \inf f$ $\Rightarrow$ $0 \notin \partial f(\bar{x})$.

Assume that $R_+ \partial f(\bar{x}) \supset N_C(\bar{x})$ doesn't hold. Then there exists a nonzero $d$ such that $d \in N_C(\bar{x})$ and $d \not\in R_+ \partial f(\bar{x})$. Let $D=R_+\{d\}$. Then $\partial f(\bar{x}) \cap D=\emptyset$.

From (1), (2) and the hyperplane separation theorem (a compact convex set and a disjoint closed convex set can be strongly separated), there exists $a$ such that $\langle a, d \rangle \geq 0$ and $\langle a, \phi\rangle < -\epsilon$ for all $\phi \in \partial f(\bar{x})$, for some $\epsilon>0$.

From the max formula, $f'(\bar{x};a)<0$, so $a$ is a descent direction for $f$: $\bar{x}+ta \in C$ for all small $t>0$, and hence $\langle a, d\rangle \leq 0$ because $d \in N_C(\bar{x})$. By itself this only gives $\langle a, d\rangle = 0$; but since $\partial f(\bar{x})$ is bounded by (1), we may replace $a$ with $a + \delta d$ for small $\delta > 0$, keeping $\langle a+\delta d, \phi\rangle < 0$ for all $\phi \in \partial f(\bar{x})$ while $\langle a+\delta d, d\rangle > 0$. Applying the same descent argument to $a + \delta d$ gives $\langle a+\delta d, d\rangle \leq 0$, a contradiction.

Pew
  • Can you state explicitly the particular hyperplane separation theorem you’re using here? (A reference would be great too if you have one.) – littleO Oct 22 '22 at 19:38

Here's my attempt to explain a proof that I learned from the book Fundamentals of Convex Analysis by Hiriart-Urruty and Lemaréchal.

Some background knowledge and notation:

I'll first review some relevant background knowledge from convex analysis.

  • Let $K \subset \mathbb R^n$ be a convex cone. The "polar" of $K$ is the cone $K^\circ$ defined by $$ K^\circ = \{ w \in \mathbb R^n \mid \langle v, w \rangle \leq 0 \text{ for all } v \in K\}. $$ It can be shown that if $K$ is a closed convex cone, then $$ (K^\circ)^\circ = K. $$

  • Suppose that $C \subset \mathbb R^n$ is a convex set and $\bar x \in C$. A "feasible vector" or "feasible direction" for $C$ at $\bar x$ is a vector $v \in \mathbb R^n$ such that $\bar x + t v \in C$ for some $t > 0$. (So if you start at the location $\bar x$ and move a short distance in the direction $v$, you will not leave $C$.) The set of all feasible vectors for $C$ at $\bar x$ is denoted $F_C(\bar x)$. The tangent cone to $C$ at $\bar x$, denoted $T_C(\bar x)$, is by definition the closure of $F_C(\bar x)$: $$ T_C(\bar x) = \overline{F_C(\bar x)}. $$ (This definition of the tangent cone is only correct when $C$ is convex.)

  • The normal cone to $C$ at $\bar x$ is the set $N_C(\bar x)$ defined by $$ N_C(\bar x) = \{ w \in \mathbb R^n \mid \langle x - \bar x, w \rangle \leq 0 \text{ for all } x \in C \}. $$ It can be shown that $T_C(\bar x)$ and $N_C(\bar x)$ are both closed convex cones and that $$ N_C(\bar x) = T_C(\bar x)^\circ. $$

  • Suppose that $f:\mathbb R^n \to \mathbb R \cup \{\infty\}$ is a convex function and $\bar x \in \mathbb R^n$. A "subgradient" of $f$ at $\bar x$ is a vector $g \in \mathbb R^n$ such that $$ f(x) \geq f(\bar x) + \langle g, x - \bar x \rangle \quad \text{for all } x \in \mathbb R^n. $$ The set of all subgradients of $f$ at $\bar x$ is called the "subdifferential" of $f$ at $\bar x$ and is denoted $\partial f(\bar x)$.

  • The "effective domain" of a function $f:\mathbb R^n \to \mathbb R \cup \{\infty\}$ is the set $\textbf{dom} \, f$ defined by $$ \textbf{dom} \, f = \{ x \in \mathbb R^n \mid f(x) < \infty \}. $$ It can be shown that if $f$ is convex and $\bar x$ is in the interior of the effective domain of $f$, then $\partial f(\bar x)$ is a non-empty, convex, compact set.

  • It can also be shown that if $f:\mathbb R^n \to \mathbb R \cup \{\infty \}$ is convex and $\bar x$ is in the interior of the effective domain of $f$, then $$ D_v f(\bar x) = \max_{g \in \partial f(\bar x)} \langle g, v \rangle. $$ Here $$ D_v f(\bar x) = \lim_{t \to 0^+} \frac{f(\bar x + tv) - f(\bar x)}{t} $$ is the directional derivative of $f$ at $\bar x$ in the direction $v$.

  • If $f:\mathbb R^n \to \mathbb R \cup \{\infty\}$ is convex and $\bar x$ is in the interior of the effective domain of $f$, then the function $$ h(v) = D_v f(\bar x) $$ is a finite sublinear function. It follows that $h$ is continuous.
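The max formula in the background list can be checked numerically. The following sketch (the function $f(x) = |x_1| + |x_2|$, the point $\bar x = (0,1)$, and the discretization of the subdifferential are my own choices, not part of the text) compares a one-sided difference quotient with the maximum of $\langle g, v \rangle$ over $\partial f(\bar x) = [-1,1] \times \{1\}$:

```python
import numpy as np

# Sanity check of the max formula D_v f(xbar) = max_{g in subdiff} <g, v>
# for f(x) = |x1| + |x2| at xbar = (0, 1), where the subdifferential
# is the segment [-1, 1] x {1}.  (Example of my choosing.)

def f(x):
    return np.abs(x).sum()

xbar = np.array([0.0, 1.0])

# Discretize [-1, 1] x {1}; the extreme points s = -1, +1 lie on the grid,
# so the grid maximum of <g, v> equals the true maximum |v1| + v2.
subdiff = np.array([[s, 1.0] for s in np.linspace(-1.0, 1.0, 201)])

rng = np.random.default_rng(0)
for _ in range(100):
    v = rng.standard_normal(2)
    t = 1e-6
    directional = (f(xbar + t * v) - f(xbar)) / t   # one-sided difference quotient
    max_formula = np.max(subdiff @ v)               # max over the subdifferential
    assert abs(directional - max_formula) < 1e-4, (directional, max_formula)

print("max formula confirmed on 100 random directions")
```

For this particular $f$ the difference quotient is exact for small $t$, so the two sides agree up to floating-point error.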


Statement of theorem:

Let $f:\mathbb R^n \to \mathbb R \cup \{ \infty \}$ be a proper convex function. Suppose that $\bar x$ belongs to the interior of the effective domain of $f$ and that $\inf_{x \in \mathbb R^n} f(x) < f(\bar x)$. If $$ C = \{ x \in \mathbb R^n \mid f(x) \leq f(\bar x) \} $$ then $$ N_C(\bar x) = \mathbb R^+ \partial f(\bar x) = \{ t g \mid t \geq 0, g \in \partial f(\bar x) \}. $$ Here $N_C(\bar x)$ is the normal cone to $C$ at $\bar x$.
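Before the proof, here is a numerical sanity check of the statement in a simple smooth case (the example, $f(x) = \|x\|^2/2$ at $\bar x = (1,0)$, is mine, not from the book): $C$ is the closed unit disk, $\partial f(\bar x) = \{(1,0)\}$, $f(\bar x) = 1/2 > \inf f = 0$, and the theorem predicts $N_C(\bar x) = \{(t, 0) : t \geq 0\}$.

```python
import numpy as np

# Check N_C(xbar) = {(t, 0) : t >= 0} for f(x) = ||x||^2 / 2, xbar = (1, 0),
# where C is the closed unit disk.  (Example of my choosing.)

xbar = np.array([1.0, 0.0])
g = xbar.copy()                                    # gradient of f at xbar

rng = np.random.default_rng(1)
pts = rng.standard_normal((5000, 2))               # sample points of C
pts /= np.maximum(1.0, np.linalg.norm(pts, axis=1))[:, None]

# Every nonnegative multiple of g satisfies the normal-cone inequality
# <x - xbar, w> <= 0 over the sampled points of C:
for t in [0.0, 0.5, 2.0]:
    assert np.all((pts - xbar) @ (t * g) <= 1e-12)

# A direction off the ray, e.g. w = (1, 1), fails at x = (0.8, 0.58) in C:
w = np.array([1.0, 1.0])
x = np.array([0.8, 0.58])
assert np.dot(x, x) <= 1.0 and (x - xbar) @ w > 0  # so w is not in N_C(xbar)

print("normal cone matches the ray spanned by the gradient")
```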


Proof strategy:

We can prove this theorem in the following steps.

  1. We'll first show that the tangent cone to $C$ at $\bar x$ is $$ \tag{$\spadesuit$} T_C(\bar x) = \{ v \in \mathbb R^n \mid D_v f(\bar x) \leq 0\}. $$

  2. Because $\bar x$ is in the interior of the effective domain of $f$, we have that $\partial f(\bar x)$ is nonempty and compact and $$ D_v f(\bar x) = \max_{g \in \partial f(\bar x)} \langle g, v \rangle \qquad \text{for any } v \in \mathbb R^n. $$ We notice that \begin{align} \{ v \in \mathbb R^n \mid D_v f(\bar x) \leq 0 \} &= \{ v \in \mathbb R^n \mid \langle g, v \rangle \leq 0 \text{ for all } g \in \partial f(\bar x) \} \\ &= \{ v \in \mathbb R^n \mid \langle tg, v \rangle \leq 0 \text{ for all } g \in \partial f(\bar x), t \geq 0 \} \\ \tag{$\clubsuit$}&= \left( \mathbb R^+ \partial f(\bar x) \right)^\circ. \end{align} Here $$ \mathbb R^+ \partial f(\bar x) = \{ t g \mid t \geq 0,\, g \in \partial f(\bar x) \} $$ is the convex cone generated by $\partial f(\bar x)$. Because $\partial f(\bar x)$ is compact and does not contain $0$ (as $\bar x$ is not a minimizer of $f$), the set $\mathbb R^+ \partial f(\bar x)$ is in fact a closed convex cone.

  3. Putting the above pieces together, we see that \begin{align} N_C(\bar x) &= T_C(\bar x)^\circ \\ &= \{ v \in \mathbb R^n \mid D_v f(\bar x) \leq 0 \}^\circ \\ &= \left(\mathbb R^+ \partial f(\bar x) \right)^{\circ \circ} \\ &= \mathbb R^+ \partial f(\bar x). \end{align} This is what we wanted to show.


Proof details: Steps 2 and 3 above are already complete. We only need to provide details for step 1.

Equation ($\spadesuit$) seems plausible. Certainly if $v$ is a feasible direction for $C$ at $\bar x$ then $D_vf(\bar x) \leq 0$. Indeed, if $f(\bar x + tv) \leq f(\bar x)$ for all sufficiently small positive $t$, then $$ D_v f(\bar x) = \lim_{t \to 0^+} \frac{f(\bar x + tv) - f(\bar x)}{t} \leq 0. $$ Because the function $v \mapsto D_v f(\bar x)$ is continuous, it follows that if $v$ is a limit of feasible directions then $D_v f(\bar x) \leq 0$. Thus, $$ \tag{1} T_C(\bar x) \subset \{ v \in \mathbb R^n \mid D_v f(\bar x) \leq 0\}. $$ The only challenge is to prove the reverse inclusion.

Let $$ S = \{ v \in \mathbb R^n \mid D_v f(\bar x) < 0 \}. $$ It is clear that $$ S \subset F_C(\bar x) $$ because if $D_v f(\bar x) < 0$ then moving a short distance away from $\bar x$ in the direction $v$ will reduce the value of $f$. Thus, $$ \bar S \subset \overline{F_C(\bar x)} = T_C(\bar x). $$ But it can be shown that $$ \tag{2} \bar S = \{ v \in \mathbb R^n \mid D_v f(\bar x) \leq 0 \}. $$ It follows that $$ \tag{3} \{ v \in \mathbb R^n \mid D_v f(\bar x) \leq 0 \} \subset T_C(\bar x). $$ The results (1) and (3) together show that $T_C(\bar x) = \{ v \in \mathbb R^n \mid D_v f(\bar x) \leq 0 \}$, which is what we wanted to show.


The only missing detail in the above proof is to establish equation (2). Let $h:\mathbb R^n \to \mathbb R$ be the function defined by $$ h(v) = D_v f(\bar x). $$ As mentioned in the "background knowledge" section above, this function $h$ is finite and sublinear, hence convex. We are given that there exists a point $x^* \in \mathbb R^n$ such that $f(x^*) < f(\bar x)$. It follows that $h(v^*) < 0$, where $v^* = x^* - \bar x$. Equation (2) now follows from the following lemma.

Lemma: Suppose that $h:\mathbb R^n \to \mathbb R$ is convex and that $h(v^*) < 0$ for some vector $v^* \in \mathbb R^n$. Then $$ \tag{4} \overline{\{v \in \mathbb R^n \mid h(v) < 0 \}} = \{ v \in \mathbb R^n \mid h(v) \leq 0 \}. $$

Proof of lemma: Any convex function $h:\mathbb R^n \to \mathbb R$ is continuous. Hence, the set $\{ v \in \mathbb R^n \mid h(v) \leq 0 \}$ is closed. It follows that $$ \tag{5} \overline{\{v \in \mathbb R^n \mid h(v) < 0 \}} \subset \{ v \in \mathbb R^n \mid h(v) \leq 0 \}. $$

Is the reverse inclusion true also? Well, suppose that $v \in \mathbb R^n$ and $h(v) \leq 0$. We are given that there exists a vector $v^* \in \mathbb R^n$ such that $$ h(v^*) < 0. $$ Let $$ v_k = \frac{1}{k} v^* + \frac{(k-1)}{k} v \quad \text{for each positive integer }k $$ and note that $v_k \to v$ as $k \to \infty$. Because $h$ is convex, we have that $$ h(v_k) \leq \frac{1}{k} \underbrace{h(v^*)}_{\substack{\big \uparrow\\ < \, 0}} + \frac{(k-1)}{k} \underbrace{h(v)}_{\substack{\big \uparrow \\ \leq \, 0}} < 0 $$ for each positive integer $k$. So $v$ is a limit of vectors $v_k$ which satisfy $h(v_k) < 0$. This shows that $$ \tag{6} \{ v \in \mathbb R^n \mid h(v) \leq 0 \} \subset \overline{\{v \in \mathbb R^n \mid h(v) < 0 \}}. $$ Equations (5) and (6) together show that (4) holds, so the lemma is proved.
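To make the lemma concrete, here is a small numerical illustration (the particular $h$, $v^*$, and $v$ are my own choices): with $h(v) = \max(v_1, v_2)$, the Slater point $v^* = (-1,-1)$, and $v = (0,-3)$, the convex combinations $v_k$ from the proof are strictly negative for $h$ while converging to $v$.

```python
import numpy as np

# Illustration of the lemma: h(v) = max(v1, v2) is convex and finite,
# h(vstar) = -1 < 0 at the Slater point vstar, and h(v) = 0 at v.
# The sequence v_k = (1/k) vstar + ((k-1)/k) v from the proof satisfies
# h(v_k) < 0 while v_k -> v.  (Example of my choosing.)

def h(v):
    return max(v[0], v[1])

vstar = np.array([-1.0, -1.0])
v = np.array([0.0, -3.0])
assert h(vstar) < 0 and h(v) <= 0

for k in range(1, 201):
    vk = vstar / k + (k - 1) / k * v
    assert h(vk) < 0                 # strict inequality along the whole sequence

# v_k approaches v, so v lies in the closure of {h < 0}:
vk = vstar / 200 + 199 / 200 * v
assert np.linalg.norm(vk - v) < 0.02

print("v is a limit of points where h is strictly negative")
```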

littleO

Here is one possibility to show the reverse direction; the only fact used is \begin{equation*} f'(\bar x; d ) = \sup_{\lambda \in \partial f(\bar x)} \langle \lambda, d \rangle . \end{equation*} The desired inclusion $N_C(\bar x) \subset \mathrm{cone}\,\partial f(\bar x)$ is equivalent to $N_C(\bar x)^\circ \supset \mathrm{cone}\,\partial f(\bar x)^\circ$, that is, we have to show \begin{equation*} T_C(\bar x) \supset \partial f(\bar x)^\circ, \end{equation*} since $\partial f(\bar x)^\circ \supset \mathrm{cone}\,\partial f(\bar x)^\circ$. Let $d \in \partial f(\bar x)^\circ$ be given. Hence, for all $\lambda \in \partial f(\bar x)$, we have \begin{equation*} \langle \lambda, d \rangle \le 0 \end{equation*} and this means \begin{equation*} f'(\bar x;d) \le 0. \end{equation*} By using a point $\tilde x$ with $f(\tilde x) < f(\bar x)$, it is easy to show \begin{equation*} f\Big( (1-\lambda_n) \, (\bar x + n^{-1} \, d) + \lambda_n \, \tilde x \Big) \le f(\bar x) \end{equation*} with a suitable $\lambda_n = o(1/n)$, which yields $d \in T_C(\bar x)$.

This proof is also insightful. It just means that if you have a direction $d$ which is polar to all subgradients, it cannot be a direction of ascent (i.e. $f'(\bar x;d) \le 0$) and thus lies in the tangent cone of $C$.
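For what it's worth, the construction in the last display can be checked numerically in a simple smooth case. In the sketch below the function $f(x) = \|x\|^2/2$, the point $\bar x = (1,0)$, the direction $d = (0,1)$, the point $\tilde x = 0$, and the concrete choice $\lambda_n = n^{-3/2} = o(1/n)$ are all my own illustration, not part of the answer:

```python
import numpy as np

# Check the construction for f(x) = ||x||^2 / 2, xbar = (1, 0):
# d = (0, 1) is polar to the gradient (1, 0), xtilde = 0 satisfies
# f(xtilde) < f(xbar), and with lambda_n = n^(-3/2) the points
#   p_n = (1 - lambda_n)(xbar + d/n) + lambda_n * xtilde
# stay in C = {f <= f(xbar)} while n (p_n - xbar) -> d,
# witnessing d in T_C(xbar).  (Example of my choosing.)

def f(x):
    return 0.5 * float(np.dot(x, x))

xbar = np.array([1.0, 0.0])
d = np.array([0.0, 1.0])
xtilde = np.zeros(2)

def p(n):
    lam = n ** -1.5
    return (1 - lam) * (xbar + d / n) + lam * xtilde

for n in range(1, 5001):
    assert f(p(n)) <= f(xbar) + 1e-15      # p_n remains in the sublevel set C

# The difference quotients converge to d:
assert np.linalg.norm(5000 * (p(5000) - xbar) - d) < 0.02

print("d = (0, 1) verified to lie in the tangent cone T_C(xbar)")
```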

gerw
  • Hi, gerw: could you please provide more detail about the last inequality? By the way, I think you also make use of the fact that ${\rm cone}\,\partial f(\bar{x})$ is closed. Thus, it is the same as $({\rm cone}\,\partial f(\bar{x}))^{\circ\circ}$. This is due to $0\notin\partial f(\bar{x})$. Am I right? Thanks a lot. – mining Sep 15 '14 at 08:33
  • Moreover, I do not think a direction polar to all subgradients has to be a descent direction. Consider $f(x,y)=\frac{1}{2}(x^2+y^2)$ at $(1,0)$. The gradient is $(1,0)$. The direction $(0,1)$ is polar to $(1,0)$ and belongs to the tangent cone, but it is indeed an ascent direction. Please let me know if I missed anything. Thanks. – mining Sep 15 '14 at 08:44
  • For this inequality, I only used the definition of the directional derivative and the convexity of $f$. I did not use that $\mathrm{cone}\,\partial f(\bar x)$ is closed, but only $\mathrm{cone}\, A^\circ \supset A^\circ$, see edit. – gerw Sep 15 '14 at 09:05
  • Without the assumption that ${\rm cone}\,\partial f(\bar{x})$ is closed, why is the inclusion $N_C(\bar{x})\subset{\rm cone}\,\partial f(\bar{x})$ equivalent to $(N_C(\bar{x}))^{\circ}\supset({\rm cone}\,\partial f(\bar{x}))^{\circ}$? – mining Sep 15 '14 at 20:43
  • Oh, yes, you are right. – gerw Sep 16 '14 at 06:16
  • @gerw Based on your last comment, can you please tell me what is the status of the proof you’ve written here? Is it correct as written? – littleO Oct 19 '22 at 02:02
  • My proof shows that $N_C(\bar x) \subset \operatorname{cl}\,\operatorname{cone}(\partial f(\bar x))$. However, since $\partial f(\bar x)$ is a compact subset of $\mathbb R^n$ and $0 \not\in \partial f(\bar x)$, one can show that $\operatorname{cone}(\partial f(\bar x))$ is already closed. – gerw Oct 19 '22 at 06:40