10

$\newcommand{\prox}{\operatorname{prox}}$ Probably the most remarkable property of the proximal operator is the fixed point property:

The point $x^*$ minimizes $f$ if and only if $x^* = \prox_f(x^*) $

So, indeed, $f$ can be minimized by finding a fixed point of its proximal operator. See Proximal Algorithms by Neal Parikh and Stephen Boyd.
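The equivalence follows in two lines from the subgradient characterization of the prox (a standard argument, sketched here for completeness):

```latex
% subgradient characterization of the prox:
u = \operatorname{prox}_f(x)
  \iff 0 \in \partial f(u) + (u - x)
  \iff x - u \in \partial f(u)
% taking x = x^* and asking for u = x^*:
x^* = \operatorname{prox}_f(x^*)
  \iff 0 \in \partial f(x^*)
  \iff x^* \text{ minimizes } f
```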

Question 1. In the paper cited above, the author says:

If $\prox_f$ were a contraction, i.e., Lipschitz continuous with constant less than $1$, repeatedly applying $\prox_f$ would find a (here, unique) fixed point

Why does the bound on the Lipschitz constant guarantee finding a fixed point by repeatedly applying the proximal operator?

Question 2. Let me quote a paragraph from the same paper:

It turns out that while $\prox_f$ need not be a contraction (unless $f$ is strongly convex), it does have a different property, firm nonexpansiveness, sufficient for fixed point iteration:

$\|\prox_f(x) - \prox_f(y)\|^2_2 \le (x-y)^T (\prox_f(x) - \prox_f(y))$

$\forall x,y \in \mathbb{R}^n$

Firmly nonexpansive operators are special cases of nonexpansive operators (those that are Lipschitz continuous with constant 1). Iteration of a general nonexpansive operator need not converge to a fixed point: consider operators like $-I$ or rotations. However, it turns out that if $N$ is nonexpansive, then the operator $T = (1-\alpha)I + \alpha N$, where $\alpha \in (0,1)$, has the same fixed points as $N$, and simple iteration of $T$ will converge to a fixed point of $T$ (and thus of $N$), i.e. the sequence:

$x^{k+1} := (1-\alpha)x^k +\alpha N(x^k)$

will converge to a fixed point of $N$. Put differently, damped iteration of a nonexpansive operator will converge to one of its fixed points.

Operators in the form $(1-\alpha)I + \alpha N$, where $N$ is non-expansive and $\alpha \in (0,1)$, are called $\alpha$-averaged operators.

Firmly nonexpansive operators are averaged: indeed, they are precisely the (1/2)-averaged operators.
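The convergence claim above can be illustrated numerically. Here is a minimal sketch (my own toy example, not from the paper), taking $N$ to be a rotation by 90° in the plane: it is nonexpansive with unique fixed point $0$, plain iteration cycles on the unit circle forever, while the $\tfrac12$-averaged (damped) iteration converges:

```python
import numpy as np

# N: rotation by 90 degrees, a nonexpansive (isometric) operator whose
# only fixed point is the origin.
N = np.array([[0.0, -1.0],
              [1.0,  0.0]])

def plain_iter(x, steps):
    # x^{k+1} := N x^k
    for _ in range(steps):
        x = N @ x
    return x

def damped_iter(x, steps, alpha=0.5):
    # x^{k+1} := (1 - alpha) x^k + alpha N x^k, i.e. iteration of the
    # alpha-averaged operator T = (1 - alpha) I + alpha N
    for _ in range(steps):
        x = (1 - alpha) * x + alpha * (N @ x)
    return x

x0 = np.array([1.0, 0.0])
print(np.linalg.norm(plain_iter(x0, 100)))   # 1.0 -- no convergence, cycles on the circle
print(np.linalg.norm(damped_iter(x0, 100)))  # tiny -- converges to the fixed point 0
```

Each damped step contracts the distance to the origin by the factor $1/\sqrt{2}$ (the operator norm of $T = \tfrac12(I+N)$), even though $N$ itself contracts nothing.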

Why "unless $f$ is strongly convex"?

What is the intuition behind the given expression for firm nonexpansiveness?

How can you show that firm nonexpansive operators are $\alpha$-averaged with $\alpha = \frac{1}{2}$?

Is anyone aware of a proof that the proximal map is firmly nonexpansive?

Royi
trembik

4 Answers

15

Here's a short proof that proximal operators are firmly nonexpansive. Let $f:\mathbb R^n \to \mathbb R \cup \{\infty\}$ be closed convex proper (CCP). Assume that \begin{equation} u_1 = \operatorname{prox}_f(x_1) \quad \text{and} \quad u_2 = \operatorname{prox}_f(x_2). \end{equation} Equivalently, \begin{equation} \tag{$\spadesuit$} x_1 - u_1 \in \partial f(u_1) \quad \text{and} \quad x_2 - u_2 \in \partial f(u_2). \end{equation} Now we use the (fundamentally important) fact that $\partial f$ is a monotone mapping, which tells us that \begin{align} &\langle x_1 - u_1 - (x_2 - u_2), u_1 - u_2 \rangle \geq 0 \\ \tag{$\heartsuit$}\implies & \langle x_1 - x_2, u_1 - u_2 \rangle \geq \| u_1 - u_2 \|_2^2. \end{align} This shows that $\operatorname{prox}_f$ is firmly nonexpansive.

Comments:

  • When working with prox-operators, often the very first step is to express $u = \operatorname{prox}_f(x)$ in the equivalent form $x - u \in \partial f(u)$. This is a statement you can work with, involving more primitive notions.
  • The fact that $\partial f$ is monotone is so fundamental, that it is extremely tempting to invoke monotonicity once we have written down equation $(\spadesuit)$. In fact, much of the theory of prox-operators can be generalized to the setting of "monotone operators" which are set-valued mappings (such as $\partial f$) which satisfy the monotonicity property. From this viewpoint, the most important fact about $\partial f$ is that it is monotone.
  • The fact that this derivation is so short, essentially only two lines, helps to explain how the "firmly nonexpansive" property might be discovered in the first place. A mathematician playing around with these definitions might stumble across the inequality $(\heartsuit)$ rather quickly. A mathematician would notice that inequality $(\heartsuit)$ implies that $\operatorname{prox}_f$ is nonexpansive (because $\langle x_1 - x_2, u_1 - u_2 \rangle \leq \|x_1 - x_2\|_2 \| u_1 - u_2 \|_2$ by Cauchy–Schwarz), but it may seem wasteful to replace the inequality $(\heartsuit)$ with the inequality $\|u_1 - u_2 \|_2 \leq \|x_1 - x_2 \|_2$, because this latter inequality is less tight.
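As a quick numerical sanity check of the inequality $(\heartsuit)$ (my own illustration, not part of the answer): the prox of $f(x) = \lambda \|x\|_1$ is componentwise soft thresholding, and the firm nonexpansiveness inequality can be verified at random point pairs:

```python
import numpy as np

def prox_l1(x, lam=0.7):
    # prox of f(x) = lam * ||x||_1 is componentwise soft thresholding
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=5), rng.normal(size=5)
    u, v = prox_l1(x), prox_l1(y)
    lhs = np.dot(u - v, u - v)    # ||prox(x) - prox(y)||_2^2
    rhs = np.dot(x - y, u - v)    # (x - y)^T (prox(x) - prox(y))
    assert lhs <= rhs + 1e-12     # firm nonexpansiveness holds
print("firm nonexpansiveness verified on all sampled pairs")
```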
littleO
2

For the first question, note that $\operatorname{prox}_f$ is by definition a self-mapping, which is the first condition needed for the Banach fixed-point theorem. So it is enough to assume that $\operatorname{prox}_f$ is Lipschitz continuous with constant less than $1$: then it is a contraction, the theorem applies, and iterating $\operatorname{prox}_f$ converges to its unique fixed point.
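As a concrete instance (my own toy example): for the strongly convex $f(x) = x^2/2$, a direct computation gives $\operatorname{prox}_f(v) = v/2$, a contraction with constant $1/2$, so the Banach iteration halves the distance to the minimizer $0$ at every step:

```python
def prox_quadratic(v):
    # prox of f(x) = x**2 / 2: minimize x**2/2 + (x - v)**2/2 over x,
    # first-order condition x + (x - v) = 0 gives x = v / 2
    return v / 2.0

x = 8.0
for k in range(30):
    x = prox_quadratic(x)  # each step halves the distance to the fixed point 0
print(x)  # 7.450580596923828e-09, i.e. exactly 8 / 2**30
```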

1

You can prove the non-expansiveness of the proximal operator as follows: $\DeclareMathOperator{\prox}{prox}$

Note that the proof is similar to the proof of the (firm) non-expansiveness of the projection operator [1].

You can think of the proximal operator as a kind of generalised projection operator.

Proof: We pick two arbitrary points $z_1$, $z_2$, and write $f_{\prox_f(z)}$ for a subgradient of $f$ at $\prox_f(z)$.

(For comparison, the projection $\Pi_{\mathcal C}$ onto a convex set $\mathcal C$ satisfies the optimality condition $(z_1 - \Pi_{\mathcal C}(z_1))^\top (x - \Pi_{\mathcal C}(z_1)) \le 0$ for all $x \in \mathcal C$; the steps below mirror that argument.)

The optimality condition for $\prox_f(z_1)$ is: \begin{align} (f_{\prox_f(z_1)} + \prox_f(z_1) - z_1)^\top (\prox_f (z_2) - \prox_f (z_1)) &\ge 0 \label{eq:opt_cond_1} \end{align}

and for $\prox_f(z_2)$: \begin{align} (f_{\prox_f(z_2)} + \prox_f(z_2) - z_2)^\top (\prox_f (z_1) - \prox_f (z_2)) &\ge 0 \\ (z_2 - f_{\prox_f(z_2)} - \prox_f(z_2))^\top (\prox_f (z_2) - \prox_f (z_1)) &\ge 0 \label{eq:opt_cond_2} \end{align}

The subgradient inequalities at $\prox_f(z_1)$ and $\prox_f(z_2)$ are:

\begin{align} f(\prox_f(z_2)) - f(\prox_f(z_1)) \ge f_{\prox_f(z_1)}^\top (\prox_f(z_2) - \prox_f(z_1)) \label{eq:subgrad_1}\\ f(\prox_f(z_1)) - f(\prox_f(z_2)) \ge f_{\prox_f(z_2)}^\top (\prox_f(z_1) - \prox_f(z_2)) \label{eq:subgrad_2}\\ \end{align} Adding the two subgradient inequalities yields monotonicity of the subgradient: \begin{align} (f_{\prox_f(z_2)} - f_{\prox_f(z_1)})^\top (\prox_f(z_2) - \prox_f(z_1)) \ge 0 \label{eq:subgrad_zero} \end{align}

Adding the two optimality conditions leads us to: \begin{align} ((z_2 - z_1) + (f_{\prox_f(z_1)} - f_{\prox_f(z_2)}) + (\prox_f(z_1) - \prox_f(z_2)))^\top (\prox_f(z_2) - \prox_f(z_1)) &\ge 0 \\ (- (z_2 - z_1) + (f_{\prox_f(z_2)} - f_{\prox_f(z_1)}) + (\prox_f(z_2) - \prox_f(z_1)))^\top (\prox_f(z_2) - \prox_f(z_1)) &\le 0 \end{align}

By monotonicity of the subgradient, the term $(f_{\prox_f(z_2)} - f_{\prox_f(z_1)})^\top (\prox_f(z_2) - \prox_f(z_1))$ is nonnegative, so subtracting it can only decrease the right-hand side: \begin{align} (\prox_f(z_2) - \prox_f(z_1))^\top (\prox_f(z_2) - \prox_f(z_1)) &\le ((z_2 - z_1) - (f_{\prox_f(z_2)} - f_{\prox_f(z_1)}))^\top (\prox_f(z_2) - \prox_f(z_1)) \\ (\prox_f(z_2) - \prox_f(z_1))^\top (\prox_f(z_2) - \prox_f(z_1)) &\le (z_2 - z_1)^\top (\prox_f(z_2) - \prox_f(z_1)) \end{align} Applying Cauchy–Schwarz: \begin{align} ||\prox_f(z_2) - \prox_f(z_1)||^2 &\le ||z_2 - z_1|| \, ||\prox_f(z_2) - \prox_f(z_1)|| \\ ||\prox_f(z_2) - \prox_f(z_1)|| &\le ||z_2 - z_1|| \\ &&\square \end{align}

0

I can explain the first question. The author said "if $\operatorname{prox}_f$ were a contraction", which is not guaranteed in general. The Banach fixed-point theorem tells us that repeatedly applying a contraction finds its (unique) fixed point. So if $\operatorname{prox}_f$ is a contraction, then repeatedly applying the proximal operator will yield a fixed point.

chuan