
Let $D \subset \mathbb{R}^{n}$ be a nonempty closed convex set and let

$$f:\mathbb{R}^{n}\rightarrow \mathbb{R}_{+}, f(x)=(\operatorname{dist}(x,D))^{2}$$

Prove that $f$ is differentiable on $\mathbb{R}^{n}$ and

$$f'(x)=2(x-P_{D}(x)), \quad \forall x \in \mathbb{R}^{n},$$ where $\operatorname{dist}(x,D)$ is the distance between the point $x$ and the set $D$, and $P_{D}(x)$ is the projection of $x$ onto $D$, i.e.,

\begin{align} \operatorname{dist}\left(x, D \right) := \left\| x - P_{D}(x)\right\|_2. \end{align}

user550103
dgs
  • What techniques have you tried so far? – Zim Jun 15 '20 at 17:02
  • @TSF $P_D$ is not differentiable at the boundary of $D$ – LinAlg Jun 15 '20 at 20:49
  • @LinAlg is the given expression for the gradient correct? Suppose we prove that $f$ is differentiable; can you give a hint on how to find the gradient of $P_D(x)$? – Shiv Tavker Jun 15 '20 at 21:06
  • @ShivTavker yes, this formula is correct (and the generalization in (*) below is also true). Hint: If $f$ has a gradient, then the directional derivative of $f$ at $x_0$ in the direction $x$, denoted $f'(x;x_0)$, is linear as a function of the direction. The gradient is the slope of this linear functional and is represented (a la Hahn Banach Theorem) via $f'(x;x_0)=\langle x \mid \nabla f(x_0)\rangle$ – Zim Jun 15 '20 at 21:19
  • The projection is not differentiable. Take $D$ to be the closed unit square $[0,1]^2$ in the plane and note that $P_D((x,2)) = (\min(1,\max(0,x)),1)$, which has kinks at $x=0$ and $x=1$. – copper.hat Jun 16 '20 at 01:28

3 Answers

3

Here is a tedious but elementary proof.

Note that the projection is Lipschitz with constant one (that is, nonexpansive): $\|P_D(x)-P_D(y)\| \le \|x-y\|$. This is a standard property of projections onto closed convex sets.

Note that $f(y) \le \|y-P_D(x)\|^2 = \|x-P_D(x)+y-x\|^2 = f(x) + 2(x-P_D(x))^T(y-x) +\|y-x\|^2$ so we have $f(y)-f(x) - 2(x-P_D(x))^T(y-x) \le \|y-x\|^2$.

Swapping $x,y$ we get $-(f(y)-f(x) - 2(y-P_D(y))^T(y-x)) \le \|y-x\|^2$.

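Note that $y-P_D(y) = x-P_D(x) + y-x+P_D(x)-P_D(y)$, so the above becomes \begin{align} -(f(y)-f(x) - 2(x-P_D(x))^T(y-x)) &\le \|y-x\|^2-2(y-x+P_D(x)-P_D(y))^T(y-x) \\ &\le 4 \|y-x\|^2, \end{align} where the last step uses the Cauchy-Schwarz inequality and the Lipschitz property of $P_D$. Together with the first inequality this gives $|f(y)-f(x) - 2(x-P_D(x))^T(y-x)| \le 4\|y-x\|^2$. In particular, $f$ is differentiable at $x$ and $D f(x)h = 2(x-P_D(x))^T h$.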
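As a numerical sanity check of the two-sided bound above, here is a minimal sketch in plain Python. The choice of $D$ as the box $[0,1]^n$, the helper names (`project_box`, `remainder`, `max_violation`) and the sampling range are assumptions made for illustration only; the check does not replace the proof.

```python
import math
import random

def project_box(x, lo=0.0, hi=1.0):
    """Projection onto the box [lo, hi]^n (an assumed example of D)."""
    return [min(hi, max(lo, xi)) for xi in x]

def f(x):
    """Squared Euclidean distance from x to the box."""
    p = project_box(x)
    return sum((xi - pi) ** 2 for xi, pi in zip(x, p))

def grad_f(x):
    """Claimed gradient 2 (x - P_D(x))."""
    p = project_box(x)
    return [2.0 * (xi - pi) for xi, pi in zip(x, p)]

def remainder(x, y):
    """First-order remainder f(y) - f(x) - <grad_f(x), y - x>."""
    g = grad_f(x)
    return f(y) - f(x) - sum(gi * (yi - xi) for gi, yi, xi in zip(g, y, x))

def max_violation(trials=10000, n=3, seed=0):
    """Largest value of |remainder| - 4 ||y - x||^2 over random pairs.

    The bound in the proof says this should never be positive.
    """
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(trials):
        x = [rng.uniform(-2.0, 3.0) for _ in range(n)]
        y = [rng.uniform(-2.0, 3.0) for _ in range(n)]
        sq = sum((yi - xi) ** 2 for xi, yi in zip(x, y))
        worst = max(worst, abs(remainder(x, y)) - 4.0 * sq)
    return worst

if __name__ == "__main__":
    # A non-positive value (up to rounding) is consistent with the bound.
    print(max_violation())
```

A box is used only because its projection is a coordinate-wise clamp; any other closed convex set with a computable projection could be substituted.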

copper.hat
2

There is a fast way of proving this as a corollary of the result $$\nabla(M_{\gamma f})=\gamma^{-1}(\textrm{Id}-\textrm{prox}_{\gamma f}),\tag{*}$$ where $\gamma\in\mathbb{R}_{++}$ and $M_{\gamma f}$ is the Moreau Envelope of a proper, lower-semicontinuous, convex function $f:\mathbb{R}^n\to]-\infty,+\infty]$. This result appears in Corollary 12.31 of Bauschke & Combettes' book, vol. 2. The argument is essentially this: let $\gamma=1$ and let $f$ be the $0$-$\infty$ indicator function of the set $D$ (not the $f$ of the question). With the convention $M_{\gamma f}(x)=\min_{y}\big(f(y)+\tfrac{1}{2\gamma}\|x-y\|^2\big)$, we get $M_{\gamma f}=\textrm{dist}^2_D/2$ and $\textrm{prox}_{\gamma f}=P_D$. Multiplying (*) by $2$ then gives $\nabla \textrm{dist}^2_D=2(\textrm{Id}-P_D)$.
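To see (*) at work on the indicator function, here is a small one-dimensional sketch in plain Python. It assumes the convention $M_{\gamma f}(x)=\min_y\big(f(y)+\tfrac{1}{2\gamma}|x-y|^2\big)$, takes $D=[0,1]$, and approximates the minimisation by brute force over a grid; the function names, grid size and step `h` are all hypothetical choices for illustration.

```python
def indicator_envelope(x, gamma, grid):
    """Brute-force Moreau envelope of the indicator of D at x.

    Since the indicator is 0 on D and +infinity elsewhere, the envelope is
    min over y in D of (x - y)^2 / (2 * gamma). Returns (value, minimiser);
    the minimiser approximates prox(x).
    """
    y_best = min(grid, key=lambda y: (x - y) ** 2)
    return (x - y_best) ** 2 / (2.0 * gamma), y_best

def check(x, gamma, h=1e-3, m=10000):
    """Compare a central-difference derivative of the envelope with (x - prox) / gamma."""
    grid = [i / m for i in range(m + 1)]  # D = [0, 1] sampled on a uniform grid
    _, prox = indicator_envelope(x, gamma, grid)
    up, _ = indicator_envelope(x + h, gamma, grid)
    down, _ = indicator_envelope(x - h, gamma, grid)
    numeric = (up - down) / (2.0 * h)
    predicted = (x - prox) / gamma
    return numeric, predicted, prox

if __name__ == "__main__":
    for x, gamma in [(1.7, 1.0), (1.7, 0.5), (-0.4, 2.0), (0.3, 1.0)]:
        print(x, gamma, check(x, gamma))
```

In each case the computed minimiser should agree with the clamp of `x` to $[0,1]$, which is the projection onto $D$, and the two derivative values should agree up to the grid and finite-difference error.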

I'd be interested to see a more direct proof using less "heavy-duty" machinery.

Zim
2

Here is a proof using nonsmooth (nondifferentiable) calculus.

Let $d_D(x) = \min_{d \in D} \|x-d\|^2$. The $\min$ is attained at a unique point $P_D(x)$ because $D$ is closed & convex.

If we pick some $x^*$ and restrict $x$ to the closed ball $\overline{B}(x^*,1)$, we can assume that $D$ is compact. To see this, pick $R=\sqrt{d_D(x^*)}+2$ and let $D' = D \cap \overline{B}(x^*,R)$. Then $\sqrt{d_D(x)} \le \|x-P_D(x^*)\| \le \|x-x^*\| + \sqrt{d_D(x^*)} \le R-1$, and hence $\|P_D(x)-x^*\| \le \|P_D(x)-x\| + \|x-x^*\| \le R$. In particular, $P_D(x) \in D'$ and so, locally, $d_D(x) = d_{D'}(x)$, so we may assume that $D$ is bounded and hence compact.

We can write $d_D(x) = - g(x)$, where $g(x)=\max_{d \in D} \phi(x,d)$ and $\phi(x,d) = - \|x-d\|^2 $. Since $g$ is locally Lipschitz it has a (Clarke) generalised gradient and we can compute it by $\partial g(x) = \operatorname{co} \{ { \partial \phi(x,d) \over \partial x} \}_{d \in I(x)}$ with $I(x) = \{ d \in D \mid \phi(x,d) = g(x) \}$. Since the maximiser is unique, $I(x) = \{P_D(x)\}$ and the generalised gradient is a single point, so $g$ is differentiable and ${\partial g(x) \over \partial x} = { \partial \phi(x,P_D(x)) \over \partial x} = - 2(x-P_D(x))^T$. Hence $d_D$ is differentiable and ${\partial d_D(x) \over \partial x} = 2(x-P_D(x))^T$.

copper.hat