
Let $x_1,x_2,\ldots,x_{\mu}\in\mathbb{R}^n$ and let $\phi:\mathbb{R}^n\to\mathbb{R}$ be defined by $x\mapsto \sum_{j=1}^{\mu}g(x-x_j)^2$, where $g$ denotes the euclidean norm on $\mathbb{R}^n$.

Problem: Find all local extrema of $\phi$ on $\mathbb{R}^n$, if there are any.


My problem is that $\phi$ doesn't seem to be differentiable (at least I haven't been able to find a derivative), so we would have to find the local extrema by hand or show that there aren't any. I've been trying different approaches to show the latter for a while now, but the difficulty seems to be that if we move $x$ "further away" from one of the $x_j$'s, we can't really take into account the effect this has on the "distance" from $x$ to the others.

I would very much appreciate help with this.

  • If by "standard norm", you mean the euclidean norm, i.e. $g(u)= (\sum_{i=1}^n u_i^2)^{1/2}$ for $u=(u_1,\ldots,u_n)\in\Bbb R^n$, then $u\mapsto g(u)^2$ is differentiable. See https://math.stackexchange.com/questions/883016/gradient-of-l2-norm-squared/883024#883024 – Surb May 20 '20 at 19:38

1 Answer


$\phi$ is differentiable. In fact it's $\mathcal C^\infty$. The reason: $\phi$ is a multivariate polynomial of degree 2. Indeed, each term in the sum is of the form $$g(x-x_j)^2=\|x-x_j\|^2=\sum_{i=1}^n (x^{(i)}-x_j^{(i)})^2$$ where the upper index $(i)$ denotes the $i$-th component of a vector.
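To make the polynomial form fully explicit (a routine expansion, collecting the sum over $j$): $$\phi(x)=\sum_{j=1}^\mu\|x-x_j\|^2=\mu\|x\|^2-2\left\langle x,\sum_{j=1}^\mu x_j\right\rangle+\sum_{j=1}^\mu\|x_j\|^2,$$ which is a quadratic polynomial in the components $x^{(1)},\ldots,x^{(n)}$, hence $\mathcal C^\infty$.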

With that, we can write the second-order Taylor expansion of $\phi$ at an arbitrary point $x$ as $$\phi(x+h)=\phi(x)+\langle \nabla_x\phi, h\rangle + \tfrac12\langle \mathcal H_x h, h\rangle + o(\|h\|^2)\tag{1}$$ where $\nabla_x\phi$ denotes the gradient (vector) of $\phi$ at $x$, and $\mathcal H_x$ denotes the Hessian (matrix) of $\phi$ at that point.

Now, recall that for $x$ to be a local extremum of $\phi$ it is necessary that $\nabla_x\phi$ be the zero vector (a critical point), and that $\mathcal H_x$ be positive semi-definite for a local minimum (or negative semi-definite for a local maximum); conversely, a critical point at which $\mathcal H_x$ is positive definite is a local minimum (negative definite, a local maximum).

So we need to compute the Taylor expansion $(1)$ to find the gradient and Hessian. With this function, it's fairly simple because: $$\begin{split} \phi(x+h)&=\sum_{j=1}^\mu \|x+h-x_j\|^2\\ &=\sum_{j=1}^\mu\langle x+h-x_j,x+h-x_j\rangle\\ &= \sum_{j=1}^\mu\left(\langle x-x_j,x-x_j\rangle + 2\langle x-x_j,h\rangle +\langle h,h\rangle\right)\\ &= \phi(x) + \left\langle 2\sum_{j=1}^\mu (x-x_j),h\right\rangle + \mu \|h\|^2 \end{split}$$ Thus, identifying with $(1)$, we see that the gradient is $$\nabla_x\phi = 2\sum_{j=1}^\mu (x-x_j)$$ and, since $\mu\|h\|^2=\tfrac12\langle 2\mu I\, h, h\rangle$, the Hessian is $\mathcal H_x=2\mu I$, i.e. $2\mu$ times the identity matrix (positive definite).
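If you want to sanity-check these formulas numerically, here is a minimal sketch (NumPy, with arbitrary randomly chosen points $x_j$ of my own; nothing here is specific to the problem) comparing the closed-form gradient $2\sum_j(x-x_j)$ with finite differences, and checking that the quadratic remainder is exactly $\mu\|h\|^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n = 5, 3                      # number of points and dimension (arbitrary)
pts = rng.normal(size=(mu, n))    # the points x_1, ..., x_mu
x = rng.normal(size=n)            # point at which we expand

phi = lambda y: np.sum(np.linalg.norm(y - pts, axis=1) ** 2)

# Closed-form gradient from the expansion above: 2 * sum_j (x - x_j)
grad = 2 * np.sum(x - pts, axis=0)

# Independent check via central finite differences
eps = 1e-6
grad_fd = np.array([(phi(x + eps * e) - phi(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)])
print(np.allclose(grad, grad_fd, atol=1e-4))                    # True

# Second-order check: the remainder phi(x+h) - phi(x) - <grad, h>
# equals mu * ||h||^2, i.e. the Hessian is 2*mu times the identity.
h = rng.normal(size=n)
print(np.isclose(phi(x + h) - phi(x) - grad @ h, mu * (h @ h))) # True
```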

Equating the gradient to the zero vector leads to $x$ being the centroid (that is, the average) of the points $x_1, ..., x_\mu$: $$x=\frac 1 \mu \sum_{j=1}^\mu x_j$$ So $\phi$ has a unique extremum: it is a minimum, attained at the centroid of your points.
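As a quick numerical illustration of this conclusion (again NumPy, with points I made up), the value of $\phi$ at the centroid is not beaten by any trial point, and in fact $\phi(y)-\phi(\bar x)=\mu\|y-\bar x\|^2$ for every $y$, where $\bar x$ is the centroid:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, n = 7, 4
pts = rng.normal(size=(mu, n))

phi = lambda y: np.sum(np.linalg.norm(y - pts, axis=1) ** 2)

centroid = pts.mean(axis=0)       # the claimed minimizer (1/mu) * sum_j x_j

# The centroid is never beaten by random trial points around it
trials = centroid + rng.normal(size=(1000, n))
print(all(phi(centroid) <= phi(t) for t in trials))             # True

# Stronger: since phi is quadratic and its gradient vanishes at the
# centroid, phi(y) - phi(centroid) == mu * ||y - centroid||^2 exactly.
y = rng.normal(size=n)
print(np.isclose(phi(y) - phi(centroid),
                 mu * np.sum((y - centroid) ** 2)))             # True
```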

And there's nothing special about being in finite dimensions: the computation above uses only inner-product identities and carries over to any inner-product space.

With this, you have recovered a well-known result. In statistics / machine learning / functional approximation, $\phi$ often represents the estimation / training / approximation error. And the average is the estimate / model / approximation that minimizes that least-square error. In physics, this also has an interpretation as the center of mass of a cloud of points $x_1, ..., x_\mu$ of equal mass. Note that you can extend the result to points with different masses/weights, to obtain that the minimizer is the weighted average. But I digress.
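For the weighted variant mentioned at the end, here is the same kind of sketch (the positive weights $w_j$ are hypothetical, chosen at random): the gradient of $\phi_w(x)=\sum_j w_j\|x-x_j\|^2$ is $2\sum_j w_j(x-x_j)$, and it vanishes at the weighted average.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, n = 6, 2
pts = rng.normal(size=(mu, n))
w = rng.uniform(0.5, 2.0, size=mu)    # positive weights ("masses")

phi_w = lambda y: np.sum(w * np.linalg.norm(y - pts, axis=1) ** 2)

# Claimed minimizer: the weighted average of the points
x_star = (w[:, None] * pts).sum(axis=0) / w.sum()

# Gradient of phi_w is 2 * sum_j w_j * (x - x_j); it vanishes at x_star
grad = 2 * np.sum(w[:, None] * (x_star - pts), axis=0)
print(np.allclose(grad, 0))                                     # True

# And x_star does at least as well as the unweighted centroid
print(phi_w(x_star) <= phi_w(pts.mean(axis=0)))                 # True
```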

Stefan Lafon