
Define the norm-constrained least squares estimator $$\hat{x}_{(c)} = \arg\min_{\|{x}\| \leq c} \|y - A {x}\|_2^2,$$ where $A \in \mathbb{R}^{n \times p}$, the vectors $y$ and ${x}$ are conformable, $\|\cdot\|_2$ is the Euclidean norm, and $\|\cdot\|$ is an arbitrary norm. For convenience, we will occasionally suppress the subscript $(c)$. By Lagrangian duality, we know that there exists some $\lambda \geq 0$ so that we may equivalently write $$\hat{x} = \arg\min_{{x} \in \mathbb{R}^p} \frac{1}{2} \|y - A {x}\|_2^2 + \lambda \|{x}\|. \tag{2}$$

When $\lambda > 0$, we know by complementary slackness that the constraint is binding, so we can rewrite $$\hat{x}_{(c)} = \arg\min_{\|{x}\| = c} \|y - A {x}\|_2^2.$$ Henceforth consider only $c$ for which $\lambda > 0$ and the above expression holds: let $C$ be such that this is the case for all $c \in (0, C)$.
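To make the correspondence between $c$ and $\lambda$ concrete, here is a minimal numerical sketch for the special case where $\|\cdot\|$ is the Euclidean norm. That choice is an assumption made purely for illustration, since the question concerns arbitrary norms. The helper name `constrained_ls_l2` and the random test data are hypothetical. The sketch uses the fact that, in this case, the constrained solution has the form $(A^TA + \mu I)^{-1}A^Ty$ for some $\mu \ge 0$, finds $\mu$ by bisection, and then checks that the constraint binds and that the stationarity condition of $(2)$ holds with $\lambda = \mu \|\hat{x}\|_2$.

```python
import numpy as np

def constrained_ls_l2(A, y, c, iters=200):
    """Solve min ||y - A x||_2^2 subject to ||x||_2 <= c, for c > 0.

    Euclidean-norm case only. Returns (x_hat, lam), where lam is the
    multiplier of the penalized form 0.5 * ||y - A x||_2^2 + lam * ||x||_2.
    """
    p = A.shape[1]
    G, b = A.T @ A, A.T @ y
    x_ols = np.linalg.lstsq(A, y, rcond=None)[0]
    if np.linalg.norm(x_ols) <= c:
        # Constraint is slack: the unconstrained solution is feasible, lam = 0.
        return x_ols, 0.0

    def x_of(mu):
        return np.linalg.solve(G + mu * np.eye(p), b)

    # ||x_of(mu)||_2 decreases in mu; bracket the root, then bisect.
    lo, hi = 0.0, 1.0
    while np.linalg.norm(x_of(hi)) > c:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(x_of(mid)) > c:
            lo = mid
        else:
            hi = mid
    x_hat = x_of(hi)
    # A^T (y - A x_hat) = mu * x_hat = lam * x_hat / ||x_hat||_2.
    return x_hat, hi * np.linalg.norm(x_hat)

# Hypothetical test problem.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
y = rng.standard_normal(50)
c = 0.5 * np.linalg.norm(np.linalg.lstsq(A, y, rcond=None)[0])

x_hat, lam = constrained_ls_l2(A, y, c)
z_hat = x_hat / np.linalg.norm(x_hat)          # gradient of ||.||_2 at x_hat
stationarity = A.T @ (y - A @ x_hat) - lam * z_hat

print("||x_hat|| - c      :", np.linalg.norm(x_hat) - c)
print("lambda             :", lam)
print("stationarity error :", np.linalg.norm(stationarity))
```

Choosing $c$ as half the norm of the unconstrained least squares solution keeps the example inside the regime considered here, where the constraint is active and $\lambda > 0$.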

I'm interested in exploring possible representations of $\lambda$. I've derived some representations under conditions I'd like to relax. I'm particularly interested in relaxing differentiability to subdifferentiability.

From the first-order conditions applied to $(2)$, we know that $0 = -A^T \left(y - A \hat{x}\right) + \lambda \hat{z}$, where $\hat{z}$ is a subgradient of $\|\cdot\|$ evaluated at $\hat{x}$. Assuming that $\hat{z}_j \ne 0$ for all $j$, we can take the inner product of the first-order identity above with $\hat{z}^{-1}$ (the inverse applied componentwise) to find that $$\lambda = \frac{1}{p} \left[ A^T \left( y - A \hat{x} \right)\right]^T \hat{z}^{-1} = \frac{1}{p} \sum_{j=1}^p \frac{A_j^T (y - A \hat{x})}{\hat{z}_j},$$ where $A_j$ denotes the $j$th column of $A$. If we now assume further that $\hat{z}$ is the gradient (so that the norm is differentiable at $\hat{x}$), then, by the chain rule, and since $\frac{\partial}{\partial {x}_j} \frac{1}{2} \|y - A {x}\|_2^2 = -A_j^T (y - A {x})$, we can re-express $$\lambda = \frac{1}{p} \sum_{j=1}^p \frac{A_j^T (y - A \hat{x})}{\hat{z}_j} = \frac{-1}{p} \sum_{j=1}^p \frac{\partial \frac{1}{2} \|y - A {x}\|_2^2 }{\partial {x}_j} \frac{\partial {x}_j}{\partial \|{x}\|} \bigg|_{\hat{x}} = \frac{-1}{p} \frac{\partial \frac{1}{2} \|y - A {x}\|_2^2}{\partial \|{x}\|} \bigg|_{\hat{x}}.$$

Does the identity $\lambda = \frac{-1}{p} \frac{\partial \frac{1}{2} \|y - A {x}\|_2^2}{\partial \|{x}\|} \bigg|_{\hat{x}}$ hold when $\|\cdot\|$ is only subdifferentiable at $\hat{{x}}$ or $\hat{z}_j = 0$ for some $j$?

The previous derivative $\frac{\partial \frac{1}{2} \|y - A {x}\|_2^2}{\partial \|{x}\|}$ brings to mind a plane with $\|{x}\|$ on the horizontal axis and $\frac{1}{2} \|y - A {x}\|_2^2$ on the vertical axis. The function $f: (0, C) \to \mathbb{R}$ defined by $$f(c) = \min_{\|{x}\|=c} \|y - A {x}\|_2^2$$ traces out a curve on this plane. We know that \begin{align*} \frac{\partial}{\partial c} f(c) = \frac{\partial}{\partial c} \|y - A \hat{x}\|_2^2 = \left[ -2 A^T (y - A \hat{x}) \right] \cdot \frac{\partial \hat{x}}{\partial c} = (-2 \lambda) \left( \hat{z} \cdot \frac{\partial \hat{x}}{\partial c} \right), \end{align*} whenever these exist, where we've again substituted the first-order condition. By the chain rule, we know that (assuming $\hat{z}_j \ne 0$ for all $j$ and that $\hat{z}$ is the gradient) $\frac{\partial \hat{x}}{\partial c} = \frac{\partial \hat{x}}{\partial \|\hat{x}\|} \frac{\partial \|\hat{x}\|}{\partial c} = \hat{z}^{-1}$, so that $\frac{\partial}{\partial c} f(c) = -2p \, \lambda,$ and we can conclude that $$\frac{\partial \frac{1}{2} \|y - A {x}\|_2^2}{\partial \|{x}\|}\bigg|_{\hat{x}_{(c)}} = f'(c).$$
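One way to probe the relation between $f'(c)$ and $\lambda$ numerically is a central finite difference of $f$. The sketch below does this for the Euclidean norm only, which is an assumption made for illustration and says nothing directly about the non-differentiable case. The names `solve_l2_sphere` and `f_of_c`, the step size `h`, and the random data are all hypothetical. It prints the ratio $f'(c)/\lambda$ so that the constant of proportionality can be compared against the derivation above.

```python
import numpy as np

def solve_l2_sphere(A, y, c, iters=200):
    """Return (x_hat, lam) for min ||y - A x||_2^2 subject to ||x||_2 = c.

    Assumes 0 < c < ||x_OLS||_2, so the solution is
    (A^T A + mu I)^{-1} A^T y for the mu > 0 that gives norm c,
    and the multiplier in the penalized form (2) is lam = mu * c.
    """
    G, b, I = A.T @ A, A.T @ y, np.eye(A.shape[1])

    def x_of(mu):
        return np.linalg.solve(G + mu * I, b)

    # Bracket the root of ||x_of(mu)||_2 = c, then bisect.
    lo, hi = 0.0, 1.0
    while np.linalg.norm(x_of(hi)) > c:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(x_of(mid)) > c:
            lo = mid
        else:
            hi = mid
    return x_of(hi), hi * c

def f_of_c(A, y, c):
    """f(c) = min over ||x||_2 = c of ||y - A x||_2^2 (no factor 1/2)."""
    x_hat, _ = solve_l2_sphere(A, y, c)
    return float(np.sum((y - A @ x_hat) ** 2))

# Hypothetical test problem.
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 4))
y = rng.standard_normal(40)
c = 0.5 * np.linalg.norm(np.linalg.lstsq(A, y, rcond=None)[0])

h = 1e-5                                        # finite-difference step
f_prime = (f_of_c(A, y, c + h) - f_of_c(A, y, c - h)) / (2.0 * h)
_, lam = solve_l2_sphere(A, y, c)
ratio = f_prime / lam

print("f'(c) (finite difference):", f_prime)
print("lambda                   :", lam)
print("f'(c) / lambda           :", ratio)
```

Repeating the experiment with different values of `p` (the number of columns of `A`) shows whether the ratio depends on the dimension.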

Does the identity $\frac{\partial \frac{1}{2} \|y - A {x}\|_2^2}{\partial \|{x}\|}\bigg|_{\hat{x}_{(c)}} = f'(c)$ hold when ${x} \mapsto \|{x}\|$ is only subdifferentiable at $\hat{x}$ or $\hat{z}_j = 0$ for some $j$?
