Let $f:\mathbb{R}^2\to\mathbb{R}$ be given by $f(w_1,w_2)=\frac{1}{2}(1-w_2\sigma(w_1))^2$, where $\sigma(x)=\max\{x,0\}$ is the ReLU function. I want to compute the Clarke subdifferential of $f$ at points where $w_1=0$. Is the chain rule valid here? I know there is a chain rule in Clarke's book (Optimization and Nonsmooth Analysis, Theorem 2.3.10) which says, in the theorem's own notation, that if the inner function $g$ is strictly differentiable and the outer function $f$ is locally Lipschitz, then the Clarke subdifferential of $F(x)=(f\circ g)(x)$ satisfies a chain rule, i.e., $\partial F(x)\subset\partial f(g(x))\,g'(x)$. In my case the roles are reversed: the outer function $u\mapsto\frac{1}{2}(1-u)^2$ is smooth, while the inner function $(w_1,w_2)\mapsto w_2\sigma(w_1)$ is nonsmooth. Since ReLU is piecewise linear and convex, is it possible that the chain rule still holds?
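As a sanity check on what any chain-rule answer should reproduce, here is a minimal Python sketch (not a proof). It relies on the gradient characterization of the Clarke subdifferential for locally Lipschitz functions on $\mathbb{R}^n$: $\partial f$ at a point is the convex hull of the limits of gradients taken at nearby points of differentiability. The helper names `grad_f` and `limiting_gradients` and the step `eps` are hypothetical choices made for illustration. The gradient formulas in the code come from differentiating $f$ by hand on each side of $w_1=0$.

```python
def relu(x):
    """ReLU: sigma(x) = max{x, 0}."""
    return max(x, 0.0)


def f(w1, w2):
    """f(w1, w2) = 0.5 * (1 - w2 * sigma(w1))^2."""
    return 0.5 * (1.0 - w2 * relu(w1)) ** 2


def grad_f(w1, w2):
    """Gradient of f at a point with w1 != 0, where f is differentiable.

    For w1 > 0: f = 0.5 * (1 - w1*w2)^2, so grad = -(1 - w1*w2) * (w2, w1).
    For w1 < 0: f is constant (= 0.5), so grad = (0, 0).
    """
    if w1 == 0.0:
        raise ValueError("f need not be differentiable on the line w1 = 0")
    if w1 > 0.0:
        r = 1.0 - w1 * w2
        return (-r * w2, -r * w1)
    return (0.0, 0.0)


def limiting_gradients(w2, eps=1e-8):
    """Approximate the one-sided limits of grad f at (0, w2).

    eps is an illustrative step size; the true limits are eps -> 0.
    """
    left = grad_f(-eps, w2)   # approach from w1 < 0
    right = grad_f(eps, w2)   # approach from w1 > 0
    return left, right


if __name__ == "__main__":
    for w2 in (-1.0, 0.5, 2.0):
        left, right = limiting_gradients(w2)
        print(f"w2 = {w2}: left limit ~ {left}, right limit ~ {right}")
```

Taking the limits exactly, the two one-sided gradients at $(0,w_2)$ are $(0,0)$ and $(-w_2,0)$, so the gradient characterization suggests $\partial f(0,w_2)=\{(-t\,w_2,\,0): t\in[0,1]\}$, the segment joining them. A valid chain rule for a smooth outer function composed with the nonsmooth inner function $w_2\sigma(w_1)$ should give this same set.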

William
