How to calculate set of subgradients

Question

I am trying to understand question 1(a) here. They calculate a subgradient at a given $x$ for the following convex function:

$$f(x) =\max_{i=1, ..., m} (a_i^Tx + b_i)$$

The solution is to find $k \in \{1, \dots, m\}$ for which $f(x) = a_k^T x + b_k$; then $a_k$ is a subgradient of $f$ at $x$.

I understand the solution. However, it is only one solution. The size of the set of subgradients is infinite. Therefore, I wonder, what are all the other subgradients? How can I calculate other subgradients from the set?

Thanks.

The set of subgradients at a point is not always infinite. If you understand the subgradient of the absolute value function, you should understand the subgradient of this one too. — LinAlg, Jan 08 '17 at 19:15
Just as @LinAlg points out, the set of subgradients, though always convex, is not always infinite. In fact, it's the convex hull of all such $a_k$. See my answer below. — dohmatob, Jan 09 '17 at 08:13
@LinAlg as far as I know, the absolute value function is not differentiable at zero. Thus, for x > 0 and x < 0, the subdifferential is simply the derivative (1 and -1), but at x = 0, the subdifferential is the set of all numbers between -1 and 1 (i.e., the slopes of all lines that go through the origin and lie beneath the function). I can understand that because I can visualize it easily. In my question, there is a function that I can no longer visualize, which is why I am slightly confused. — Cheshie, Jan 09 '17 at 10:39
You can still visualize it if $x \in \mathbb{R}$. If the maximum is attained by only one function, the subgradient is $a_i$. Otherwise, it is the convex hull of $a_i$ of all functions that attain the maximum, as dohmatob notes. For the absolute value function, the subgradient at $0$ is the convex hull of $-1$ and $1$, or $[-1,1]$. — LinAlg, Jan 09 '17 at 14:34

score 5 · Answer 1 · edited Apr 13 '17 at 12:21

By a theorem of Danskin and Bertsekas (I call it "the Bertsekas-Danskin Theorem for subgradients", see link below), the subdifferential of $f$ at $x$ is the convex hull of all such $a_k$ (those attaining the maximum at $x$), and corresponds to a face of the polyhedron $\mathcal P_A := \text{conv}\{a_k | 1 \le k \le m\}$. I've proven a more general result here.

Precisely, writing $A := [a_1, \dots, a_m]$ for the matrix with columns $a_k$ and $\Delta_m$ for the probability simplex in $\mathbb{R}^m$, you deduce that $$ \begin{split} \partial f(x) &= \partial \max_{1 \le i \le m}(a_i^Tx + b_i) = \partial \max_{y \in \Delta_m}y^T(A^Tx + b)\\ &= \text{conv}\{\nabla_x (\hat{y}^T(A^Tx + b))| \hat{y} \in \Delta_m, \hat{y}^T(A^Tx + b) =f(x)\}\\ &= \text{conv}\{A\hat{y}| \hat{y} \in \Delta_m, \hat{y}^T(A^Tx + b) = f(x)\} \\ &= \text{conv}\{a_k | 1 \le k \le m, a_k^Tx + b_k = f(x) \}, \end{split} $$ as claimed.
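To make the formula concrete, here is a minimal Python sketch (not part of the original derivation) that finds the active indices at a point and forms one subgradient as a convex combination of the active $a_k$. The function names `active_set` and `subgradient`, the tolerance `tol`, and the convention of storing the $a_k$ as rows of a list `A` are all assumptions made for illustration.

```python
def active_set(A, b, x, tol=1e-9):
    """Return f(x) and the indices k whose affine piece attains the max.

    A is a list of vectors a_k (one per row, an assumption of this sketch),
    b is the list of offsets b_k, and x is the evaluation point.
    """
    values = [sum(ai * xi for ai, xi in zip(a, x)) + bk for a, bk in zip(A, b)]
    fx = max(values)
    # Pieces within tol of the maximum are treated as active.
    return fx, [k for k, v in enumerate(values) if fx - v <= tol]


def subgradient(A, b, x, weights=None, tol=1e-9):
    """Return one element of conv{a_k : a_k^T x + b_k = f(x)}.

    weights are convex-combination coefficients over the active indices;
    by default the active a_k are averaged uniformly.
    """
    _, active = active_set(A, b, x, tol)
    if weights is None:
        weights = [1.0 / len(active)] * len(active)
    if (len(weights) != len(active)
            or any(w < 0 for w in weights)
            or abs(sum(weights) - 1.0) > 1e-9):
        raise ValueError("weights must be nonnegative, sum to 1, "
                         "and match the number of active indices")
    return [sum(w * A[k][j] for w, k in zip(weights, active))
            for j in range(len(x))]


# Absolute value as a max of two affine pieces: |x| = max(x, -x).
A = [[1.0], [-1.0]]
b = [0.0, 0.0]
print(active_set(A, b, [0.0]))                        # both pieces active at 0
print(subgradient(A, b, [0.0], weights=[0.25, 0.75])) # a point of [-1, 1]
print(subgradient(A, b, [2.0]))                       # single active piece
```

Varying `weights` over the simplex sweeps out the whole subdifferential; when only one index is active, the set collapses to the single vector $a_k$.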
