6

Referring to the papers "From Softmax to Sparsemax" and "Efficient Projections onto the L1-Ball", what is the relationship between the Euclidean projection onto the probability simplex and applying the Softmax function? Both resulting vectors $\boldsymbol{w}$ satisfy the constraint $\sum_{i}w_{i}=1$, but Softmax is clearly not idempotent and is therefore not a projection.

I am also interested in projecting onto the L1-ball, where $||\boldsymbol{w}||_{1}\leq 1$. Is there an equivalent function (even if it is not a projection) that can be applied in the same sense as the Softmax in the first part of this question?

Also, in the context of constrained optimisation, is $||\boldsymbol{w}||_{1}\leq 1$ a more relaxed constraint than $\sum_{i}w_{i}=1$?

Royi
  • 10,050
rnoodle
  • 91
  • A full solution, with MATLAB code, for the projection onto the $ {L}_{1} $ unit ball is available here - https://math.stackexchange.com/questions/2327504. – Royi Aug 22 '17 at 12:43
  • Projection onto the Simplex - https://math.stackexchange.com/questions/2402504. – Royi Aug 22 '17 at 15:57

1 Answer

3

I can see one direct connection between the actual projection onto the unit simplex and the softmax function. The softmax function is the result of the first iteration of the mirror descent algorithm with the Bregman distance $D_\phi$ generated by $\phi(x) = \sum_{i=1}^n x_i \ln(x_i)$, applied to the problem $\min_x \{ ||y - x||_2^2 : \sum_{i=1}^n x_i = 1, x \geq 0 \}$, which defines the projection of $y$ onto the unit simplex. (This holds when the iteration starts from the uniform point $x_i = 1/n$ with step size $1/2$; other step sizes give a softmax of a rescaled $y$.)

So, basically, the softmax function is a very crude approximation of the actual projection, since it is only the first iteration of an algorithm that eventually converges to the actual projection.
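To make the claim above concrete, here is a minimal NumPy sketch. It is only an illustration under assumptions that are not part of the statement above: the iteration starts from the uniform point, the step size is fixed at $1/2$, and the function names `softmax`, `project_simplex` and `entropic_descent` are hypothetical helpers. The reference projection uses the usual sort-and-threshold method.

```python
import numpy as np


def softmax(y):
    """Numerically stable softmax."""
    z = np.exp(y - np.max(y))
    return z / z.sum()


def project_simplex(y):
    """Euclidean projection onto the unit simplex (sort-and-threshold)."""
    u = np.sort(y)[::-1]                     # entries in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, len(y) + 1)
    # Last index whose entry stays positive after shifting by the threshold.
    rho = np.nonzero(u + (1.0 - css) / k > 0)[0][-1]
    tau = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(y - tau, 0.0)


def entropic_descent(y, n_iter, step=0.5):
    """Mirror descent with the entropy Bregman distance on ||y - x||_2^2.

    Assumptions of this sketch: uniform starting point, constant step size.
    """
    x = np.full(len(y), 1.0 / len(y))
    for _ in range(n_iter):
        grad = 2.0 * (x - y)                 # gradient of ||y - x||_2^2
        logits = np.log(x) - step * grad     # multiplicative update, in log space
        logits -= logits.max()               # stabilise the exponential
        x = np.exp(logits)
        x /= x.sum()                         # renormalise onto the simplex
    return x


y = np.array([0.5, 1.2, -0.3])
print("softmax(y):      ", softmax(y))
print("1 iteration:     ", entropic_descent(y, 1))
print("200 iterations:  ", entropic_descent(y, 200))
print("exact projection:", project_simplex(y))
```

With the uniform start, the first update is proportional to $\exp(y_i)$ up to a constant factor, which is exactly softmax. Further iterations move the iterate toward the sparse Euclidean projection, although coordinates that are zero in the projection are only approached in the limit, never reached exactly.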

  • Is it meaningful to ask what the next iteration of such an algorithm is? – rnoodle Jul 27 '17 at 17:05
  • Yes. Look in the paper "Mirror descent and nonlinear projected subgradient methods for convex optimization" by Beck and Teboulle, available at https://web.iem.technion.ac.il/images/user-files/becka/papers/3.pdf. On page 3, they describe the 'Entropic Descent Algorithm', which is a special case of Mirror Descent. – Alex Shtoff Jul 28 '17 at 03:15