I am struggling to understand the dual averaging algorithm as presented in the paper "Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling" by Duchi, Agarwal, and Wainwright. More precisely, the update of the parameters, given as $$\Pi^\psi_{\mathcal{X}} (z,\alpha) := \operatorname*{argmin}_{x \in \mathcal{X}} \left\{ \langle z, x \rangle + \frac{1}{\alpha} \psi(x)\right\},$$ is beyond my mathematical abilities.
What I do understand is that $z$ is the sum of all previously observed gradients and that $\psi(x)$ is a regularization term, weighted by the inverse of the step size $\alpha$.
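To check my understanding on the simplest case I could think of: if I take $\psi(x) = \frac{1}{2}\lVert x \rVert^2$ and $\mathcal{X} = \mathbb{R}^n$ (so the constraint plays no role), then setting the gradient of the objective to zero gives $$\nabla_x \left( \langle z, x \rangle + \frac{1}{2\alpha}\lVert x \rVert^2 \right) = z + \frac{1}{\alpha} x = 0 \quad\Longrightarrow\quad x = -\alpha z,$$ which to me looks like an ordinary gradient step applied to the accumulated gradients. Is that the right way to think about it?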
The paper just states that this is "a type of projection", and the remarks about the algorithm in the appendix did not help me. Yurii Nesterov's paper "Primal-Dual Subgradient Methods for Convex Problems" did not clear it up for me either.
I consulted several other papers and resources on the web, but there seems to be one crucial point I am missing about how this $\operatorname{argmin}$ is actually calculated, since it is used as a proximal operator here. I also have not really grasped the concept of proximal operators.
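The closest thing to a definition I found is $$\operatorname{prox}_f(v) := \operatorname*{argmin}_{x} \left\{ f(x) + \frac{1}{2}\lVert x - v \rVert^2 \right\},$$ but I cannot see how to connect this to the update above.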
I also found the answers to the question Least squares Problem with Non Negativity Constraints, but did not understand them completely. In particular, the part "functions for which the argmin problem in the definition above are easy to solve, for any point x" bugs me, as I personally do not see that ease of computation.
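To make that statement concrete for myself, I tried the case that seemed closest to that question: $\psi(x) = \frac{1}{2}\lVert x \rVert^2$ with $\mathcal{X}$ the nonnegative orthant. If my per-coordinate minimization is right, the $\operatorname{argmin}$ is simply $\max(0, -\alpha z)$, and a quick numerical check (the code and names are my own sketch, not from the paper) seems to confirm it:

```python
import numpy as np
from scipy.optimize import minimize

def projection(z, alpha):
    # my guess at the closed form of
    #   argmin_{x >= 0} <z, x> + (1/alpha) * 0.5 * ||x||^2
    # (minimize each coordinate separately, then clip at 0)
    return np.maximum(0.0, -alpha * z)

rng = np.random.default_rng(0)
z = rng.normal(size=5)
alpha = 0.1

# brute-force the same argmin with a generic solver
objective = lambda x: z @ x + (0.5 / alpha) * (x @ x)
x_numeric = minimize(objective, np.zeros_like(z),
                     bounds=[(0.0, None)] * z.size).x

print(np.allclose(x_numeric, projection(z, alpha), atol=1e-5))  # True (hopefully)
```

Is this what is meant by the argmin being "easy to solve" for a suitable $\psi$?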
Obviously I am not a mathematician, so please forgive my perhaps informal description of the problem.
As I intend to implement the algorithm, a pointer to an existing implementation would help me a lot (I tend to understand code much better than formulas), but any help is highly appreciated!
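For completeness, here is my current (quite possibly wrong) attempt at the plain, centralized version on a toy objective, again assuming $\psi(x) = \frac{1}{2}\lVert x \rVert^2$, the nonnegative orthant, and a $1/\sqrt{t}$ step size:

```python
import numpy as np

def grad_f(x, b):
    # gradient of the toy objective f(x) = 0.5 * ||x - b||^2
    return x - b

def dual_averaging(b, steps=5000):
    # my reading of the update: accumulate gradients in z,
    # then "project" via Pi(z, alpha) = max(0, -alpha * z)
    # (psi(x) = 0.5 * ||x||^2, X = the nonnegative orthant)
    z = np.zeros_like(b)
    x = np.zeros_like(b)
    for t in range(1, steps + 1):
        z += grad_f(x, b)           # z(t+1) = z(t) + g(t)
        alpha = 1.0 / np.sqrt(t)    # decreasing step size
        x = np.maximum(0.0, -alpha * z)
    return x

b = np.array([1.0, -2.0, 0.5])
print(dual_averaging(b))  # slowly approaches the minimizer max(b, 0) = [1., 0., 0.5]
```

If this captures the idea, then I suppose the distributed version would only replace each node's $z$ by a weighted average of its neighbors' $z$'s before adding the local gradient, but I am not sure about that either.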