
I'm trying to find the derivative of the following function: $$ f(\beta) = \textbf{1}^Th(X\beta) = \sum_{i=1}^n \ln(1+e^{\beta^T \textbf{x}_i}), \qquad h(t) = \ln(1+e^t) $$ where $\beta$ is a $(p,1)$ vector, $X$ is an $(n,p)$ matrix, and $h(X\beta)$ is the element-wise application of the function $h(t)$, i.e. it is an $(n,1)$ vector.

I want to find $\nabla_{\beta}f$, and $\nabla^2_{\beta}f$.

This is easy if I ignore the summation, differentiate element-wise, and then sum at the end. But I was wondering: is there a way to do this strictly in matrix notation?

The results should come out to $X^T \sigma(X\beta)$ (where $\sigma$ is the element-wise sigmoid function) and $X^T \Sigma X$, where $\Sigma$ is a diagonal matrix with $\sigma(X\beta)(1-\sigma(X\beta))$ on its diagonal.
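Here is a quick numerical sanity check of the claimed gradient formula (a minimal NumPy sketch; the dimensions, the random data, and the `sigmoid`/`f` helper names are arbitrary illustrative choices, not part of the question):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def f(beta, X):
    # f(beta) = sum_i ln(1 + exp(x_i^T beta))
    return np.sum(np.log1p(np.exp(X @ beta)))

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)

# Claimed gradient: X^T sigma(X beta)
grad = X.T @ sigmoid(X @ beta)

# Central finite differences of f, one coordinate of beta at a time
eps = 1e-6
fd = np.array([(f(beta + eps * e, X) - f(beta - eps * e, X)) / (2 * eps)
               for e in np.eye(p)])
print(np.allclose(grad, fd, atol=1e-5))  # expected: True
```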

I think this is related to this question, only instead of differentiating w.r.t. $A$, it's w.r.t. $x$.

user550103
  • Related https://math.stackexchange.com/questions/3098910/gradient-and-hessian-of-sum-i-log-left1-exp-left-t-i-leftwt-x-i-ri – user550103 Sep 04 '20 at 15:01

1 Answer


Given a scalar variable $\chi$, a function $\phi=\phi(\chi)$, and its derivative $\phi' = \frac{d\phi}{d\chi}$, the differential is easy to calculate: $$d\phi=\phi'\,d\chi$$ But when these functions are applied element-wise to a vector $x$ (or matrix $X$) argument, the differential expression requires a Hadamard product: $$\eqalign{ f &= \phi(x) \qquad &f' = \phi'(x) \qquad &df = f'\odot dx \\ F &= \phi(X) \qquad &F' = \phi'(X) \qquad &dF = F'\odot dX \\ }$$

In this problem, the derivatives of the function involve the logistic function $\sigma(\chi)$: $$\phi = \log(1+e^\chi), \qquad \phi' = \sigma, \qquad \phi'' = \sigma' = (1-\sigma)\,\sigma$$

Define the vectors $$\eqalign{ w &= X\beta &\quad\implies dw = X\,d\beta \\ h &= \phi(w) \\ h' &= \sigma(w) &\quad\implies dh = h'\odot dw \\ }$$

Write the objective function, then calculate its differential and gradient (using the identity ${\tt1}^T(u\odot v) = u^Tv$ to eliminate the Hadamard product): $$\eqalign{ \psi &= {\tt1}^Th \\ d\psi &= {\tt1}^Tdh \\ &= {\tt1}^T(h'\odot dw) \\ &= dw^Th' \\ &= (d\beta^TX^T)\,h' \\ &= (X^Th')^T d\beta \\ \frac{\partial\psi}{\partial\beta} &= X^Th' \;\doteq\; X^T\sigma(X\beta) \\ }$$

For the next part, it will be convenient to rename $h'=s$, i.e. $$\eqalign{ s &= \sigma(w) \qquad &s' = ({\tt1}-s)\odot s \qquad &ds = s'\odot dw \\ S &= {\rm Diag}(s) &s' = (I-S)\,s &ds = (S-S^2)\,dw \\ }$$ NB: Replacing Hadamard products with diagonal matrices is a useful trick.
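To see that trick numerically, here is a small sketch (arbitrary random data; `s` stands in for $\sigma(w)$ and `dw` for an arbitrary perturbation): multiplying element-wise by $s'$ is the same as multiplying by the matrix $S-S^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.random(5)             # stands in for sigma(w); entries in (0, 1)
dw = rng.standard_normal(5)   # an arbitrary perturbation vector

S = np.diag(s)
# s' o dw with s' = (1 - s) o s  ...  versus  (S - S^2) dw
lhs = ((1.0 - s) * s) * dw
rhs = (S - S @ S) @ dw
print(np.allclose(lhs, rhs))  # expected: True
```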

Now calculate the differential and gradient of the gradient: $$\eqalign{ g &= X^Ts \\ dg &= X^T\,ds \\ &= X^T(S-S^2)\,dw \\ &= X^T(S-S^2)\,X\,d\beta \\ \frac{\partial g}{\partial \beta} &= X^T(S-S^2)\,X \\ }$$ The gradient of the gradient is the Hessian. Note that it is symmetric, as one would expect.
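As a sanity check, one can compare $X^T(S-S^2)X$ against central finite differences of the analytic gradient $X^T\sigma(X\beta)$ and confirm the symmetry. This is a self-contained sketch with arbitrary dimensions and random data:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)

# Analytic Hessian: X^T (S - S^2) X, with S = Diag(sigma(X beta)),
# so the diagonal of S - S^2 is sigma * (1 - sigma)
s = sigmoid(X @ beta)
H = X.T @ np.diag(s * (1.0 - s)) @ X

# Central finite differences of the analytic gradient g(beta) = X^T sigma(X beta);
# column j approximates dg/d(beta_j)
eps = 1e-6
fd_H = np.column_stack([
    (X.T @ sigmoid(X @ (beta + eps * e)) -
     X.T @ sigmoid(X @ (beta - eps * e))) / (2 * eps)
    for e in np.eye(p)
])
print(np.allclose(H, fd_H, atol=1e-4))  # expected: True
print(np.allclose(H, H.T))              # expected: True (Hessian is symmetric)
```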

greg