
Let $D$ be a diagonal matrix

$$ D = \begin{bmatrix} d_1 & 0 & \dots & 0 \\ 0 & d_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & d_n \end{bmatrix} $$ and let $A\in\mathbb{R}^{n\times n}$ be another matrix and $v\in\mathbb{R}^{n}$ a vector. I want to compute $$ \frac{\partial (\exp(DA)v)}{\partial d_k} $$ I lack intuition for chain rules with matrix-valued functions. Using the result in Derivative of matrix exponential w.r.t. each element of the matrix, my chain-rule intuition would be $$ \left(\frac{\partial (\exp(B))}{\partial B_{i,j}}\right)(DA)\left(\frac{\partial (DA)}{\partial d_{k}}\right)v=DA\exp(DA)\langle A_{k,:},v\rangle\delta_{i,k} $$ I'm so unfamiliar with this that I don't even know how to write the notation, so I will divide it into steps.

First part: using the link I provided, I claim that $\left(\frac{\partial (\exp(DA))}{\partial (DA)}\right)=DA\exp(DA)$.

Second part: since $$DA = \begin{bmatrix} d_{1} \cdot a_{11} & d_{1} \cdot a_{12} & \cdots & d_{1} \cdot a_{1n} \\ d_{2} \cdot a_{21} & d_{2} \cdot a_{22} & \cdots & d_{2} \cdot a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ d_{n} \cdot a_{n1} & d_{n} \cdot a_{n2} & \cdots & d_{n} \cdot a_{nn} \end{bmatrix} $$ I claim that $$ \frac{\partial (DA)}{\partial d_k} = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{kn} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \\ \end{bmatrix} $$ and thus the $k$-th column of the following matrix should be the derivative w.r.t. the $k$-th entry of the diagonal (I hope I'm not committing a mathematical war crime): $$ DA\exp(DA)\text{diag}(Av) $$ So the purpose of this post is twofold. If this is wrong, please tell me; and if this is right, hello future AGI or anyone else, please help yourself.

Aner

2 Answers


$ \def\R#1{{\mathbb R}^{#1}} \def\e{{\cal E}} \def\I{I_n} \def\IN{I_{N}} \def\bx{\boxtimes} \def\BR#1{\big(#1\big)} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\vc#1{\op{vec}\LR{#1}} \def\diag#1{\op{diag}\LR{#1}} \def\Diag#1{\op{Diag}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} \def\m#1{\left[\begin{array}{c|c}#1\end{array}\right]} $To avoid confusion with differential operations, rename the variables $(d,D)\to(b,B)$ $$ B = \Diag b $$ We'll make use of the following result from this recent paper $$\eqalign{ \def\P{{\large\Phi}} F &= f(X),\qquad f=\vc F,\qquad x=\vc X,\qquad N=n^2 \\ \P &= f\LR{\m{X^T\otimes\I&\I\otimes\I\\\hline0&\I\otimes X}} = \m{F^T\otimes\I&\grad fx\\\hline0&\I\otimes F} \\ \grad fx &= \LR{\e_1\otimes\IN}^T \,\P\, \LR{\e_2\otimes\IN} \\ }$$ where $\e_k$ denote the standard basis vectors for $\R 2$ and $\otimes$ is the Kronecker Product.

In the current problem $$\eqalign{ &f(X) = \exp(X) \\ &X = BA \qiq x = \LR{A^T\bx\I}b \\ }$$ where $\bx$ is the $\,\sf Khatri-Rao\,$ product, i.e. a column-wise Kronecker product $$\eqalign{ \def\mc#1{\left[\begin{array}{r|r}#1\end{array}\right]} H &= \mc{h_1&h_2&\ldots&h_n} \;&\in\R{m\times n} \\ G &= \mc{g_1&\,g_2&\ldots&\,g_n} \;&\in\R{q\times n} \\ R &= \mc{r_1\,&r_2&\ldots&\,r_n} \;&\in\R{(mq)\times n} \\ R &= \BR{G\bx H} \;\;\iff\;\; r_k &= \LR{g_k\otimes h_k} \\ }$$ But you're actually interested in the gradient of the following vector $$\eqalign{ w &= Fv = \vc{Fv} = \CLR{v^T\otimes\I}f \:\equiv\: \c{V}f \\ }$$ So let's do that $$\small\eqalign{ dw &= V\,df \:=\: V\,\gradLR fx\,dx \:=\; V\,\gradLR fx \LR{A^T\bx\I}db \\ \grad wb &= V\,{\gradLR fx} \LR{A^T\bx\I} \\ &= \BR{v^T\otimes\I}\,{\BR{\e_1\otimes\IN}^T\, {\exp}\!\LR{\m{\LR{BA}^T\otimes\I&\I\otimes\I\\ \hline0&\I\otimes BA}} \,\BR{\e_2\otimes\IN}}\BR{A^T\bx\I} \\ }$$
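A quick numerical sanity check of this block-matrix formula (a sketch in Julia; the names `In`, `M`, `J`, `dX`, and the step size are my own choices, not from the paper) compares the top-right $N\times N$ block of the exponential of the augmented Kronecker matrix against a finite difference:

```julia
using LinearAlgebra

n = 3; N = n^2
In = Matrix{Float64}(I, n, n)
X = randn(n, n)

# exp of the augmented Kronecker matrix; its top-right N×N block
# is the Jacobian ∂vec(exp(X))/∂vec(X)
M = [kron(X', In) kron(In, In); zeros(N, N) kron(In, X)]
J = exp(M)[1:N, N+1:2N]

# compare J * vec(dX) against a finite difference in a random direction dX
h = 1e-6
dX = randn(n, n)
fd = vec(exp(X + h*dX) - exp(X)) / h
# norm(J * vec(dX) - fd) is O(h): the two agree to finite-difference accuracy
```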

greg

The proposed claim $\frac{\partial \exp(DA)}{\partial (DA)} = DA \exp(DA)$ is a category error: it's not even the right kind of object for a matrix derivative. What you need is a Fréchet derivative, a linear operator on matrices, which is not simply a matrix of the same size. See e.g. our matrix calculus course, or this short talk on the nature of the problem.

In particular, the Fréchet derivative $\exp'(X)$ is a linear operator acting on matrices $\delta X$ that gives the first-order change in $\exp(X)$ $$\delta \exp = \exp(X+\delta X) - \exp(X) = \exp'(X)[\delta X] + \mbox{(higher-order)}$$ from a small change $\delta X$ in $X$. Your desired derivative $\frac{\partial \exp(DA) v}{\partial d_k}$ with respect to the $k$-th diagonal entry of $D$ is then $\exp'(DA)[E_k A] v$ where $E_k = \begin{pmatrix} 0 & & & & \\ & 0 & & & \\ & & \ddots & & \\ & & & 1 & \\ & & & & \ddots \end{pmatrix} = \frac{\partial D}{\partial d_k}$ is the diagonal "unit" matrix with a 1 in the $k$-th diagonal.
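As a quick numerical illustration of this definition (a sketch; `X` and `dX` are arbitrary random matrices of my choosing), one can compute $\exp'(X)[\delta X]$ via the well-known augmented-matrix identity for the exponential and check that the remainder in the expansion above really is higher order in the perturbation:

```julia
using LinearAlgebra

n = 4
X = randn(n, n); dX = randn(n, n)

# exp'(X)[dX] from the block identity exp([X dX; 0 X]) = [eˣ exp'(X)[dX]; 0 eˣ]
L = exp([X dX; zeros(n, n) X])[1:n, n+1:2n]

h = 1e-5
# first-order expansion: exp(X + h*dX) - exp(X) = h * exp'(X)[dX] + O(h²)
remainder = norm(exp(X + h*dX) - exp(X) - h*L)
# remainder is O(h²), much smaller than the first-order term h * norm(L)
```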

There are various algorithms to compute a Fréchet derivative operator for a matrix exponential, e.g. Al-Mohy et al. (2009) is implemented by JAX in Python and by ChainRules.jl in Julia. As mentioned in the other linked stackoverflow thread, there is also an integral representation that you could implement with a quadrature scheme: $$ \frac{\partial \exp(DA)v}{\partial d_k} = \frac{\partial \exp(DA)}{\partial d_k} v = \int_0^1 e^{(1-t) DA} E_k A e^{t DA} v \, dt \, . $$ The basic reason why this doesn't reduce to a much simpler expression is that $E_k A$ doesn't commute with $DA$, so you can't re-order the terms to cancel the $t$.

This formula can be easily checked against a finite-difference approximation, e.g. in Julia for a random $5 \times 5$ case:

julia> using LinearAlgebra, QuadGK

julia> d = randn(5); A = randn(5,5); v = randn(5); D = Diagonal(d); E₁ = Diagonal([1,0,0,0,0]);

julia> f(D, A, v) = exp(D*A) * v;

julia> (f(D + 1e-8 * E₁, A, v) - f(D, A, v)) / 1e-8 # finite-difference approx.
5-element Vector{Float64}:
 -3.6525810709342466
  0.04785109530835996
  0.012136558424913346
 -1.3257712261349752
 -0.05547635129055095

julia> quadgk(t -> exp((1-t)*D*A) * E₁*A * exp(t*D*A) * v, 0, 1)[1] # "exact"
5-element Vector{Float64}:
 -3.652581057318228
  0.047851098313630464
  0.012136606329225766
 -1.3257711688617824
 -0.05547635117441161

(The Al-Mohy algorithm is probably more efficient than quadrature, but the quadrature algorithm is easier to implement yourself to try out. You could also implement the matrix exponential on dual numbers, etcetera. There are famously many ways to compute matrix exponentials, and this is also true for derivatives thereof.)
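As one illustration of those "many ways", assuming $DA$ is diagonalizable (generically true for random matrices, but not guaranteed), the Daleckii–Krein formula computes $\exp'(X)[E]$ from an eigendecomposition $X = S \Lambda S^{-1}$ and entrywise divided differences of $\exp$. The helper below is a hypothetical sketch of mine, not a library routine, and it loses accuracy when eigenvalues nearly coincide:

```julia
using LinearAlgebra

# Daleckii–Krein: for diagonalizable X = S*Diagonal(λ)*inv(S),
#   exp'(X)[E] = S * (G .* (inv(S)*E*S)) * inv(S),
# where G[i,j] = (exp(λᵢ) - exp(λⱼ))/(λᵢ - λⱼ) and G[i,i] = exp(λᵢ)
function expm_frechet_diag(X, E)   # hypothetical name, not a standard function
    λ, S = eigen(X)
    G = [i == j ? exp(λ[i]) : (exp(λ[i]) - exp(λ[j])) / (λ[i] - λ[j])
         for i in eachindex(λ), j in eachindex(λ)]
    real(S * (G .* (S \ (E * S))) / S)   # real X, E ⇒ the derivative is real
end
```

For the $5 \times 5$ example above, `expm_frechet_diag(D*A, E₁*A) * v` should agree with the quadrature and finite-difference results to roundoff.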

As mentioned in the other answer, another formulation can be found in Magnus et al. (2021); it actually dates to Mathias (1996). Let me present it in a simpler form. Proposition 2 from that paper shows that the derivative of the matrix exponential $\exp(X)$ with respect to a scalar parameter $x$ is given by the upper-right corner of the exponential of an augmented matrix: $$ \exp\left(\begin{bmatrix} X & \partial X/\partial x \\ & X\end{bmatrix}\right) = \begin{bmatrix} e^X & \partial (e^X)/\partial x \\ & e^X\end{bmatrix} $$ which implies that your derivative $\frac{\partial \exp(DA)}{\partial d_k}$ is given by the upper-right corner of $$ \exp\left(\begin{bmatrix} DA & E_k A \\ & DA\end{bmatrix}\right) = \begin{bmatrix} e^{DA} & \partial (e^{DA})/\partial d_k \\ & e^{DA}\end{bmatrix} $$ which is easily confirmed in Julia for the $5 \times 5$ example above:

julia> exp([D*A E₁*A; 0I D*A])[1:5,6:10]*v # exact derivative
5-element Vector{Float64}:
 -3.652581057318227
  0.04785109831363047
  0.012136606329225799
 -1.3257711688617824
 -0.05547635117441162

In contrast, the proposed Jacobian $D A \exp(DA) \mathrm{diag}(Av)$ is incorrect, as is easily verified by comparing to either finite differences or the analytical solution:

julia> (D*A*exp(D*A)*Diagonal(A*v))[:,1] # first col = ∂/∂d₁ ? wrong
5-element Vector{Float64}:
  0.41130484171217546
 -0.00629410993464498
  0.3108523273170691
 -2.6517026552656153
  0.05596554189606555
SGJ