
I have some matrix $\mathbf{X}$ $=$ $\mathbf{A}$ $\odot$ $\mathbf{B}$, where $\mathbf{X}$, $\mathbf{A}$, and $\mathbf{B}$ $\in \mathbb{R}^{M \times N}$, and $\odot$ represents the elementwise multiplication operator -- the Hadamard product.

I know that $\mathbf{A}$ and $\mathbf{B}$ can individually be written in terms of functions of some other matrix $\mathbf{Y}$ $\in \mathbb{R}^{I \times J}$, and I am interested in obtaining the Jacobian of $\mathbf{X}$ w.r.t. $\mathbf{Y}$, i.e., I want to solve for $\frac{\partial \text{vec}(\mathbf{X})}{\partial \text{vec}(\mathbf{Y})}$ $\in \mathbb{R}^{MN \times IJ}$, using the Jacobians w.r.t. $\mathbf{A}$ and $\mathbf{B}$.

According to this stackexchange post discussing the Hadamard product, user91684's answer implies the following: the Hadamard product $\mathbf{A} \odot \mathbf{B}$ is bilinear, and its derivative satisfies

\begin{align*} \mathrm{d}(\mathbf{A} \odot \mathbf{B})= \mathrm{d}\mathbf{A} \odot \mathbf{B} + \mathbf{A} \odot \mathrm{d}\mathbf{B} \end{align*}

Using this, I assume that the Jacobian looks something like this:

\begin{align*} \frac{\partial \text{vec}(\mathbf{X})}{\partial \text{vec}(\mathbf{Y})} = \frac{\partial \text{vec}(\mathbf{X})}{\partial \text{vec}(\mathbf{A})} \frac{\partial \text{vec}(\mathbf{A})}{\partial \text{vec}(\mathbf{Y})} + \frac{\partial \text{vec}(\mathbf{X})}{\partial \text{vec}(\mathbf{B})} \frac{\partial \text{vec}(\mathbf{B})}{\partial \text{vec}(\mathbf{Y})} \end{align*}

and I believe that $\frac{\partial \text{vec}(\mathbf{X})}{\partial \text{vec}(\mathbf{A})}$ $\in \mathbb{R}^{MN \times MN}$ and $\frac{\partial \text{vec}(\mathbf{X})}{\partial \text{vec}(\mathbf{B})}$ $\in \mathbb{R}^{MN \times MN}$ are given as:

\begin{align*} \frac{\partial \text{vec}(\mathbf{X})}{\partial \text{vec}(\mathbf{A})} = \text{diag} \bigg( \text{vec} \big( \mathbf{B}^\top \big) \bigg) \hspace{2.0cm} \frac{\partial \text{vec}(\mathbf{X})}{\partial \text{vec}(\mathbf{B})} = \text{diag} \bigg( \text{vec} \big( \mathbf{A} \big) \bigg) \end{align*}

where $\text{diag} \big( \text{vec} ( \mathbf{A} ) \big)$ $\in \mathbb{R}^{MN \times MN}$ represents taking the vectorization of $\mathbf{A}$ $\in \mathbb{R}^{M \times N}$ and forming a diagonal matrix from its elements.

I don't know how to verify this. I am basing it on similar derivations of Jacobians for ordinary matrix multiplication, but I don't know whether the same rules apply to the Jacobian of elementwise multiplication.
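One way to verify this is numerically: build the finite-difference Jacobian of $\text{vec}(\mathbf{A} \odot \mathbf{B})$ with respect to $\text{vec}(\mathbf{A})$ and compare it against both candidate formulas. A minimal numpy sketch (the column-major `order="F"` reshape matches the usual matrix-calculus convention for $\text{vec}$):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 3, 4
A = rng.standard_normal((M, N))
B = rng.standard_normal((M, N))

# Column-major vectorization, the standard convention in matrix calculus
vec = lambda Z: Z.reshape(-1, order="F")

# Finite-difference Jacobian of vec(A ⊙ B) with respect to vec(A)
eps = 1e-6
J_fd = np.zeros((M * N, M * N))
for k in range(M * N):
    dA = np.zeros(M * N)
    dA[k] = eps
    Ap = A + dA.reshape(M, N, order="F")
    J_fd[:, k] = (vec(Ap * B) - vec(A * B)) / eps

J_transposed = np.diag(vec(B.T))  # first candidate: diag(vec(B^T))
J_plain = np.diag(vec(B))         # second candidate: diag(vec(B))

print(np.allclose(J_fd, J_plain, atol=1e-5))       # True
print(np.allclose(J_fd, J_transposed, atol=1e-5))  # False
```

Since $\mathbf{X}$ is linear in $\mathbf{A}$, the finite difference is essentially exact, and the check singles out $\text{diag}(\text{vec}(\mathbf{B}))$ (no transpose) as the correct Jacobian.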

I feel like this could be wrong. Noting that the Hadamard product is commutative, such that $\mathbf{A} \odot \mathbf{B}$ $=$ $\mathbf{B} \odot \mathbf{A}$, I feel like we can equivalently have

\begin{align*} \mathrm{d}(\mathbf{A} \odot \mathbf{B}) = \mathbf{B} \odot \mathrm{d}\mathbf{A} + \mathbf{A} \odot \mathrm{d}\mathbf{B} \end{align*}

where $\frac{\partial \text{vec}(\mathbf{X})}{\partial \text{vec}(\mathbf{A})}$ $\in \mathbb{R}^{MN \times MN}$ and $\frac{\partial \text{vec}(\mathbf{X})}{\partial \text{vec}(\mathbf{B})}$ $\in \mathbb{R}^{MN \times MN}$ could instead be given as:

\begin{align*} \frac{\partial \text{vec}(\mathbf{X})}{\partial \text{vec}(\mathbf{A})} = \text{diag} \bigg( \text{vec} \big( \mathbf{B} \big) \bigg) \hspace{2.0cm} \frac{\partial \text{vec}(\mathbf{X})}{\partial \text{vec}(\mathbf{B})} = \text{diag} \bigg( \text{vec} \big( \mathbf{A} \big) \bigg) \end{align*}

I don't know whether the arguments inside $\text{diag}(\cdot)$ need to be transposed or not. This feels shaky to me, and I was unable to find any resources on taking the Jacobian of elementwise multiplication.

Ben Grossmann
Tychus

Comments:

  • The things inside do not need to be transposed; your second formula is correct. – Ben Grossmann Dec 13 '24 at 14:53
  • A sanity check that disproves your first approach: since $A \odot B = B \odot A$ as you noted, we should expect the formulas for $$ \frac{\partial (A \odot B)}{\partial A}, \quad \frac{\partial (B \odot A)}{\partial A} $$ to match, but the transpose in your first formula breaks this symmetry. – Ben Grossmann Dec 13 '24 at 14:54
  • I think the formula becomes a bit more obvious if you note that $$ \operatorname{vec}(A \odot B) = \text{diag}(\text{vec}(A)) \text{vec}(B) = \text{diag}(\text{vec}(B)) \text{vec}(A). $$ – Ben Grossmann Dec 13 '24 at 14:58
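The $\operatorname{vec}(A \odot B) = \text{diag}(\text{vec}(A))\,\text{vec}(B)$ identity from the last comment is easy to confirm numerically; a minimal numpy sketch (again with column-major $\text{vec}$):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))

# Column-major vectorization
vec = lambda Z: Z.reshape(-1, order="F")

lhs = vec(A * B)  # vec(A ⊙ B)
print(np.allclose(lhs, np.diag(vec(A)) @ vec(B)))  # True
print(np.allclose(lhs, np.diag(vec(B)) @ vec(A)))  # True
```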

1 Answer


$ \newcommand\dd{\mathrm d} \newcommand\m\mathbf \newcommand\vecc{\mathrm{vec}} \newcommand\diag{\mathrm{diag}} $There is a straightforward way to turn $$ \dd(\m A\odot\m B) = \dd\m A\odot\m B + \m A\odot\dd\m B \tag{$*$} $$ into an expression for the Jacobian.

The differential is a linear transformation for fixed $\m Y$, and the Jacobian is its matrix representation. For example, $\dd\m X$ takes in a small change $\delta\m Y$ to $\m Y$ and outputs the change $\dd\m X(\delta\m Y)$ induced in $\m X$. Equation ($*$) says that $$ \dd\m X(\delta\m Y) = \dd\m A(\delta\m Y)\odot\m B + \m A\odot\dd\m B(\delta\m Y). $$

We need to write this as a matrix multiplication. Given the way you've laid out your data, $\delta\m Y\in\mathbb R^{I\times J}$, and we use the vectorization operation $\vecc$ to write $$ \vecc[\dd\m X(\delta\m Y)] = \frac{\partial\vecc(\m X)}{\partial\vecc(\m Y)}\vecc(\delta\m Y). $$ We do the same with $\dd\m A$ and $\dd\m B$. Since the Hadamard product is elementwise, we can vectorize before applying it, yielding $$ \frac{\partial\vecc(\m X)}{\partial\vecc(\m Y)}\vecc(\delta\m Y) = \left[\frac{\partial\vecc(\m A)}{\partial\vecc(\m Y)}\vecc(\delta\m Y)\right]\odot\vecc(\m B) + \vecc(\m A)\odot\left[\frac{\partial\vecc(\m B)}{\partial\vecc(\m Y)}\vecc(\delta\m Y)\right]. $$

The Hadamard product is commutative, and using the fact that e.g. $$ \vecc(\m A)\odot v = \diag(\vecc(\m A))v $$ we get $$ \frac{\partial\vecc(\m X)}{\partial\vecc(\m Y)}\vecc(\delta\m Y) = \left[\diag(\m B)\frac{\partial\vecc(\m A)}{\partial\vecc(\m Y)} + \diag(\m A)\frac{\partial\vecc(\m B)}{\partial\vecc(\m Y)}\right]\vecc(\delta\m Y) $$ where I've abbreviated e.g. $\diag(\m A) := \diag(\vecc(\m A))$. It follows that $$ \frac{\partial\vecc(\m X)}{\partial\vecc(\m Y)} = \diag(\m B)\frac{\partial\vecc(\m A)}{\partial\vecc(\m Y)} + \diag(\m A)\frac{\partial\vecc(\m B)}{\partial\vecc(\m Y)}. $$