I have a set of vectors that I am trying to predict from another set of vectors using a matrix $W$. To find this matrix, I want to minimize the $\ell^2$ norm of the error, i.e.:
$$ \text{find} \quad \min_W \|y - Wx\|_2 \\ x, y \in \mathbb{C}^N, \quad W \in \mathbb{C}^{N \times N} $$
where $x$ and $y$ are respective vectors from their sets, and each pair is (I hypothesize) related to each other by $W$. I start by expanding this out:
$$ \min_W \left[ (y - Wx)^H (y - Wx) \right] \\ = \min_W \left[ (y - Wx)^H y - (y - Wx)^H Wx \right] \\ = \min_W \left[ (y^H y - y^H Wx)^H - (x^H W^H y - x^H W^H Wx)^H \right] \\ = \min_W \left[ y^H y - x^H W^H y - y^H W x + x^H W^H W x \right] $$
where $(\cdot)^H$ denotes the Hermitian (conjugate) transpose. I want to take the derivative with respect to $W$ and set it equal to zero, but I'm having a hard time with the derivative, as I'm not sure exactly how it works with Hermitian transposes. Taking a look at this page, it looks like I might be in trouble (i.e., I'm going about this the wrong way), but I wanted to pick your brains to see if you all have any ideas on how to move forward from here.
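As a sanity check on the algebra, I verified that the last line above really does equal $\|y - Wx\|_2^2$ numerically with random complex data. The NumPy sketch below is just that check; the size $N$ and the random draws are arbitrary and not part of the problem itself.

```python
import numpy as np

# Quick numerical check of the expansion above (arbitrary sizes/data).
rng = np.random.default_rng(0)
N = 5
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)
W = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))

r = y - W @ x
lhs = np.vdot(r, r)  # (y - Wx)^H (y - Wx) = ||y - Wx||_2^2

# Expanded form: y^H y - x^H W^H y - y^H W x + x^H W^H W x
rhs = (np.vdot(y, y)
       - np.vdot(W @ x, y)        # x^H W^H y
       - np.vdot(y, W @ x)        # y^H W x
       + np.vdot(W @ x, W @ x))   # x^H W^H W x

print(np.allclose(lhs, rhs))  # prints True
```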
Thank you all!
EDIT: Michael C. Grant has pointed out that this question is underdetermined, and he is right if I only have a single $x$ and $y$. However, I have a set of $x$ and $y$ vectors that I assume are related to each other by $W$ (and since I am currently working with simulations, I can generate as many pairs of $x$'s and $y$'s as I want). This is a kind of "system identification" problem, where I have input-output correspondences and I'm trying to understand how the math is derived.
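For concreteness, this is roughly the simulation setup I have in mind: stack the $x$'s and $y$'s as columns of matrices $X$ and $Y$ and fit $W$ over all pairs at once. The NumPy sketch below is only illustrative (the sizes, noise level, and the `lstsq` call are my own choices for testing, not the derivation I'm asking about).

```python
import numpy as np

# Illustrative multi-pair setup: columns of X and Y are the (x, y) pairs,
# W_true is the matrix I am trying to recover from simulated data.
rng = np.random.default_rng(1)
N, K = 8, 200  # vector length, number of (x, y) pairs

W_true = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
X = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
noise = 0.01 * (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K)))
Y = W_true @ X + noise

# Least-squares fit of W over all pairs: minimize ||Y - W X||_F,
# solved via lstsq on the transposed system X^T W^T = Y^T.
W_hat = np.linalg.lstsq(X.T, Y.T, rcond=None)[0].T

# Relative error should be small when K >= N and the noise is small.
print(np.linalg.norm(W_hat - W_true) / np.linalg.norm(W_true))
```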