I was recently reading a proof of the Kullback–Leibler divergence between two multivariate normal distributions, and there are some steps that raised concerns I would like to clear up.
The author of the proof writes $P \sim \mathcal N(\mu_p, \Sigma_p)$ and $Q \sim \mathcal N(\mu_q, \Sigma_q)$, both $k$-dimensional. Based on this I have two questions (which are more related to linear algebra than to probability):
$(1)$ The author states that $(x - \mu_p)^T\Sigma_p^{-1}(x - \mu_p) \in \mathbb R$, but is this actually true? To me it looks like this:
$(x - \mu_p)^T$ has dimension $n \times k$ (number of observations by dimension),
$\Sigma_p^{-1}$ has dimension $k \times k$ (dimension by dimension),
$(x - \mu_p)$ has dimension $k \times n$ (dimension by number of observations).
Multiplying these, we end up with an $n \times n$ matrix, which is not $1 \times 1$ (see the quick check below). What am I missing?
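To make my bookkeeping concrete, here is a small NumPy sketch of the product as I read it (the toy sizes, random matrices, and variable names are my own, not the author's):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 3                          # n observations, k dimensions (toy values)

B = rng.normal(size=(k, n))          # plays the role of (x - mu_p): k x n, as I describe above
A = rng.normal(size=(k, k))
Sigma_p = A @ A.T + k * np.eye(k)    # a stand-in symmetric positive definite covariance

M = B.T @ np.linalg.inv(Sigma_p) @ B # (n x k) @ (k x k) @ (k x n)
print(M.shape)                       # (5, 5): an n x n matrix, not a scalar
```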
$(2)$ The author says that because $\operatorname{tr}(ABC) = \operatorname{tr}(CAB)$ we have that:
$$\operatorname{tr}\left((x - \mu_p)^T\Sigma_p^{-1}(x - \mu_p)\right) = \operatorname{tr}\left((x - \mu_p)(x - \mu_p)^T\Sigma_p^{-1}\right)$$
However, in my opinion it is not that easy. In general $\operatorname{tr}(ABC) \neq \operatorname{tr}(CAB)$, but it does hold when each of the factors is symmetric. Indeed, if $A, B, C$ are symmetric then any permutation inside the trace is valid. However, in our example $(x - \mu_p)^T$ is of course not symmetric, because it is not even a square matrix. What am I missing in this case? (A numerical check follows below.)
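For what it's worth, here is a quick numerical check of the identity with the shapes I have in mind (again with toy sizes and random matrices of my own choosing); the two traces do seem to agree even though none of the factors is symmetric, which is exactly what confuses me:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 3

B = rng.normal(size=(k, n))                        # plays the role of (x - mu_p); not symmetric
A = rng.normal(size=(k, k))
Sigma_inv = np.linalg.inv(A @ A.T + k * np.eye(k)) # inverse of a stand-in SPD covariance

lhs = np.trace(B.T @ Sigma_inv @ B)   # tr((x - mu_p)^T Sigma_p^{-1} (x - mu_p)), trace of an n x n matrix
rhs = np.trace(B @ B.T @ Sigma_inv)   # tr((x - mu_p)(x - mu_p)^T Sigma_p^{-1}), trace of a k x k matrix
print(lhs, rhs, np.isclose(lhs, rhs)) # the two values agree in my runs
```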
Could you please explain these two points to me?