When reading a proof I came across the following step:
$$\operatorname{Var}(x^Ty) = x^T\operatorname{Var}(y)x,$$ where $x$ and $y$ are column vectors. How can you derive this?
Let us first understand what is meant by the "variance" of a column vector. Suppose $y$ is a random vector taking values in $\mathbb R^{n\times1},$ and let $\mu =\mathbb E[y].$ Then we define
$$ \operatorname{cov}(y) = \operatorname{E}\big((y-\mu)(y-\mu)^T\big) \in \mathbb R^{n\times n}. $$
Here we assumed that $y$ is random. For what we do next, we must assume $x$ is not random. Since $x^Ty$ is a scalar with mean $\operatorname{E}[x^Ty] = x^T\mu,$ we have
\begin{align}
& \operatorname{var}(x^T y) = \operatorname{E}\Big( \big(x^T(y-\mu)\big)\big(x^T(y-\mu)\big)^T \Big) \\[10pt]
= {} & \operatorname{E}\Big( x^T(y-\mu) (y-\mu)^T x\Big) \\[10pt]
= {} & x^T \operatorname{E}\Big((y-\mu)(y-\mu)^T\Big) x \qquad \text{because } x \text{ is not random,} \\[10pt]
= {} & x^T \operatorname{cov}(y) x.
\end{align}
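If you want a quick numerical sanity check of this identity, here is a minimal Monte Carlo sketch; the covariance matrix, mean, and fixed vector $x$ below are arbitrary illustrative choices, not anything from the question.

```python
import numpy as np

# Monte Carlo sanity check of Var(x^T y) = x^T Cov(y) x.
# Sigma, mu, and x are arbitrary choices for illustration only.
rng = np.random.default_rng(0)

n = 3
A = rng.normal(size=(n, n))
Sigma = A @ A.T            # a valid (positive semi-definite) covariance matrix
mu = rng.normal(size=n)
x = rng.normal(size=n)     # x is fixed (non-random)

# Draw many samples of y ~ N(mu, Sigma) and compare the two sides.
y = rng.multivariate_normal(mu, Sigma, size=200_000)
lhs = np.var(y @ x)        # empirical Var(x^T y)
rhs = x @ Sigma @ x        # x^T Cov(y) x

print(lhs, rhs)            # the two numbers should agree closely
```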
The full equation is:
$$\operatorname{Var}(\hat\beta_p) = \operatorname{Var}\!\left(\frac{z_p^Ty}{\langle z_p,z_p\rangle}\right) = \frac{z_p^T\operatorname{Var}(y)z_p}{\langle z_p,z_p\rangle^2}=\frac{z_p^T(\sigma^2I)z_p}{\langle z_p,z_p\rangle^2}=\frac{\sigma^2\langle z_p,z_p\rangle}{\langle z_p,z_p\rangle^2}=\frac{\sigma^2}{\langle z_p,z_p\rangle}$$
The context is fitting a linear model with least-squares loss via orthogonalization: $z_p$ is the orthogonal residual from regressing $x_p$ (the $p$-th column of the design matrix) on the previous residuals, and $\hat\beta_p$ is the coefficient from projecting $y$ onto $z_p$. The data samples are random, so I think $z_p$ is random too.
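For what it's worth, the identity from the answer applies here conditionally on the design matrix: given the predictors, $z_p$ is a fixed vector (it is computed from the $x_j$'s only, not from $y$), so you can take $x = z_p/\langle z_p,z_p\rangle$ in the formula above. Below is a minimal simulation sketch of the resulting variance $\sigma^2/\langle z_p,z_p\rangle$; the design matrix, coefficients, noise level, and the QR-based Gram–Schmidt step are illustrative assumptions, not taken from the original post.

```python
import numpy as np

# Monte Carlo check of Var(beta_hat_p) = sigma^2 / <z_p, z_p>, treating the
# design matrix X (and hence z_p) as fixed, i.e. conditioning on X.
# X, beta, and sigma are illustrative choices only.
rng = np.random.default_rng(1)

n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, -2.0, 0.5])
sigma = 0.7

# Gram-Schmidt step: z_p is the residual of the last column of X after
# projecting out the span of the previous columns (via a reduced QR).
Q, _ = np.linalg.qr(X)
z_p = X[:, -1] - Q[:, :-1] @ (Q[:, :-1].T @ X[:, -1])

# Repeatedly draw y = X beta + eps and form beta_hat_p = <z_p, y> / <z_p, z_p>.
n_sim = 50_000
eps = rng.normal(scale=sigma, size=(n_sim, n))
Y = X @ beta + eps                 # each row is one simulated response vector
beta_hat_p = Y @ z_p / (z_p @ z_p)

print(np.var(beta_hat_p))          # empirical variance of beta_hat_p
print(sigma**2 / (z_p @ z_p))      # sigma^2 / <z_p, z_p>
```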