7

When reading a proof I came across the following step:

$$\operatorname{Var}(x^Ty) = x^T\operatorname{Var}(y)x$$ where $x$ and $y$ are column vectors. How can this be derived?

  • $x$ is not random? – Cave Johnson Jan 09 '17 at 04:59
  • Actually I think $x$ is random too.

    The full equation is:

    $$Var(\hat\beta_p) = Var(\frac{z_p^Ty}{\langle z_p,z_p\rangle}) = \frac{z_pVar(y)z_p}{\langle z_p,z_p\rangle^2}=\frac{z_p^T(\sigma^2I)z_p}{\langle z_p,z_p\rangle^2}=\frac{\sigma^2}{\langle z_p,z_p\rangle^2}$$

    The context is fitting a linear model with least-squares loss using orthogonalization. $z_p$ is the orthogonal residual from regressing $x_p$ (the p-th column of the design matrix) on the previous residuals, and $\hat\beta_p$ is the projection of $y$ onto $z_p$. The data samples are random so I think $z_p$ is random too.

    – Jake Grimes Jan 09 '17 at 05:19
  • It is somewhat confusing. $Var(x^T y)$ is not random, while $x^T Var(y)x$ is random. Am I missing something? – Cave Johnson Jan 09 '17 at 05:29
  • Typos in the equation fixed: $$Var(\hat\beta_p) = Var(\frac{z_p^Ty}{\langle z_p,z_p\rangle}) = \frac{z_p^TVar(y)z_p}{\langle z_p,z_p\rangle^2}=\frac{z_p^T(\sigma^2I)z_p}{\langle z_p,z_p\rangle^2}=\frac{\sigma^2}{\langle z_p,z_p\rangle}$$ – Jake Grimes Jan 09 '17 at 05:34
  • I don't get what you are asking. $y$ is a random column vector and I think $x$ is too. But the variance of $x^Ty$ is a constant. – Jake Grimes Jan 09 '17 at 05:36
  • 1
    The equation you put in your question,i.e., $\mathrm{Var}(x^T y)=x^T \mathrm{Var}(y)x$, is a little different from the one you mentioned in your comments, and the former seems wrong. You already noticed $\mathrm{Var}(x^T y)$ is a constant while $x^T \mathrm{Var}(y)x$ is a random variable, then how is those two supposed to be equal? – Cave Johnson Jan 09 '17 at 05:39
  • Oh I see how I was wrong. $x$ must be constant. – Jake Grimes Jan 09 '17 at 05:43
  • According to Wikipedia this is just a basic property of Variance matrices: https://en.wikipedia.org/wiki/Covariance_matrix (Property 3) I still don't see how it was derived though. – Jake Grimes Jan 09 '17 at 05:43
  • It looks like you can do it with Property 1 from the above page. – Jake Grimes Jan 09 '17 at 05:45
  • https://math.stackexchange.com/q/2365166/321264 – StubbornAtom Jul 22 '20 at 18:39

1 Answer

15

Let us understand what is meant by the "variance" of a column vector. Suppose $y$ is a random vector taking values in $\mathbb R^{n\times1},$ and let $\mu =\mathbb E[y].$ Then we define
$$ \operatorname{cov}(y) = \operatorname{E}\big((y-\mu)(y-\mu)^T\big) \in \mathbb R^{n\times n}. $$
Here we assumed that $y$ is random. For what we do next, we must assume $x$ is not random; in particular, $\operatorname{E}[x^Ty] = x^T\mu,$ so $x^Ty - \operatorname{E}[x^Ty] = x^T(y-\mu).$ We have
\begin{align} & \operatorname{var}(x^T y) = \operatorname{E}\Big( \big(x^T(y-\mu)\big)\big(x^T(y-\mu)\big)^T \Big) \\[10pt] = {} & \operatorname{E}\Big( x^T(y-\mu) (y-\mu)^T x\Big) \\[10pt] = {} & x^T \operatorname{E}\Big((y-\mu)(y-\mu)^T\Big) x \qquad \text{because } x \text{ is not random,} \\[10pt] = {} & x^T \operatorname{cov}(y) x. \end{align}
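The identity can also be sanity-checked numerically. Here is a minimal Monte Carlo sketch in NumPy (not from the original answer; the dimension, mean, and covariance below are arbitrary choices for illustration): draw many samples of a Gaussian $y$, form the scalar $x^Ty$ with a fixed $x$, and compare its empirical variance with $x^T\operatorname{cov}(y)\,x$.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 4
x = rng.standard_normal(n)          # fixed (non-random) vector x

# Build an arbitrary covariance matrix Sigma = A A^T (positive semi-definite)
A = rng.standard_normal((n, n))
Sigma = A @ A.T
mu = rng.standard_normal(n)

# Draw many samples of y ~ N(mu, Sigma) and estimate Var(x^T y) empirically
samples = rng.multivariate_normal(mu, Sigma, size=200_000)
empirical = np.var(samples @ x)     # variance of the scalar x^T y

# Closed form from the derivation above: x^T Cov(y) x
theoretical = x @ Sigma @ x
print(empirical, theoretical)       # agree up to Monte Carlo error
```

With 200,000 samples the two numbers typically match to within a percent or so, which is consistent with the derivation rather than a proof of it.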

dohmatob
  • 9,753