I am doing a homework sheet as practice for an upcoming course in multivariate statistics and have been stuck on the following problem:
Let $(Y,X_1,X_2,\ldots,X_p)^T\stackrel{d}{=}\mathcal{N}_{p+1}(0,\mathbf{C})$. Prove that $\mathbb{E}(Y|X_1,X_2,\ldots,X_p)$ maximizes $Corr(Y,f(X_1,X_2,\ldots,X_p))$ over the space of linear functions $f$.
My attempt so far:
Proof that the conditional expectation is linear, and calculation of its value:
We omit the linearly dependent cases, since they are trivial. As a starting point, we prove that the conditional expectation is a linear combination of the $X_i$s, and calculate its coefficients. Partition $\mathbf{C}$ and use the well-known inversion identity for block matrices, using $Var Y:=\sigma^2$, $Var X_i=\sigma_i^2=c_{ii}$, $Cov(X_i,Y)=c_i$ and $Cov(X_i,X_j)=c_{ij}$ (note that $\mathbf{c}\mathbf{c}^T$ below is the rank-one outer product, not the scalar $\mathbf{c}^T\mathbf{c}$): \begin{equation} \mathbf{C}=\begin{pmatrix} \sigma^2&\mathbf{c}^T\\ \mathbf{c}&\mathbf{D}\\ \end{pmatrix}\qquad \mathbf{C}^{-1}=\begin{pmatrix} \sigma^2&\mathbf{c}^T\\ \mathbf{c}&\mathbf{D}\\ \end{pmatrix}^{-1}= \begin{pmatrix} (\sigma^2-\mathbf{c}^T\mathbf{D}^{-1}\mathbf{c})^{-1}& -\sigma^{-2}\mathbf{c}^T(\mathbf{D}-\sigma^{-2}\mathbf{c}\mathbf{c}^T)^{-1}\\ -\sigma^{-2}(\mathbf{D}-\sigma^{-2}\mathbf{c}\mathbf{c}^T)^{-1}\mathbf{c}& (\mathbf{D}-\sigma^{-2}\mathbf{c}\mathbf{c}^T)^{-1}\\ \end{pmatrix}=:\begin{pmatrix} \alpha^2&\beta^T\\ \beta&\delta\\ \end{pmatrix} \end{equation} under the extra assumption $\det\mathbf{C}\det\mathbf{D}\neq 0$. Also note that $Var(\mathbf{a}^T\mathbf{X})=\mathbf{a}^T\mathbf{D}\mathbf{a}$. Thus the joint density function is: \begin{equation} f(y,\mathbf{x})=\frac{1}{\sqrt{(2\pi)^{p+1}\det\mathbf{C} }}\exp\left(-\frac{1}{2}(y,\mathbf{x}^T)\,\mathbf{C}^{-1}(y,\mathbf{x}^T)^T\right)=\frac{1}{\sqrt{(2\pi)^{p+1}\det\mathbf{C} }}\exp\left(-\frac{\alpha^2y^2+2\beta^T\mathbf{x}\,y+\mathbf{x}^T\delta\mathbf{x}}{2}\right)= \end{equation} \begin{equation} =\frac{1}{\sqrt{(2\pi)^{p+1}\det\mathbf{C} }}\exp\left(-\frac{1}{2}\left(\alpha y+\frac{\mathbf{x}^T\beta}{\alpha}\right)^2-\frac{\mathbf{x}^T\delta\mathbf{x}}{2}+\frac{(\mathbf{x}^T\beta)^2}{2\alpha^2}\right) \end{equation} We know from the notes that $\mathbb{E}(Y-g(X_1,\ldots,X_p))^2$ is minimized by: \begin{equation} \mathbb{E}(Y|X_1,\ldots, X_p)=\frac{\displaystyle \int_{-\infty}^{\infty}yf(y,\mathbf{x})\,\mathrm{d}{y}}{\displaystyle \int_{-\infty}^{\infty}f(y,\mathbf{x})\,\mathrm{d}{y}}=-\frac{\mathbf{x}^T\beta}{\alpha^2}= \sigma^{-2}(\sigma^2-\mathbf{c}^T\mathbf{D}^{-1}\mathbf{c})\,\mathbf{x}^T(\mathbf{D}-\sigma^{-2}\mathbf{c}\mathbf{c}^T)^{-1}\mathbf{c}=\boxed{\mathbf{x}^T\mathbf{D}^{-1}\mathbf{c}} \end{equation} where the last equality follows from the Sherman–Morrison formula. The denominator: \begin{equation} \int_{-\infty}^{\infty}\frac{1}{\sqrt{(2\pi)^{p+1}\det\mathbf{C} }}\exp\left(-\frac{1}{2}\left(\alpha y+\frac{\mathbf{x}^T\beta}{\alpha}\right)^2-\frac{\mathbf{x}^T\delta\mathbf{x}}{2}+\frac{(\mathbf{x}^T\beta)^2}{2\alpha^2}\right)\,\mathrm{d}{y}=\frac{1}{\sqrt{(2\pi)^{p}\det\mathbf{C} }\,\alpha}\exp\left(-\frac{\mathbf{x}^T\delta\mathbf{x}}{2}+\frac{(\mathbf{x}^T\beta)^2}{2\alpha^2}\right) \end{equation} The numerator is, up to the same normalization, the expectation of a normal random variable with mean $\displaystyle-\frac{\mathbf{x}^T\beta}{\alpha^2}$: \begin{equation} \int_{-\infty}^{\infty}\frac{y}{\sqrt{(2\pi)^{p+1}\det\mathbf{C}}}\exp\left(-\frac{1}{2}\left(\alpha y+\frac{\mathbf{x}^T\beta}{\alpha}\right)^2-\frac{\mathbf{x}^T\delta\mathbf{x}}{2}+\frac{(\mathbf{x}^T\beta)^2}{2\alpha^2}\right)\,\mathrm{d}{y} =-\frac{\mathbf{x}^T\beta}{\sqrt{(2\pi)^{p}\det\mathbf{C} }\,\alpha^3}\exp\left(-\frac{\mathbf{x}^T\delta\mathbf{x}}{2}+\frac{(\mathbf{x}^T\beta)^2}{2\alpha^2}\right) \end{equation} which proves that the conditional expectation is linear. It remains to minimize $\displaystyle \mathbb{E}\left(Y-b-\sum_{i=1}^{p}a_iX_i\right)^2$.
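Not part of the proof, but as a quick numerical sanity check of the block-inverse bookkeeping above, the identity $-\beta/\alpha^2=\mathbf{D}^{-1}\mathbf{c}$ can be verified with numpy on a randomly generated positive-definite $\mathbf{C}$ (the variable names below are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4

# Random positive-definite covariance matrix C of size (p+1) x (p+1)
A = rng.standard_normal((p + 1, p + 1))
C = A @ A.T + (p + 1) * np.eye(p + 1)

c = C[1:, 0]              # Cov(X_i, Y)
D = C[1:, 1:]             # Var(X)

Cinv = np.linalg.inv(C)
alpha2 = Cinv[0, 0]       # (sigma^2 - c^T D^{-1} c)^{-1}
beta = Cinv[1:, 0]        # -sigma^{-2} (D - sigma^{-2} c c^T)^{-1} c

# The conditional-mean coefficients -beta/alpha^2 should equal D^{-1} c
print(np.allclose(-beta / alpha2, np.linalg.solve(D, c)))  # True
```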
The density belongs to the exponential family, thus we can switch the order of integration (expectation) and differentiation: \begin{equation} \mathbb{E}\left(\frac{\partial}{\partial a_i}\left(Y-b-\sum_{j=1}^{p}a_jX_j\right)^2\right)=-2c_i+2\sum_{j=1}^{p}a_jc_{ij}\qquad \mathbb{E}\left(\frac{\partial}{\partial b}\left(Y-b-\sum_{j=1}^{p}a_jX_j\right)^2\right)=2b \end{equation} \begin{equation} \mathbb{E}\left(\frac{\partial^2}{\partial a_i^2}\left(Y-b-\sum_{j=1}^{p}a_jX_j\right)^2\right)=2c_{ii}\qquad \mathbb{E}\left(\frac{\partial^2}{\partial b^2}\left(Y-b-\sum_{j=1}^{p}a_jX_j\right)^2\right)=2 \end{equation} \begin{equation} \mathbb{E}\left(\frac{\partial^2}{\partial a_i\,\partial a_j}\left(Y-b-\sum_{k=1}^{p}a_kX_k\right)^2\right)=2c_{ij}\qquad \mathbb{E}\left(\frac{\partial^2}{\partial a_i\,\partial b}\left(Y-b-\sum_{j=1}^{p}a_jX_j\right)^2\right)=0 \end{equation} So the Hessian is $2\begin{pmatrix}\mathbf{D}&0\\0&1\end{pmatrix}$, which is positive definite everywhere, meaning that the global minimum is attained where the gradient vanishes: \begin{equation} \mathbf{D}\mathbf{a}=\mathbf{c}\Rightarrow \boxed{\mathbf{a}=\mathbf{D}^{-1}\mathbf{c}\qquad b=0} \end{equation} in agreement with the boxed formula above.
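Again only as a hedge against sign errors, here is a small numpy check (with my own ad hoc names) that $\mathbf{a}=\mathbf{D}^{-1}\mathbf{c}$, $b=0$ really minimizes the quadratic $\mathbb{E}(Y-b-\mathbf{a}^T\mathbf{X})^2=\sigma^2-2\mathbf{a}^T\mathbf{c}+\mathbf{a}^T\mathbf{D}\mathbf{a}+b^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4
A = rng.standard_normal((p, p))
D = A @ A.T + p * np.eye(p)                      # Var(X), positive definite
c = rng.standard_normal(p)                       # Cov(X_i, Y)
sigma2 = float(c @ np.linalg.solve(D, c)) + 1.0  # Var(Y), keeps C positive definite

def mse(a, b):
    # E(Y - b - a^T X)^2 = sigma^2 - 2 a^T c + a^T D a + b^2 (all means are zero)
    return sigma2 - 2 * a @ c + a @ D @ a + b ** 2

a_star = np.linalg.solve(D, c)
best = mse(a_star, 0.0)

# Random perturbations of (a, b) should never beat the claimed minimizer
print(all(mse(a_star + 0.1 * rng.standard_normal(p), 0.1 * rng.standard_normal()) >= best
          for _ in range(1000)))  # True
```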
Proof that it attains the maximal correlation:
We want to show that $\mathbf{a}=\mathbf{D}^{-1}\mathbf{c}$ gives a maximum in every direction $\mathbf{v}$, so take $t(\mathbf{X},u)=(\mathbf{D}^{-1}\mathbf{c}+u\mathbf{v})^T\mathbf{X}$; then, using $\mathbf{D}^T=\mathbf{D}$: \begin{equation} {Corr}(Y,t(\mathbf{X},u))=\frac{Cov(Y,(\mathbf{D}^{-1}\mathbf{c}+u\mathbf{v})^T\mathbf{X})}{\sqrt{Var Y\, Var((\mathbf{D}^{-1}\mathbf{c}+u\mathbf{v})^T\mathbf{X})}}= \frac{(\mathbf{D}^{-1}\mathbf{c}+ u\mathbf{v})^T\mathbf{c}} {\sigma\sqrt{\mathbf{c}^T\mathbf{D}^{-1}\mathbf{D}\mathbf{D}^{-1}\mathbf{c}+2u\mathbf{v}^T\mathbf{D}\mathbf{D}^{-1}\mathbf{c}+u^2\mathbf{v}^T\mathbf{D}\mathbf{v}}}= \end{equation} \begin{equation} =\frac{\mathbf{c}^T\mathbf{D}^{-1}\mathbf{c}+ u\mathbf{v}^T\mathbf{c}} {\sigma\sqrt{\mathbf{c}^T\mathbf{D}^{-1}\mathbf{c}+ 2u\mathbf{v}^T\mathbf{c}+ u^2\mathbf{v}^T\mathbf{D}\mathbf{v}}} \end{equation} \begin{equation} \frac{\partial}{\partial u}{Corr}(Y,t(\mathbf{X},u))= \frac{u((\mathbf{c}^T\mathbf{v})^2-\mathbf{c}^T\mathbf{D}^{-1}\mathbf{c}\,\mathbf{v}^T\mathbf{D}\mathbf{v})} {\sigma(\mathbf{c}^T\mathbf{D}^{-1}\mathbf{c}+2u\mathbf{v}^T\mathbf{c}+u^2\mathbf{v}^T\mathbf{D}\mathbf{v})^{3/2}} \end{equation} With some straightforward but lengthy calculations, we get: \begin{equation} \left.\frac{\partial^2}{\partial u^2}{Corr}(Y,t(\mathbf{X},u))\right|_{u=0}=-\frac{\mathbf{c}^T\mathbf{D}^{-1}\mathbf{c}\,\mathbf{v}^T\mathbf{D}\mathbf{v}-(\mathbf{c}^T\mathbf{v})^2}{\sigma(\mathbf{c}^T\mathbf{D}^{-1}\mathbf{c})^{3/2}} \end{equation} So the first derivative vanishes iff $u=0$ or $(\mathbf{v}^T\mathbf{c})^2-\mathbf{c}^T\mathbf{D}^{-1}\mathbf{c}\,\mathbf{v}^T\mathbf{D}\mathbf{v}=0$; in the latter case the correlation is constant in the direction of $\mathbf{v}$. Let $\mathbf{v}=\mathbf{v}_\perp+\lambda\mathbf{c}$, where $\mathbf{v}_\perp$ is orthogonal to $\mathbf{c}$; thus: \begin{equation} \mathbf{c}^T\mathbf{D}^{-1}\mathbf{c}\,\mathbf{v}^T\mathbf{D}\mathbf{v}-(\mathbf{v}^T\mathbf{c})^2=\mathbf{c}^T\mathbf{D}^{-1}\mathbf{c}\,\mathbf{v}_\perp^T\mathbf{D}\mathbf{v}_\perp+2\lambda\,\mathbf{c}^T\mathbf{D}^{-1}\mathbf{c}\,\mathbf{c}^T\mathbf{D}\mathbf{v}_\perp+\lambda^2\left(\mathbf{c}^T\mathbf{D}^{-1}\mathbf{c}\,\mathbf{c}^T\mathbf{D}\mathbf{c}-(\mathbf{c}^T\mathbf{c})^2\right) \end{equation} If $Y$ is independent of the $X_i$s (i.e. $\mathbf{c}=0$), then the correlation is $0$ for every $f$ and there is nothing to prove. If that's not the case, with scaling we can achieve $\mathbf{c}^T\mathbf{c}=1$, and by the Cauchy–Schwarz inequality $\mathbf{c}^T\mathbf{D}^{-1}\mathbf{c}\,\mathbf{c}^T\mathbf{D}\mathbf{c}\geq(\mathbf{c}^T\mathbf{c})^2$.
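For what it's worth, the closed-form correlation derived above can also be checked numerically; the following numpy sketch (names are mine) is consistent with $u=0$ being the maximum over random directions $\mathbf{v}$:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 4
A = rng.standard_normal((p, p))
D = A @ A.T + p * np.eye(p)              # Var(X)
c = rng.standard_normal(p)               # Cov(X_i, Y)
q = float(c @ np.linalg.solve(D, c))     # c^T D^{-1} c
sigma = np.sqrt(q + 1.0)                 # any sigma with sigma^2 > c^T D^{-1} c

def corr(u, v):
    # Closed form of Corr(Y, (D^{-1}c + u v)^T X) derived above
    return (q + u * (v @ c)) / (sigma * np.sqrt(q + 2 * u * (v @ c) + u ** 2 * (v @ D @ v)))

best = corr(0.0, rng.standard_normal(p))  # value at u = 0 (independent of v)
print(all(corr(u, rng.standard_normal(p)) <= best + 1e-12
          for u in np.linspace(-2.0, 2.0, 81) for _ in range(20)))  # True
```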
And there I'm just stuck. Any ideas on how to continue?