
In the linear regression model $Y=X\beta+u$ with $X=(\mathbf1, X_1, \dots, X_p) \in\mathbb{R}^{n\times (p+1)}$,

define $Y^*=Y-\bar{Y}\mathbf1$ and $X^*=(\mathbf0,\, X_1-\bar{X}_1\mathbf1,\dots,X_p-\bar{X}_p\mathbf1)$.

Then \begin{align} Y^*&=Y-\bar{Y}\mathbf1=X\beta +u - \tfrac1n\mathbf1_{n\times n}(X\beta+u)\\ &=(\mathbf1,X_1,\dots,X_p)\beta -(\mathbf1,\bar{X}_1\mathbf1,\dots,\bar{X}_p\mathbf1)\beta +u -\bar{u}\mathbf1 =X^*\beta+u-\bar{u}\mathbf1 \end{align}
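A quick numerical sanity check of this identity, as a numpy sketch (the random design and names like `X_star` are just illustrative):

```python
import numpy as np

# Check Y* = X* beta + (u - u_bar) on an arbitrary random design.
rng = np.random.default_rng(0)
n, p = 8, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = np.array([2.0, 1.0, -0.5])
u = rng.normal(size=n)
Y = X @ beta + u

X_star = X - X.mean(axis=0)        # first column becomes 0, the rest are centered
Y_star = Y - Y.mean()
print(np.allclose(Y_star, X_star @ beta + (u - u.mean())))  # True
```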

So if I regress $Y^*$ on $X^*_{-1}$ (i.e., fit $Y^*=X^*_{-1}\beta_{-1}+\text{error}$), do I get the same $\beta$, except for the intercept of course? And how would I show that?

I would either need to show that \begin{align} (\hat{\beta})_{-1}&=\left((X^TX)^{-1}X^TY\right)_{-1}=({X^*_{-1}}^TX^*_{-1})^{-1}{X^*_{-1}}^TY^*= \hat\beta^*\\ &\left(=\operatorname{Var}(X_1,\dots,X_p)^{-1} (\operatorname{Cov}(X_1,Y^*),\dots,\operatorname{Cov}(X_p,Y^*))^T\right) \end{align} but I don't see how to show this directly.

Or alternatively I could try to use the fact that both are solutions to the least-squares optimization problem $\min_\beta\|Y-X\beta\|$, but since the error term $u$ does not stay the same I am not sure how to do that either.
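To make the conjecture concrete, here is a minimal numpy sketch (random illustrative design) comparing the two fits:

```python
import numpy as np

# Regress Y* on X*_{-1} (centered regressors, no intercept) and compare slopes.
rng = np.random.default_rng(1)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ np.array([2.0, 1.0, -1.0, 0.5]) + rng.normal(size=n)

beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]        # full OLS with intercept
Xs = X[:, 1:] - X[:, 1:].mean(axis=0)                  # X*_{-1}
Ys = Y - Y.mean()                                      # Y*
beta_star = np.linalg.lstsq(Xs, Ys, rcond=None)[0]     # OLS without intercept
print(np.allclose(beta_hat[1:], beta_star))            # True: slopes agree
```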

Background: I am trying to understand the proof sketch in the answer to this question: OLS: Omitted variable bias when E(omitted variable) ≠ 0

  • What is the meaning of $-1$ as a subscript? – callculus42 Aug 10 '18 at 15:51
  • Everything BUT the first column, i.e. dropping the first column or entry. In the case of $X^*$ you want to drop the column of $0$'s, otherwise $X^*$ does not have full rank and you cannot take the inverse. It basically means dropping the intercept. – Felix Benning Aug 10 '18 at 15:55
  • Isn't the first column the one with only $1$'s? This is mentioned in the link you provided. – callculus42 Aug 10 '18 at 16:19
  • It is in $X$, but when you center the columns the $1$'s cancel out. Look at the line where I show how $Y^*$ and $X^*$ are related. – Felix Benning Aug 10 '18 at 17:01

1 Answer


Part 1: Centering $Y$ does not affect $\hat\beta_{-1}$. Proof:

First note this property: $$(X^TX)^{-1}X^T (\mathbf1,X_1,...,X_p)=(X^TX)^{-1}X^TX=\mathbb{I}\quad \Rightarrow (X^TX)^{-1}X^T\mathbf1=\begin{pmatrix}1\\0\\ \vdots\\0\end{pmatrix}=e_1$$
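A quick numerical check of this property (numpy sketch, arbitrary random design):

```python
import numpy as np

# (X^T X)^{-1} X^T 1 picks out e_1 when the first column of X is 1.
rng = np.random.default_rng(2)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
e1 = np.linalg.solve(X.T @ X, X.T @ np.ones(n))
print(np.allclose(e1, np.eye(p + 1)[:, 0]))  # True: e1 = (1, 0, ..., 0)^T
```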

In particular $X(X^TX)^{-1}X^T\mathbf{1}=\mathbf1$ $(*)$, which means:

\begin{align} \hat{\beta}^*=(X^TX)^{-1}X^T(Y-\mathbf1\bar Y)=(X^TX)^{-1}X^TY -(X^TX)^{-1}X^T\mathbf1\bar Y=\hat{\beta}-e_1\bar Y \end{align}
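Numerically (numpy sketch with an illustrative design; `shift` stands for $e_1\bar Y$):

```python
import numpy as np

# Check beta_hat_star = beta_hat - e_1 * Ybar on a random design.
rng = np.random.default_rng(3)
n, p = 60, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ np.array([3.0, 1.0, -2.0]) + rng.normal(size=n)

beta_hat  = np.linalg.solve(X.T @ X, X.T @ Y)
beta_star = np.linalg.solve(X.T @ X, X.T @ (Y - Y.mean()))
shift = np.zeros(p + 1)
shift[0] = Y.mean()                                    # e_1 * Ybar
print(np.allclose(beta_star, beta_hat - shift))        # True
```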


Part 2: Centering $X$ does not affect $\hat\beta_{-1}$.

Let $\hat{\beta}=\arg\min_{\beta}\|Y-X\beta\|_2$. If $X$ has full rank, this minimizer exists and is unique, given by $(X^TX)^{-1}X^TY$.

Now consider that: \begin{align} \|Y-(\mathbf1,X_1,...,X_p)\beta\|_2&=\|Y- (\beta_0\mathbf1 +\beta_1X_1+...+\beta_pX_p)\|_2 \\ &=\|Y-[(\beta_0+\bar X_1\beta_1+...+\bar X_p\beta_p)\mathbf1 \\ &\quad+ \beta_1(X_1-\bar X_1\mathbf1)+...+\beta_p(X_p-\bar X_p\mathbf1)]\|_2 \end{align}

This means that for $\tilde{\beta}:= \begin{pmatrix} \hat\beta_0+\bar X_1\hat\beta_1+\dots+\bar X_p\hat\beta_p \\ \hat\beta_1 \\ \vdots \\ \hat\beta_p \end{pmatrix}$ and $X^*:=(\mathbf1,\,X_1-\bar X_1\mathbf1,\dots,X_p-\bar X_p\mathbf1)$ (note that here $X^*$ keeps the intercept column, unlike in the question) we have $$\|Y-X\hat\beta\|_2=\|Y-X^*\tilde\beta\|_2,$$ and therefore $\tilde\beta=\arg\min_{\beta}\|Y-X^*\beta\|_2$: there cannot be a $\beta$ with \begin{align} \|Y-X^*\beta\|_2<\|Y-X^*\tilde\beta\|_2=\|Y-X\hat\beta\|_2, \end{align} since such a $\beta$ could be translated back by the same change of variables into a $\beta'$ with $\|Y-X\beta'\|_2<\|Y-X\hat\beta\|_2$, contradicting that $\hat\beta$ is the minimum.
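A numpy sketch checking both claims, equal residual norms and unchanged slopes (random illustrative design):

```python
import numpy as np

# Check that the reparametrised fit has the same residual norm and slopes.
rng = np.random.default_rng(4)
n, p = 80, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

Xs = X.copy()
Xs[:, 1:] -= Xs[:, 1:].mean(axis=0)                    # X*: intercept kept, X_i centered
beta_hat   = np.linalg.lstsq(X,  Y, rcond=None)[0]
beta_tilde = np.linalg.lstsq(Xs, Y, rcond=None)[0]
print(np.allclose(beta_hat[1:], beta_tilde[1:]))       # True: slopes unchanged
print(np.isclose(np.linalg.norm(Y - X @ beta_hat),
                 np.linalg.norm(Y - Xs @ beta_tilde))) # True: same residual norm
print(np.isclose(beta_tilde[0],                        # tilde intercept as claimed
                 beta_hat[0] + X[:, 1:].mean(axis=0) @ beta_hat[1:]))
```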


$X^*$ has full rank: it arises from $X$ by elementary column operations (subtracting $\bar X_i$ times the column $\mathbf1$ from the $i$-th column), and such operations preserve rank. One can also see this via multilinearity of the determinant, using that $\mathbf1$ and $\bar X_i\mathbf1$ are linearly dependent, so each subtracted determinant below vanishes (for $n>p+1$, apply the computation to the maximal $(p+1)\times(p+1)$ submatrices): \begin{align} \det(X^*)&=\det(\mathbf1,X_1-\bar X_1\mathbf1,\dots,X_p-\bar X_p\mathbf1) \\ &=\det(\mathbf1,X_1,X_2-\bar X_2\mathbf1,\dots,X_p-\bar X_p\mathbf1)-\det(\mathbf1,\bar X_1\mathbf1,X_2-\bar X_2\mathbf1,\dots,X_p-\bar X_p\mathbf1)\\ &=\det(\mathbf1,X_1,X_2-\bar X_2\mathbf1,\dots,X_p-\bar X_p\mathbf1)\\ &= \dots = \det(X) \neq 0 \end{align}
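The rank claim can also be checked numerically (numpy sketch, random design):

```python
import numpy as np

# Column operations do not change the rank: rank(X*) == rank(X).
rng = np.random.default_rng(5)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Xs = X.copy()
Xs[:, 1:] -= Xs[:, 1:].mean(axis=0)
print(np.linalg.matrix_rank(X), np.linalg.matrix_rank(Xs))  # 4 4
```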


Thus $\tilde\beta=({X^*}^TX^*)^{-1}{X^*}^TY$ is the unique minimizer, and $\tilde\beta_{-1}=\hat\beta_{-1}$.


Interpretation/Additional notes:

Because of $(*)$, if $X$ and $Y$ are centered, then $\hat\beta_0=0$. Proof (using the transpose of $(*)$ in the second equality and $\bar X_i=0$ in the last line): \begin{align} 0&=\mathbf1^TY=\mathbf1^T X(X^TX)^{-1}X^T Y =\mathbf1^T\hat Y=\mathbf1^T X\hat\beta \\ &=\mathbf1^T (\mathbf1,X_1,\dots,X_p)\hat\beta=(n,n\bar X_1, \dots,n\bar X_p)\hat\beta \\ &=(n,0,\dots,0)\hat\beta=n\hat\beta_0 \end{align}

This means that if $X$ and $Y$ are centered, the intercept $\beta_0$ can be dropped from the model.
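A numpy sketch of this fact (random illustrative design; `Z` holds the uncentered regressors):

```python
import numpy as np

# With centered X_i and centered Y, the fitted intercept is exactly 0.
rng = np.random.default_rng(6)
n, p = 70, 2
Z = rng.normal(size=(n, p))
Y = Z @ np.array([1.5, -0.5]) + 3.0 + rng.normal(size=n)

Xc = np.column_stack([np.ones(n), Z - Z.mean(axis=0)])
Yc = Y - Y.mean()
beta_hat = np.linalg.lstsq(Xc, Yc, rcond=None)[0]
print(np.isclose(beta_hat[0], 0.0))  # True: intercept vanishes
```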

In that case (with the column of ones removed from $X$, so $X=(X_1,\dots,X_p)$), $\hat\beta$ can be written as:

\begin{align} \hat\beta=(X^TX)^{-1}X^TY=(\widehat{Cov}(X_i,X_j)_{i,j=1,...,p})^{-1} \begin{pmatrix} \widehat{Cov}(X_1,Y)\\ \vdots \\ \widehat{Cov}(X_p,Y) \end{pmatrix} \end{align}

Otherwise, i.e. without centering, this covariance expression still equals $\hat\beta_{-1}$.
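A final numpy sketch checking the covariance formula against the OLS slopes without any centering (illustrative design):

```python
import numpy as np

# OLS slopes from covariances: Var(X)^{-1} Cov(X, Y) matches lstsq slopes.
rng = np.random.default_rng(7)
n, p = 90, 3
Z = rng.normal(size=(n, p))
Y = Z @ np.array([1.0, 0.5, -1.5]) + 2.0 + rng.normal(size=n)

X = np.column_stack([np.ones(n), Z])
slopes_ols = np.linalg.lstsq(X, Y, rcond=None)[0][1:]  # beta_hat_{-1}

C = np.cov(np.column_stack([Z, Y]), rowvar=False)      # joint sample covariance
slopes_cov = np.linalg.solve(C[:p, :p], C[:p, p])      # Cov(X)^{-1} Cov(X, Y)
print(np.allclose(slopes_ols, slopes_cov))             # True
```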