
Let $D = \{(x_1, y_1), (x_2, y_2), \ldots , (x_n, y_n)\}$ where $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$. One may use linear regression to predict $y$ as $w^Tx$ for some parameter vector $w \in \mathbb{R}^{d}$. Consider the matrix $X \in \mathbb{R}^{n\times d}$ with the $x_i$ as rows, and let $y \in \mathbb{R}^n$ be the vector of the $y_i$. Given the OLS problem $$ \arg\min_w \hat{R}(w)=\arg\min_w \sum^n_{i=1}(y_i - w^Tx_i)^2,$$

a) Show that for $n < d$ the OLS problem does not admit a unique solution

b) Under what assumptions on $X$ does this equation admit a unique solution $w^*$?

i. There exists a unique solution if $n \geq d$ and the columns of $X$ are independent

ii. There exists a unique solution $\iff$ $X^TX$ is invertible

iii. For $n > d$, there will always be a unique solution if $X$ is full rank.

I'm a little unsure how to write a proof for this question. My initial idea is to derive a unique solution under the assumption that $X^TX$ is invertible, and then use properties of linear systems to show that in a) we have $\mathrm{rank}(X) < d$, so the solution is not uniquely determined, and that b) holds under assumption ii. Is there a better/more formal way of proving a) and b)?


2 Answers


Note that $$R (w) = \|y - X w \|_2^2.$$ Differentiating with respect to $w$, we get $$\nabla_w R = -2 X^T (y-Xw).$$ Setting the gradient to zero, we get the normal equation $$X^TXw = X^Ty. \qquad (\star)$$ Recall that $\mathrm{rank}(X^TX) = \mathrm{rank}(X)$. Thus $X^TX$ is invertible if and only if $X$ has full column rank (i.e. it has rank $d$).
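As a quick numerical sanity check (a NumPy sketch; the Gaussian data, sizes and seed below are arbitrary choices, not part of the problem), one can verify that the gradient vanishes at the least-squares solution, i.e. that it satisfies $(\star)$:

```python
import numpy as np

# NumPy sketch: at the least-squares minimiser the gradient -2 X^T (y - X w)
# vanishes, i.e. w satisfies the normal equation (star).
rng = np.random.default_rng(0)
n, d = 50, 3                                     # illustrative sizes, n > d
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

w_star, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares solution
grad = -2 * X.T @ (y - X @ w_star)               # gradient of ||y - X w||_2^2

print(np.allclose(grad, 0))                      # True
print(np.allclose(X.T @ X @ w_star, X.T @ y))    # True: (star) holds
```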

Now let me answer your questions.

a) Show that for $n<d$ the OLS problem does not admit a unique solution

If $n < d$, then $\mathrm{rank}(X) \leq n < d$, and so $X^TX$ is not invertible. The system $(\star)$ is nevertheless always consistent, because $X^Ty$ lies in the column space of $X^T$, which equals the column space of $X^TX$; hence when $X^TX$ is singular the system has infinitely many solutions. In practice, if this happens, one may use the pseudoinverse of $X$ (equivalently, of $X^TX$) to single out the minimum-norm solution.
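To see this concretely, here is a small NumPy sketch (the sizes and random data are purely illustrative): with $n < d$ the matrix $X^TX$ is singular, yet the pseudoinverse still produces a particular, minimum-norm solution of $(\star)$.

```python
import numpy as np

# NumPy sketch: n < d, so X^T X is singular, but the pseudoinverse still
# yields a particular (minimum-norm) solution of the normal equation.
rng = np.random.default_rng(1)
n, d = 3, 5                                      # underdetermined: n < d
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

w_pinv = np.linalg.pinv(X) @ y                   # minimum-norm solution

print(np.linalg.matrix_rank(X.T @ X))            # 3 < d: X^T X is singular
print(np.allclose(X.T @ X @ w_pinv, X.T @ y))    # True: (star) is satisfied
```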

b) Under what assumptions on $X$ does this equation admit a unique solution $w^*$?

This happens when $X^TX$ is invertible, which happens exactly when $X$ has full column rank. The conditions (i), (ii), and (iii) are equivalent ways of saying this.
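And a sketch of the full-column-rank case (again with arbitrary random data): solving $(\star)$ directly returns the same $w^*$ as a library least-squares routine, and this solution is unique.

```python
import numpy as np

# NumPy sketch: n > d and the columns of X are independent (with probability
# one for Gaussian data), so X^T X is invertible and (star) has one solution.
rng = np.random.default_rng(2)
n, d = 100, 4
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

w_normal = np.linalg.solve(X.T @ X, X.T @ y)      # solve (star) directly
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # library least squares

print(np.allclose(w_normal, w_lstsq))             # True: same unique w*
```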

Jose Avilez

A few notes about the uniqueness of the solution. We know that any minimiser of the optimisation problem solves the normal equation $$ (X^T X)w = X^T y $$ (as noted by Jose). Let's investigate the uniqueness of the solution $w$ by assuming it fails, i.e. that there are two solutions $w_1 \neq w_2$ that both satisfy the normal equation. Then $$ (X^T X)w_1 = (X^T X)w_2. $$ Now, if $X^TX$ is invertible, this implies $w_1 = w_2$, since $X^TX$ is then a bijective mapping (if you think of $X^TX$ as a linear transformation applied to $w$). This is a contradiction, so $w$ must be unique if $X^TX$ is invertible.

If $X^T X$ is not invertible, then it maps some vector $v \neq 0$ to $\mathbf{0}$, i.e. the null space $N(X^T X)$ has dimension at least one. Adding any such vector $v$ from the null space to a solution $w$ of the normal equation yields another solution, so there is no unique solution.
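Here is a small NumPy sketch of this null-space argument (the dimensions are chosen, arbitrarily, so that $X^TX$ is singular): take one solution of the normal equation, add a vector from $N(X^TX)$, and check that the result is a different solution.

```python
import numpy as np

# NumPy sketch: with n < d the null space N(X^T X) is nontrivial, so adding a
# null-space vector to one solution of the normal equation gives another one.
rng = np.random.default_rng(3)
n, d = 3, 6
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
A = X.T @ X

w = np.linalg.pinv(X) @ y                 # one solution of the normal equation
_, _, Vt = np.linalg.svd(A)
v = Vt[-1]                                # right singular vector with singular value ~ 0

print(np.allclose(A @ v, 0))              # True: v lies in N(X^T X)
print(np.allclose(A @ (w + v), X.T @ y))  # True: w + v also solves it
print(np.allclose(w + v, w))              # False: two distinct solutions
```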

Regarding the dimensionality of $X^T X$:

If $X$ is of size $n \times d$, then $X^T X$ has size $d \times d$. If $n < d$, then $\mathrm{rank}(X)$ can be at most $n$. Since $\mathrm{rank}(X^T X) = \mathrm{rank}(X)$, the rank of $X^T X$ is at most $n < d$, so $X^T X$ is rank deficient and thus not invertible. Conversely, if $n \geq d$ and $\mathrm{rank}(X) = d$ [1], then $X^T X$ is full rank, hence invertible, and a unique solution exists (a numerical check of both regimes is sketched at the end of this answer).

[1] By the definition of rank, this means that $X$ has $d$ linearly independent columns.
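These rank facts are easy to check numerically as well; the following NumPy sketch (with arbitrary small random matrices) contrasts the $n < d$ and $n \geq d$ regimes.

```python
import numpy as np

# NumPy sketch: rank(X^T X) equals rank(X) in both regimes; X^T X is
# rank deficient when n < d and full rank when n >= d with independent columns.
rng = np.random.default_rng(4)
X_wide = rng.normal(size=(3, 5))           # n < d
X_tall = rng.normal(size=(5, 3))           # n >= d

for X in (X_wide, X_tall):
    d = X.shape[1]
    r_X = np.linalg.matrix_rank(X)
    r_XtX = np.linalg.matrix_rank(X.T @ X)
    print(r_X == r_XtX, r_XtX == d)        # ranks agree; invertible only when tall
```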