I was reading about the least squares solution of linear regression. The error function is:
\begin{equation} J(\theta) = \frac{1}{2}\lVert X\theta - y\rVert^2 = \frac{1}{2}\sum_{i=1}^{n}\left(x_i^T\theta - y_i\right)^2 \end{equation}
where $\theta \in \mathbb{R}^{p\times 1}$, $X \in \mathbb{R}^{n \times p}$ with rows $x_i^T$, and $y \in \mathbb{R}^{n \times 1}$.
After taking the derivative of $J(\theta)$ with respect to $\theta$ and setting it equal to $0$, the value of $\theta$ is
\begin{equation} \theta = (X^TX)^{-1}X^Ty \end{equation}
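For reference, here is a quick numerical sanity check of this formula (Python/NumPy, with made-up data), comparing the normal-equation solution against NumPy's least squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 4
X = rng.normal(size=(n, p))            # made-up design matrix, full column rank almost surely
theta_true = rng.normal(size=(p, 1))
y = X @ theta_true + 0.1 * rng.normal(size=(n, 1))

# Normal-equation solution: theta = (X^T X)^{-1} X^T y
theta_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# Reference least-squares solution from NumPy
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(theta_normal, theta_lstsq))   # True
```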
Now the solution exists when $(X^TX)^{-1}$ exists.
They say that if the matrix $X$ has rank $p$ (i.e., its columns, the features, are linearly independent), then $(X^TX)^{-1}$ exists.
My question is:
How can I show that $(X^TX)^{-1}$ exists when $X$ has rank $p$?
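To make the claim concrete, here is a small numerical illustration (again with made-up data) of the statement I want to prove: with independent columns $X^TX$ has full rank, and with a dependent column it becomes singular:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 4

# Independent columns: rank(X) = p, and X^T X is invertible
X_full = rng.normal(size=(n, p))
print(np.linalg.matrix_rank(X_full.T @ X_full))   # prints p (= 4)

# Make one column a combination of two others: rank(X) < p, and X^T X is singular
X_def = X_full.copy()
X_def[:, 3] = X_def[:, 0] + X_def[:, 1]
print(np.linalg.matrix_rank(X_def.T @ X_def))     # prints p - 1 (= 3)
```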
My attempt:
\begin{equation} X = \left({\begin{array}{ccccc}f_1 & f_2 & f_3 & \cdots & f_p\end{array}}\right) \end{equation} where $f_1, f_2, \dots, f_p$ are the $p$ feature columns of the data matrix $X$; they are linearly independent because $X$ has rank $p$.
\begin{equation} X^TX = \left({\begin{array}{cccc}{f_1}^Tf_1 & {f_1}^Tf_2 & \cdots & {f_1}^Tf_p \\ {f_2}^Tf_1 & {f_2}^Tf_2 & \cdots & {f_2}^Tf_p \\ \vdots & \vdots & \ddots & \vdots \\ {f_p}^Tf_1 & {f_p}^Tf_2 & \cdots & {f_p}^Tf_p\end{array}}\right) \end{equation}
How can I use the fact that $f_1, f_2, \dots, f_p$ are linearly independent to prove that $X^TX$ has rank $p$, and hence that $(X^TX)^{-1}$ exists?
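For completeness, a small check (made-up data again) that the $(i,j)$ entry of $X^TX$ really is the inner product ${f_i}^Tf_j$ of the feature columns, which is the structure I am trying to exploit:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 4
X = rng.normal(size=(n, p))

# Feature columns f_1, ..., f_p
cols = [X[:, j] for j in range(p)]

# Gram matrix built entry by entry from the column inner products f_i^T f_j
G = np.array([[fi @ fj for fj in cols] for fi in cols])

print(np.allclose(G, X.T @ X))   # True: (X^T X)_{ij} = f_i^T f_j
```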