The question is quite simple: for an $N \times p$ matrix $\mathbf{X}$ with real entries, when is $\mathbf{X}^{T}\mathbf{X}+\lambda\mathbf{I}$ invertible (where $\mathbf{I}$ is the $p \times p$ identity matrix and $\lambda > 0$)?
This comes up in ridge regression. In *The Elements of Statistical Learning* (Hastie et al.), the authors write:

> [The equation] adds a positive constant to the diagonal of $\mathbf{X}^{T}\mathbf{X}$ before inversion. This makes the problem nonsingular, even if $\mathbf{X}^{T}\mathbf{X}$ is not of full rank.
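This isn't a proof, but a quick numerical sanity check seems consistent with the claim (a minimal NumPy sketch; the particular rank-deficient matrix and the value $\lambda = 0.1$ are just arbitrary choices of mine):

```python
import numpy as np

# A deliberately rank-deficient design matrix: the second column is
# twice the first, so X has column rank 1 < p = 2.
X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])

lam = 0.1  # any lambda > 0

gram = X.T @ X                          # X^T X, singular here (rank 1)
ridge = gram + lam * np.eye(X.shape[1])  # X^T X + lambda * I

print(np.linalg.matrix_rank(gram))   # 1  -> X^T X is not invertible
print(np.linalg.det(ridge))          # positive -> X^T X + lam*I is invertible
print(np.linalg.eigvalsh(ridge))     # every eigenvalue is >= lam > 0
```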
I know that $\mathbf{X}^{T}\mathbf{X}$ is invertible if and only if it has full rank, which in turn holds if and only if $\mathbf{X}$ has full column rank. The quoted explanation is intuitively plausible, but how do I prove it?
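For reference, the equivalence I am taking for granted above follows from the fact that $\mathbf{X}^{T}\mathbf{X}$ and $\mathbf{X}$ have the same null space:

$$\mathbf{X}^{T}\mathbf{X}\mathbf{v} = \mathbf{0} \;\Longrightarrow\; \mathbf{v}^{T}\mathbf{X}^{T}\mathbf{X}\mathbf{v} = \|\mathbf{X}\mathbf{v}\|^{2} = 0 \;\Longrightarrow\; \mathbf{X}\mathbf{v} = \mathbf{0},$$

and the converse is immediate, so $\mathbf{X}^{T}\mathbf{X}$ is invertible exactly when the columns of $\mathbf{X}$ are linearly independent. What I cannot see is how to adapt this kind of argument to show that $\mathbf{X}^{T}\mathbf{X}+\lambda\mathbf{I}$ is invertible even when $\mathbf{X}$ is rank-deficient.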