I am reading Elements of Statistical Learning and came across the following claim in the text (page 93, Chapter 3.7):
Least squares fitting is usually done via the Cholesky decomposition of the matrix $\mathbf{X}^T\mathbf{X}$ or a QR Decomposition of $\mathbf{X}$. With $N$ observations and $p$ features, the Cholesky decomposition requires $p^3 + Np^2/2$ operations, while the QR decomposition requires $Np^2$ operations.
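For concreteness, here is a rough NumPy sketch of the two solution paths I believe the text is comparing (the data and variable names are my own, purely for illustration); my question is about how to count the operations these steps take.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 1000, 10                        # N observations, p features
X = rng.standard_normal((N, p))
y = rng.standard_normal(N)

# Route 1: normal equations via Cholesky.
# Form the p x p Gram matrix X^T X, factor it as L L^T,
# then solve the two triangular systems for beta.
A = X.T @ X
b = X.T @ y
L = np.linalg.cholesky(A)              # lower-triangular L with A = L L^T
beta_chol = np.linalg.solve(L.T, np.linalg.solve(L, b))

# Route 2: QR decomposition of X directly.
# Factor X = Q R (Q is N x p with orthonormal columns, R is p x p
# upper triangular), then solve R beta = Q^T y.
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ y)

# Both routes give the same least squares coefficients.
print(np.allclose(beta_chol, beta_qr))
```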
I understand the Cholesky and QR decompositions individually, but I do not see how these operation counts are derived. Please help.