This may be an old question, and there are certainly some related posts, which I will mention below. However, there still seems to be no clear answer. The question is: is there an intuitive way to explain why the modified Gram-Schmidt (MGS) process for computing the QR factorization of a matrix $A\in\mathbb{C}^{m\times n}$ gives a $Q$ matrix that is "more orthogonal" than the one from the classical Gram-Schmidt (CGS) process? By "intuitive", I mean that the explanation should relate to the procedural difference between MGS and CGS in a transparent way.
In Trefethen's Numerical Linear Algebra, the distinction between CGS and MGS is described as follows:
At the $j$th step, both GS processes compute $q_j$ as $$ q_j=\frac{P_j a_j}{\| P_j a_j \|}, $$ while for CGS, $$ P_j=I-Q_{j-1}Q_{j-1}^*, $$ but for MGS, $$ P_j=(I-q_{j-1}q_{j-1}^*)\cdots(I-q_2q_2^*)(I-q_1q_1^*). $$
Trefethen does not discuss why this procedural difference leads to the better numerical stability of MGS.
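To make the procedural difference concrete, here is a minimal NumPy sketch of the two variants as I understand them (real arithmetic for simplicity; the function names `cgs` and `mgs` are my own). The only difference between the two inner loops is whether the coefficient $r_{ij}$ is computed from the original column $a_j$ or from the current, partially orthogonalized working vector:

```python
import numpy as np

def cgs(A):
    """Classical Gram-Schmidt: each r_ij is the inner product of q_i
    with the ORIGINAL column a_j."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # project the original a_j
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

def mgs(A):
    """Modified Gram-Schmidt: the projectors (I - q_i q_i^*) are applied
    one at a time to the CURRENT, already partially orthogonalized vector."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ v         # project the current v, not a_j
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R
```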
@AlgebraicPavel has given quantitative bounds here on the orthogonality factors: $\|I-Q^* Q\|\leq O(\epsilon \kappa(A))$ for MGS, while $\|I-Q^* Q\|\leq O(\epsilon \kappa^2(A))$ for CGS. These bounds are certainly quantitative. However, as mentioned above, I would like more intuitive reasoning for how they come about.
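For what it's worth, below is a small experiment one could run to observe these two scalings numerically. It reuses the `cgs` and `mgs` sketches above; the test matrices with a prescribed $\kappa(A)$ are built from a random SVD, which is just my own choice of construction:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 20

for kappa in [1e2, 1e4, 1e6, 1e8]:
    # Test matrix with prescribed condition number, built from a random SVD
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    s = np.logspace(0, -np.log10(kappa), n)   # singular values from 1 down to 1/kappa
    A = U @ np.diag(s) @ V.T
    for name, factor in [("CGS", cgs), ("MGS", mgs)]:
        Q, _ = factor(A)
        err = np.linalg.norm(np.eye(n) - Q.T @ Q)
        # Per the bounds above, err should grow roughly like eps*kappa for MGS
        # and like eps*kappa^2 for CGS
        print(f"kappa={kappa:.0e}  {name}: ||I - Q^T Q|| = {err:.2e}")
```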
@Ian said here that:
"Classical Gram-Schmidt, in which you subtract off the projections of the (k+1)th vector onto the first k vectors, is quite unstable, especially in high dimensions, because you essentially ensure that your new vector is orthogonal to the input vector in question but fail to ensure that the vectors you get at the end of the process are orthogonal to each other. Combine that with the fact that you can wind up subtracting nearly equal numbers and you get a bad situation."
This does sound like an intuitive, qualitative explanation of the problem with CGS. However, looking at the details, I am not comfortable with this line of reasoning. Specifically, saying that the "new vector is orthogonal to the input vector in question" does not seem to agree with what CGS is doing. For both CGS and MGS, projections are subtracted from the new vector $a_j$ in an attempt to make it orthogonal to the existing $q_i$, $i=1,\dots,j-1$. It does not seem proper to call these $q_i$ "input vectors", and this description does not address the main procedural difference between MGS and CGS.
In this post, the $4\times 3$ Lauchli matrix is used as an example to demonstrate the different results of MGS and CGS. Although it does not give an intuitive answer to the question either, I notice that for this Lauchli example, $q_3^{CGS}$ fails to be orthogonal to $q_2^{CGS}$ because $r_{23}^{CGS}$ is computed wrongly, with a relative error of 100%. However, I cannot figure out why the MGS procedure alleviates this problem so significantly.
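For reference, here is a small script that should reproduce that Lauchli experiment, reusing the `cgs` and `mgs` sketches above, with $\epsilon=10^{-8}$ chosen so that $1+\epsilon^2$ rounds to $1$ in double precision:

```python
import numpy as np

eps = 1e-8                     # 1 + eps**2 rounds to 1 in double precision
A = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0],
              [0.0, 0.0, eps]])

Qc, Rc = cgs(A)
Qm, Rm = mgs(A)

# r_23: CGS takes q_2^* a_3 (which comes out as 0 here), while MGS takes
# q_2^* (a_3 - r_13 q_1), recovering the correct value of about eps/sqrt(2).
print("r23  CGS:", Rc[1, 2], "  MGS:", Rm[1, 2])

# Resulting loss of orthogonality between q_2 and q_3
print("q2.q3  CGS:", Qc[:, 1] @ Qc[:, 2])   # about 0.5, far from orthogonal
print("q2.q3  MGS:", Qm[:, 1] @ Qm[:, 2])   # about machine precision
```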
I greatly appreciate any comments.