This may be an old question, and there are certainly some related posts, which I will mention below. However, there still seems to be no clear answer. The question is: is there an intuitive way to explain why the modified Gram-Schmidt (MGS) process for computing the QR factorization of a matrix $A\in\mathbb{C}^{m\times n}$ gives a $Q$ matrix that is "more orthogonal" than the one from the classical Gram-Schmidt (CGS) process? By "intuitive", I mean that the explanation should relate to the procedural difference between MGS and CGS in a transparent way.
In Trefethen's Numerical Linear Algebra, the distinction between CGS and MGS is described as follows:
At the $j$th step, both GS processes compute $q_j$ as $$ q_j=\frac{P_j a_j}{\| P_j a_j \|}, $$ while for CGS, $$ P_j=I-Q_{j-1}Q_{j-1}^*, $$ but for MGS, $$ P_j=(I-q_{j-1}q_{j-1}^*)\cdots(I-q_2q_2^*)(I-q_1q_1^*). $$
Trefethen does not discuss why this procedural difference leads to the better numerical stability of MGS.
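To make the procedural difference concrete, here is a minimal NumPy sketch of the two variants as I understand them (real arithmetic for simplicity; the function names `cgs` and `mgs` are my own). The only difference between the two inner loops is whether the coefficient $r_{ij}$ is computed from the original column $a_j$ or from the current, partially orthogonalized working vector:

```python
import numpy as np

def cgs(A):
    """Classical Gram-Schmidt: each r_ij is the inner product of q_i
    with the ORIGINAL column a_j."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # project the original a_j
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

def mgs(A):
    """Modified Gram-Schmidt: the projectors (I - q_i q_i^*) are applied
    one at a time to the CURRENT, already partially orthogonalized vector."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ v         # project the current v, not a_j
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R
```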
@AlgebraicPavel has given quantitative bounds here on the orthogonality factors: $\|I-Q^* Q\|\leq O(\epsilon \kappa(A))$ for MGS, while $\|I-Q^* Q\|\leq O(\epsilon \kappa^2(A))$ for CGS. These bounds are certainly quantitative. However, as mentioned above, I would like more intuitive reasoning for how they come about.
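For what it's worth, below is a small experiment one could run to observe these two scalings numerically. It reuses the `cgs` and `mgs` sketches above; the test matrices with a prescribed $\kappa(A)$ are built from a random SVD, which is just my own choice of construction:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 20

for kappa in [1e2, 1e4, 1e6, 1e8]:
    # Test matrix with prescribed condition number, built from a random SVD
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    s = np.logspace(0, -np.log10(kappa), n)   # singular values from 1 down to 1/kappa
    A = U @ np.diag(s) @ V.T
    for name, factor in [("CGS", cgs), ("MGS", mgs)]:
        Q, _ = factor(A)
        err = np.linalg.norm(np.eye(n) - Q.T @ Q)
        # Per the bounds above, err should grow roughly like eps*kappa for MGS
        # and like eps*kappa^2 for CGS
        print(f"kappa={kappa:.0e}  {name}: ||I - Q^T Q|| = {err:.2e}")
```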
@Ian said here that:
"Classical Gram-Schmidt, in which you subtract off the projections of the (k+1)th vector onto the first k vectors, is quite unstable, especially in high dimensions, because you essentially ensure that your new vector is orthogonal to the input vector in question but fail to ensure that the vectors you get at the end of the process are orthogonal to each other. Combine that with the fact that you can wind up subtracting nearly equal numbers and you get a bad situation."
This does sound like an intuitive, qualitative explanation of the problem with CGS. However, looking at the details, I am not comfortable with this line of reasoning. Specifically, saying that the "new vector is orthogonal to the input vector in question" does not seem to agree with what CGS is doing. For both CGS and MGS, projections are subtracted from the new vector $a_j$ in an attempt to make it orthogonal to the existing $q_i$, $i=1,\dots,j-1$. It does not seem proper to call these $q_i$ "input vectors", and this description does not address the main procedural difference between MGS and CGS.
In this post, the $4\times 3$ Lauchli matrix is used as an example to demonstrate the different results of MGS and CGS. Although it does not give an intuitive answer to the question either, I notice that for this Lauchli example, $q_3^{CGS}$ fails to be orthogonal to $q_2^{CGS}$ because $r_{23}^{CGS}$ is computed wrongly, with a relative error of 100%. However, I cannot figure out why the MGS procedure alleviates this problem so significantly.
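For reference, here is a small script that should reproduce that Lauchli experiment, reusing the `cgs` and `mgs` sketches above, with $\epsilon=10^{-8}$ chosen so that $1+\epsilon^2$ rounds to $1$ in double precision:

```python
import numpy as np

eps = 1e-8                     # 1 + eps**2 rounds to 1 in double precision
A = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0],
              [0.0, 0.0, eps]])

Qc, Rc = cgs(A)
Qm, Rm = mgs(A)

# r_23: CGS takes q_2^* a_3 (which comes out as 0 here), while MGS takes
# q_2^* (a_3 - r_13 q_1), recovering the correct value of about eps/sqrt(2).
print("r23  CGS:", Rc[1, 2], "  MGS:", Rm[1, 2])

# Resulting loss of orthogonality between q_2 and q_3
print("q2.q3  CGS:", Qc[:, 1] @ Qc[:, 2])   # about 0.5, far from orthogonal
print("q2.q3  MGS:", Qm[:, 1] @ Qm[:, 2])   # about machine precision
```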
I greatly appreciate any comments.