I am aware of the fact that this question was already asked here before: Why use Gauss Jordan Elimination instead of Gaussian Elimination, Differences Difference between Gauss and Gauss-Jordan?
Nevertheless, I am not really satisfied with the answers. I would like to clearly see the differences and advantages (stability, complexity ...) of both algorithms from the numerics point of view, i.e. if you really want to implement it in the computer.
People tend to explain the difference by saying that Gauss is the reduction to the row echelon form, while Gauss–Jordan goes to the reduced row echelon form. When explaining the actual elimination, a lot of people would tell you that, in each step, you should normalize the corresponding row, so that the first nonzero number is 1. But this does not make much sense since, in this case, Gauss and Gauss–Jordan are the same algorithm as you are doing literally the same computations.
So, I think that the point is that you are not supposed to normalize the rows. In general, you are never allowed to multiply/divide a row by a number. Then, the two algorithms are different since, in case of Gauss–Jordan, all the divisions take place at the very end, while, in case of Gauss, they take place throughout the whole back substitution. Also, Gauss uses less operations.
My first question: Is this true?
Now, on Wikipedia, it says that
He [Wilhelm Jordan] is remembered among mathematicians for the Gauss–Jordan elimination algorithm, with Jordan improving the stability of the algorithm so it could be applied to minimizing the squared error in the sum of a series of surveying observations.
Questions: So, is the difference I described above really the crux of the modification introduced by Jordan? And so does Gauss–Jordan have some advantages over Gauss in some special cases? Why exactly? Is it because it is numerically more stable to save all the division to the end?
By the way, I find the way of explaining the difference between Gauss and Gauss–Jordan as REF vs RREF really stupid, because the back substitution used with Gauss can also be formulated in terms of row operations, so this definitely is not the point.
EDIT: Or is it that in case of Gauss--Jordan you are allowed to normalize your rows? Or is it that in case of Gauss--Jordan you are not supposed to first zero all entries under the diagonal and only afterwards above it, but you must do everything right away? I am still confused.