In the picture above you can see an AES state array passing through the S- and P-boxes of AES over 2 rounds. It's taken from here and is meant to show that 2 rounds of transformation are necessary to achieve full diffusion after only the first byte has changed.
If one applied MixColumns() before ShiftRows(), full diffusion would be reached one step earlier. So I wonder: what is the reason for applying ShiftRows() before MixColumns()?
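
To make the step counting concrete, here is a minimal sketch that tracks only which of the 16 state bytes can differ, not their actual values. It assumes the usual column-major AES state layout and that a single changed byte entering MixColumns() changes its whole column; SubBytes() and AddRoundKey() act byte-wise and are left out because they do not move the difference. The helper names (`shift_rows`, `mix_columns`, `steps_to_full_diffusion`) are made up for this illustration.

```python
# Track the set of "active" (possibly changed) byte positions in the AES state.
# Assumption: column-major layout, byte index = 4 * column + row.

def shift_rows(active):
    # Row r is rotated left by r positions: (row r, col c) -> (row r, col (c - r) % 4).
    return {4 * ((c - r) % 4) + r for c, r in (divmod(i, 4) for i in active)}

def mix_columns(active):
    # Assumption: any active byte in a column makes all four bytes of it active.
    cols = {i // 4 for i in active}
    return {4 * c + r for c in cols for r in range(4)}

def steps_to_full_diffusion(order, start=(0,)):
    # Apply the linear steps in the given repeating order until all 16 bytes are active.
    active = set(start)
    steps = 0
    while len(active) < 16:
        active = order[steps % len(order)](active)
        steps += 1
    return steps

standard = steps_to_full_diffusion([shift_rows, mix_columns])  # AES order
swapped = steps_to_full_diffusion([mix_columns, shift_rows])   # hypothetical order
print(standard, swapped)  # prints "4 3"
```

With only byte 0 changed, the AES order needs four linear steps (ShiftRows, MixColumns, ShiftRows, MixColumns), while the swapped order needs three (MixColumns, ShiftRows, MixColumns), which is the "one step earlier" I mean.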