Let's say I'm iterating over a large data set and, depending on a user-supplied variable, I do either a case-sensitive or a case-insensitive sort. Since this user-supplied value never changes during the loop, I would think it's a good idea to put it into a register, for example:
#include <stdio.h>   /* printf */
#include <time.h>    /* clock, clock_t, CLOCKS_PER_SEC */

int main(int argc, char * argv[])
{
    clock_t t0, t1;
    int sort = 1;

    t0 = clock();
    register int case_insensitive_sort = sort;
    int z = 0;
    for (int i = 0; i < 1e8; i++) {
        if (case_insensitive_sort) {
            z += 3; // for debugging to see where it's going
        } else {
            z -= 5;
        }
    }
    t1 = clock();
    printf("The function took %fs to complete.\n", ((double)(t1 - t0)) / CLOCKS_PER_SEC);

    t0 = clock();
    int case_insensitive_sort2 = sort;
    z = 0;
    for (int i = 0; i < 1e8; i++) {
        if (case_insensitive_sort2) {
            z += 3; // for debugging to see where it's going
        } else {
            z -= 5;
        }
    }
    t1 = clock();
    printf("The function took %fs to complete.\n", ((double)(t1 - t0)) / CLOCKS_PER_SEC);

    return 0;
}
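In the real program the flag would come from user input rather than a literal 1. Here's a minimal, hypothetical sketch of that setup (the argc/atoi check is just a stand-in for the real option parsing, so the value isn't a compile-time constant); for the timings below I kept the hard-coded sort = 1 version:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char * argv[])
{
    /* Hypothetical: take the sort mode from the command line instead of
       hard-coding it, e.g. "./a.out 1" for a case-insensitive sort. */
    int case_insensitive_sort = (argc > 1) ? atoi(argv[1]) : 0;

    printf("case-insensitive sort: %s\n", case_insensitive_sort ? "yes" : "no");
    /* ... run the timing loops from above using this flag ... */
    return 0;
}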
Here is an example of the compiler output: https://godbolt.org/z/7KrGzr. It seems that with the register keyword the flag is tested directly in a register to see whether it's 0 or not:
testl %ebx, %ebx
And without it, the comparison is against a memory location:
cmpl $0, -12(%rbp)
Yet when I run this locally, the version without register is much faster:
The function took 0.255494s to complete.
The function took 0.188364s to complete.
Why is that the case? I thought using a register instead of doing a memory compare would be faster.
Update: thanks for all the help on this. Following Peter's suggestions and the linked answers, the biggest improvement comes not from putting case_insensitive_sort in a register, but from doing it on the loop variables:
register int case_insensitive_sort = sort;
register int z = 0;
for (register int i = 0; i < 1e8; i++) { ...
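For completeness, here's the full rewritten timing loop as I'm running it now, a sketch assuming the same +3/-5 body and clock() timing as the original program above:

#include <stdio.h>
#include <time.h>

int main(void)
{
    clock_t t0, t1;
    int sort = 1;

    t0 = clock();
    register int case_insensitive_sort = sort;
    register int z = 0;
    /* Same loop as before, but the loop counter and the accumulator
       are also marked register, which is where the speed-up came from. */
    for (register int i = 0; i < 1e8; i++) {
        if (case_insensitive_sort) {
            z += 3;
        } else {
            z -= 5;
        }
    }
    t1 = clock();
    printf("The function took %fs to complete.\n", ((double)(t1 - t0)) / CLOCKS_PER_SEC);

    return 0;
}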
And with that change I get the following improvement:
The function took 0.255494s to complete with the register (before).
The function took 0.038112s to complete with the register (after).
The function took 0.252963s to complete without the registers.