Let's say I'm iterating over a large data set and, depending on a user-supplied variable, I do either a case-sensitive or a case-insensitive sort. Since this user-supplied value never changes during the loop, I would think it's a good idea to put it into a register, for example:
#include <stdio.h>   /* printf */
#include <time.h>    /* clock, clock_t, CLOCKS_PER_SEC */

int main(int argc, char * argv[])
{
    clock_t t0, t1;
    int sort = 1;

    t0 = clock();
    register int case_insensitive_sort = sort;
    int z = 0;
    for (int i = 0; i < 1e8; i++) {
        if (case_insensitive_sort) {
            z += 3; // for debugging to see where it's going
        } else {
            z -= 5;
        }
    }
    t1 = clock();
    printf("The function took %fs to complete.\n", ((double)(t1 - t0)) / CLOCKS_PER_SEC);

    t0 = clock();
    int case_insensitive_sort2 = sort;
    z = 0;
    for (int i = 0; i < 1e8; i++) {
        if (case_insensitive_sort2) {
            z += 3; // for debugging to see where it's going
        } else {
            z -= 5;
        }
    }
    t1 = clock();
    printf("The function took %fs to complete.\n", ((double)(t1 - t0)) / CLOCKS_PER_SEC);

    return 0;
}
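In the real program the flag would come from user input rather than a literal 1. Here's a minimal, hypothetical sketch of that setup (the argc/atoi check is just a stand-in for the real option parsing, so the value isn't a compile-time constant); for the timings below I kept the hard-coded sort = 1 version:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char * argv[])
{
    /* Hypothetical: take the sort mode from the command line instead of
       hard-coding it, e.g. "./a.out 1" for a case-insensitive sort. */
    int case_insensitive_sort = (argc > 1) ? atoi(argv[1]) : 0;

    printf("case-insensitive sort: %s\n", case_insensitive_sort ? "yes" : "no");
    /* ... run the timing loops from above using this flag ... */
    return 0;
}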
Here is an example of the compiler output: https://godbolt.org/z/7KrGzr. It seems that with the register keyword the flag is tested directly in a register to see whether it's 0 or not:
testl %ebx, %ebx
And without it, the comparison is against a memory location:
cmpl $0, -12(%rbp)
Yet when I run this locally, the version without register is much faster:
The function took 0.255494s to complete.
The function took 0.188364s to complete.
Why is that the case? I thought using a register instead of doing a memory compare would be faster.
Update: thanks for all the help on this. Following Peter's suggestions and the linked answers, the biggest improvement comes not from putting case_insensitive_sort in a register, but from doing it on the loop variables:
register int case_insensitive_sort = sort;
register int z = 0;
for (register int i = 0; i < 1e8; i++) { ...
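For completeness, here's the full rewritten timing loop as I'm running it now, a sketch assuming the same +3/-5 body and clock() timing as the original program above:

#include <stdio.h>
#include <time.h>

int main(void)
{
    clock_t t0, t1;
    int sort = 1;

    t0 = clock();
    register int case_insensitive_sort = sort;
    register int z = 0;
    /* Same loop as before, but the loop counter and the accumulator
       are also marked register, which is where the speed-up came from. */
    for (register int i = 0; i < 1e8; i++) {
        if (case_insensitive_sort) {
            z += 3;
        } else {
            z -= 5;
        }
    }
    t1 = clock();
    printf("The function took %fs to complete.\n", ((double)(t1 - t0)) / CLOCKS_PER_SEC);

    return 0;
}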
And with that change I get the following improvement:
The function took 0.255494s to complete with the register (before).
The function took 0.038112s to complete with the register (after).
The function took 0.252963s to complete without the registers.