Is it necessary or easier to code x86 assembly by following the purpose of each General Purpose register

Question

In general, is it necessary or easier to write x86 assembly by following the intended purpose of each register?

The registers in the x86 architecture were each originally designed with a special purpose, but modern compilers don't seem to care about that intended usage (except in special cases such as REP MOVS or MUL).

So, is it easier, or does it produce more optimized code, to choose registers according to their purpose? (Leaving aside the special instructions or encodings that are tied to particular registers.)

For instance (I could use REP MOVSB, or LODSB and STOSB, instead; this is just to demonstrate):

1st Code:

```
LEA ESI,[AddressOfSomething]
LEA EDI,[AddressOfSomethingElse]
MOV ECX,NUMBER_OF_LOOP
LoopHere:
MOV AL,[ESI]
ADD AL,8
MOV [EDI],AL
ADD ESI,1
ADD EDI,1
CMP AL,0
JNZ LoopHere
TheEnd:
;...
```

2nd Code:

```
LEA ECX,[AddressOfSomething]
LEA EDX,[AddressOfSomethingElse]
MOV EBX,NUMBER_OF_LOOP
LoopHere:
MOV AL,[ECX]
ADD AL,8
MOV [EDX],AL
ADD ECX,1
ADD EDX,1
CMP AL,0
JNZ LoopHere
TheEnd:
;...
```

The compiler I use, Visual Studio 2015, usually uses the 2nd method for tasks like this. It doesn't pick registers according to their purpose; instead, it chooses registers based only on whether they are "volatile" or "non-volatile" (that is, whether they are preserved across a function call). Because of this, the disassembly of software written in high-level languages follows the 2nd method.

Another interesting fact is that in ARM assembly the GPRs all serve the same purpose and are named R0-R7, which means that code written for it looks more like the 2nd code.

All in all, my opinion is that these two codes use the same instructions, so they should run at the same speed regardless of which registers I use. Am I correct? And which style is easier to code with?

The x86 CPU internally long ago gave up any specialization for its registers. There is vestigial specialization in the instruction set, but it does not translate into specialization inside the CPU. — Raymond Chen, Sep 22 '16 at 05:19
According to [Wikipedia](https://en.wikipedia.org/wiki/X86#Purpose), there are some benefits: *For example, using AL as an accumulator and adding an immediate byte value to it produces the efficient add to AL opcode of 04h, whilst using the BL register produces the generic and longer add to register opcode of 80C3h.* And (weirdly), some people (like that same wiki page) refer to ESP as a "general-purpose" register, so yeah, sometimes using a general register the way you are "supposed" to is a good idea. Mostly tho, I'm with Raymond. This is a non-issue. Pick your registers to avoid push/pop. — David Wohlferd, Sep 22 '16 at 05:48

score 5 · Answer 1 · edited May 23 '17 at 12:15

Following the purpose of each register primarily achieves:

Code density

For example, using the A register¹ usually reduces the code size for common operations like moving, arithmetic, logic and I/O².
Using the C register for counting lets you exploit the jcxz family of instructions, avoiding an explicit compare.
movsd and similar are very "dense" instructions: they perform complex operations that would otherwise require a lot of code.

However, code density doesn't mean "faster": x86 is blatantly CISC, and a complex instruction can take more time to execute than an equivalent series of simpler instructions³.
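
To make the size argument concrete, here is a minimal sketch in NASM syntax. It assumes 32-bit mode, and the labels and constants are only illustrative. The bytes in the comments are the standard encodings of each form.

```nasm
bits 32

start:
        add  al, 8          ; 04 08              accumulator short form, 2 bytes
        add  bl, 8          ; 80 C3 08           generic r/m8, imm8 form, 3 bytes
        add  eax, 1000      ; 05 E8 03 00 00     accumulator short form, 5 bytes
        add  ebx, 1000      ; 81 C3 E8 03 00 00  generic r/m32, imm32 form, 6 bytes

        ; counter kept in ecx: the zero test is built into the jump
        jecxz .skip         ; E3 rel8            2 bytes

        ; counter kept in edx: an explicit test is needed first
        test edx, edx       ; 85 D2
        jz   .skip          ; 74 rel8            4 bytes in total
.skip:
```

Note that when the immediate fits in a signed byte, the generic `83 /0 ib` form (3 bytes) is available for every 32-bit register, so the accumulator only wins for the 8-bit and full-width immediate forms.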

Readability

An instruction like rep movsd is effectively a "high-level" way of coding a loop that moves data from a source to a destination.
Parsing the loop
```
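push eax                  ; rep movsd leaves eax untouched
pushf                     ; ...and the flags too
jecxz .done               ; rep does nothing when ecx is 0
.loop:
  mov eax, DWORD [esi]    ; load from ds:esi
  mov DWORD [es:edi], eax ; store to es:edi

  add esi, 4*(1-D*2)      ; pseudocode: D is the direction flag (0 = up, 1 = down)
  add edi, 4*(1-D*2)

  dec ecx
  jnz .loop
.done:
popf
pop eax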
```
is a lot more difficult.
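
By comparison, the idiomatic version is a handful of lines. A minimal, self-contained sketch follows, assuming NASM and a 32-bit Linux target. The labels src and dst, the COUNT constant and the exit-code check are made up for the illustration.

```nasm
; build (assumed toolchain): nasm -f elf32 copy.asm && ld -m elf_i386 -o copy copy.o
bits 32
global _start

section .data
src:    dd 1, 2, 3, 4
COUNT   equ ($ - src) / 4       ; number of dwords in src

section .bss
dst:    resd COUNT

section .text
_start:
        cld                     ; DF = 0: esi and edi count upwards
        mov  esi, src           ; source pointer, read through ds:esi
        mov  edi, dst           ; destination pointer, written through es:edi
        mov  ecx, COUNT         ; number of dwords to copy
        rep  movsd              ; F3 A5: the whole loop in two bytes

        mov  ebx, [dst + 12]    ; last copied dword, used as the exit status
        mov  eax, 1             ; sys_exit on 32-bit Linux
        int  0x80
```

Run this way, the program should exit with status 4, the last dword copied. The three register loads are exactly the "purpose" assignments (source, destination, count) that the instruction expects.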

Idiomatic programming

The use of SP as a stack pointer is assumed by a lot of instructions (call, ret, push, ...).
It is possible to avoid using SP as a stack pointer, but it wouldn't be very idiomatic (nor efficient).
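
As a rough illustration of why, compare the built-in push with its hand-written equivalents. The sketch uses NASM syntax and assumes 32-bit mode; picking EBP as the hand-rolled stack pointer is an arbitrary choice for the example.

```nasm
bits 32

        ; idiomatic: esp is implied by the instruction
        push eax            ; 50          1 byte

        ; the same store spelled out by hand, still on esp
        sub  esp, 4         ; 83 EC 04
        mov  [esp], eax     ; 89 04 24    6 bytes in total

        ; a hand-rolled stack that grows down from ebp instead
        sub  ebp, 4         ; 83 ED 04
        mov  [ebp], eax     ; 89 45 00    6 bytes in total
```

Besides being larger, the manual forms modify the flags (push does not), and call, ret and interrupt delivery would still use ESP regardless.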

Less data moving

In real mode, only a few registers could be used as a base (one of them being the B register).
Keeping addresses in B from the beginning would avoid moving them into it later. Though register-register moves don't need an execution unit today, they make the source harder to read⁴.
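
A small sketch of that restriction, in NASM syntax (the register choices are only for illustration). With 16-bit addressing only BX and BP can act as a base, and SI and DI as an index, so a pointer that arrives in CX has to be moved first:

```nasm
bits 16

        mov  al, [bx]       ; 8A 07    bx is a valid base register
        mov  al, [si]       ; 8A 04    si and di can address memory too
        ; mov al, [cx]      ; does not assemble: cx cannot address memory here

        mov  bx, cx         ; 89 CB    extra move just to be able to dereference
        mov  al, [bx]       ; 8A 07

bits 32

        mov  al, [ecx]      ; 8A 01    32-bit addressing accepts any GPR as a base
```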

Most of the idiomatic register usages have been relaxed today⁵, because too many special-purpose registers reduce the optimisations a compiler can do (and spilling onto the stack is expensive).

CPUs are very complex; if you want to write code for speed, you should consider speed metrics only. Idiomatic register usage is not one of them: for one thing, there is no single A, B or C register at the micro-architecture level, so "registers" as the programmer sees them are only a human concept (well, and a front-end concept).

¹ In its forms AL, AX, EAX, RAX
² mov A, [mem] uses the opcodes A0 or A1, while mov B, [mem] uses 8A 1E or 8B 1E (with 16-bit addressing). The same is true for add and similar. The in, out, div and mul instructions enforce the use of A.
³ But not to fetch and decode.
⁴ Is there an equivalent of "Spaghetti code" for data moves into registers?
⁵ Consider for example the various addressing modes or the imul instruction.

answered Sep 22 '16 at 09:03

Margaret Bloom

If you write assembler code that is ultimately called by C or C++ code, you may consider not specifying the exact CPU registers at all. If you use asm { ... } inside C++ with the GNU compiler chain, leaving the exact registers open allows the C(++) compiler to generate different versions of the assembler code for better inlining and global optimization. See for example: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html – Georg Bisseling Sep 22 '16 at 09:34
I'd say "possible to avoid SP as the stack pointer", not "perfectly". Leaving off the "perfectly" is enough to cover the fact that this (like a red zone) is only viable for user-space, where the value of RSP isn't used for interrupts. – Peter Cordes Sep 22 '16 at 10:23
`Is there an equivalent of "Spaghetti code" for data moves into registers?` It's called "compiler output" :P. Even when it would be just as efficient to keep one variable in the same register consistently, compilers don't even try. BTW, reg-reg moves aren't actually free. Not sure what you mean by "not really moves today", but extra reg-reg moves can contribute to a bottleneck on instruction / uop throughput, even on Intel IvB and later where they're (usually) handled during issue/rename, with no execution unit. – Peter Cordes Sep 22 '16 at 10:28
Fun fact: AMD64 System V uses RDI,RSI as the first two integer arg-passing registers *because* that lines up `memcpy(dst,src, count)` with `rep movs` (with just a `mov rcx, rdx` for the 3rd arg). Other (non-library) functions may use `(dst,src)` arg order, and may have an inline `rep movs`. Apparently Jan Hubicka found it made a measurable difference to static and dynamic insn count when compiling SPECint. [see this answer for links to interesting mailing list archive messages about the AMD64 ABI design decisions](http://stackoverflow.com/a/35619528/224132) – Peter Cordes Sep 22 '16 at 10:34
Thank you guys. I'm actually wondering about the ARM assembly side of things, where no GP register is assigned a specific purpose. – J.Smith Sep 26 '16 at 18:19
@PeterCordes Peter, do you code in assembly following each GP register's purpose? – J.Smith Sep 28 '16 at 00:40
@J.Smith: Only when it doesn't cost any extra MOV instructions. If there's *anything* to be gained by keeping a source pointer in RBX instead of RSI, I'll do it. But sure, until I start coding and find something that favours a different register allocation, I'll put a source pointer in RSI. I don't care about the historical purpose of any registers other than RSP, RDI, and RSI, though. If I can take advantage of the one-byte shorter encoding for ADD EAX, imm32 for anything, then I use RAX. If you need to use something as a shift count, it has to be in RCX (without BMI2's SHLX insn). – Peter Cordes Sep 28 '16 at 05:49
@PeterCordes Thank you :), but would you put an address into the EAX, ECX, or EDX registers? – J.Smith Sep 28 '16 at 19:08
@J.Smith: Yes, of course. Why would you not put an address into one of those three? In 32-bit code, they're the only registers you can use without saving/restoring, so they're your first choice for scratch registers for any purpose. – Peter Cordes Oct 01 '16 at 15:56
@PeterCordes Thank you so much:-), I get the concept now. – J.Smith Oct 03 '16 at 18:09
