What's the best way to remember the x86-64 System V arg register order?

Question

I often forget which registers I need to use for each argument in a syscall, and every time I forget, I just visit this question.

The right order for integer/pointer args to x86_64 user-space function calls is:
%rdi, %rsi, %rdx, %rcx, %r8 and %r9 (with variadic functions also taking AL = an upper bound on the number of FP args passed in XMM registers, up to 8).

Or for system calls, %rax holds the system call number, and the args use the same registers except %r10 instead of %rcx.

What's the best way to remember these registers instead of googling this question every time?

score 9 · Answer 1 · edited Mar 03 '22 at 11:25

If you remember C memcpy's arg order, and how rep movsb works, that's most of the way to remembering x86-64 System V.

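The design makes memcpy(dst, src, size) cheap to implement with rep movsb, which takes its destination in RDI, its source in RSI, and its count in RCX. The exception is the third arg, which goes in RDX instead of RCX: that leaves RCX unused in more functions, because it's needed (as CL) for variable-count shifts more often than anything needs RDX.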
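As a rough sketch of that memcpy point, assuming GCC or Clang extended inline asm: `copy_bytes` is a hypothetical helper (not how glibc implements memcpy), and the plain byte loop is only a fallback for non-x86-64 builds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for illustration; not glibc's memcpy. */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    void *ret = dst;
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* rep movsb copies RCX bytes from [RSI] to [RDI].
     * In a non-inlined call, dst and src arrive in RDI and RSI already,
     * so only the count has to be moved from RDX into RCX.
     * The ABI guarantees the direction flag is clear on function entry. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    /* Portable fallback so the sketch still builds elsewhere. */
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
    return ret;
}
```

The `"+D"`, `"+S"` and `"+c"` constraints pin the three operands to RDI, RSI and RCX, which are exactly the registers rep movsb reads and updates.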

Then R8 and R9 are the first two "high" registers. Using them requires a REX prefix, which costs an extra byte of code size in instructions that wouldn't otherwise need one. Thus they're a sensible choice for the last 2 args. (Windows x64 makes the same choice of using R8, R9 for the last 2 register args).
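To make the one-byte cost concrete, here are the machine-code bytes for two 32-bit register-to-register moves, written out as C arrays. `starts_with_rex` is a hypothetical helper that only looks at the first byte; that is enough for these two instructions but it is not a general decoder, since legacy prefixes can come before a REX prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* mov %edi, %eax : opcode 89 /r, ModRM F8 (reg = EDI, r/m = EAX) */
static const uint8_t mov_edi_eax[] = {0x89, 0xF8};

/* mov %r8d, %eax : same opcode, but REX.R (prefix byte 0x44) is
 * needed to select R8 in the ModRM reg field */
static const uint8_t mov_r8d_eax[] = {0x44, 0x89, 0xC0};

/* In 64-bit mode, bytes 0x40..0x4F are REX prefixes. */
static int starts_with_rex(const uint8_t *insn)
{
    assert(insn != NULL);
    return (insn[0] & 0xF0) == 0x40;
}
```

The same move costs 2 bytes with a legacy register and 3 bytes with R8D; with 64-bit operand-size both forms would need a REX prefix anyway.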

The actual design process involved minimizing a cost tradeoff of instruction count and code-size for compiling something (perhaps SPECcpu) with a then-current AMD64 port of GCC. I don't know whether inlining memcpy as rep movsb was relevant, or whether glibc at the time actually implemented it that way, or what.

My answer on "Why does Windows64 use a different calling convention from all other OSes on x86-64?" cites some sources for the calling convention design decisions. (Early x86-64.org mailing list posts from GCC devs, notably Jan Hubicka who experimented with a few register orders before coming up with this one.)

Of particular note for remembering the RDX, RCX part of the order is this quote:

We are trying to avoid RCX early in the sequence, since it is register used commonly for special purposes, like EAX, so it has same purpose to be missing in the sequence. Also it can't be used for syscalls and we would like to make syscall sequence to match function call sequence as much as possible.

User-space vs. syscall difference:

R10 replaces RCX in the system call convention because the syscall instruction itself destroys RCX: it saves the user-space return address (RIP) there instead of on the user-space stack, and it can't use the kernel stack because it leaves stack switching up to software. In the same way, it uses R11 to save RFLAGS.

Keeping the two conventions as similar as possible allows libc wrappers to just mov %rcx, %r10, not shuffle over multiple args to fill the gap. R10 is the next available register after R8 and R9.
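Here is a minimal sketch of a raw six-argument system call, assuming x86-64 Linux and GCC or Clang inline asm. `raw_syscall6` and `map_one_page` are hypothetical names; on any other target the sketch just falls back to libc's `syscall()` wrapper so it still builds.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: call number in RAX, args in RDI, RSI, RDX, R10, R8, R9. */
static long raw_syscall6(long nr, long a1, long a2, long a3,
                         long a4, long a5, long a6)
{
    assert(nr >= 0);
#if defined(__linux__) && defined(__x86_64__)
    long ret;
    /* There are no constraint letters for R8..R10, so pin them
     * with register-asm local variables. */
    register long r10 __asm__("r10") = a4;  /* 4th arg: R10, not RCX */
    register long r8  __asm__("r8")  = a5;
    register long r9  __asm__("r9")  = a6;
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "a"(nr), "D"(a1), "S"(a2), "d"(a3),
                       "r"(r10), "r"(r8), "r"(r9)
                     : "rcx", "r11", "memory"); /* syscall overwrites RCX and R11 */
    return ret;  /* raw kernel result: -errno on failure */
#else
    /* Fallback for other targets: let libc do the register shuffling. */
    return syscall(nr, a1, a2, a3, a4, a5, a6);
#endif
}

/* mmap uses all six args; its flags land in the 4th slot (R10). */
static void *map_one_page(void)
{
    long p = raw_syscall6(SYS_mmap, 0, 4096, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p < 0 ? NULL : (void *)p;
}
```

The `"rcx"` and `"r11"` clobbers are the visible trace of the point above: the compiler has to assume both registers are destroyed across the instruction, which is why the fourth argument can't live in RCX.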

Alternative: a mnemonic:

`Di`ane's `si`lk `d`ress `c`osts $`89`

(Suggested by the CS:APP blog)
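If it helps to see the mnemonic next to the registers, here is a small lookup sketch. `arg_register` is a hypothetical helper, not part of any library; it only encodes the order discussed in this answer, with the one syscall difference in the fourth slot.

```c
#include <assert.h>
#include <stddef.h>

/* Function-call order, one mnemonic word per slot ("$89" covers two). */
static const char *const function_arg_regs[6] = {
    "rdi",  /* Diane's */
    "rsi",  /* silk    */
    "rdx",  /* dress   */
    "rcx",  /* costs   */
    "r8",   /* $8      */
    "r9",   /* 9       */
};

/* Hypothetical helper: name of the register holding integer/pointer
 * arg number `index` (0-based). For system calls the 4th arg moves
 * from RCX to R10; everything else is the same. */
static const char *arg_register(int index, int for_syscall)
{
    assert(index >= 0 && index < 6);
    if (for_syscall && index == 3)
        return "r10";
    return function_arg_regs[index];
}
```

For example, `arg_register(3, 0)` gives `"rcx"` while `arg_register(3, 1)` gives `"r10"`.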

Why were r8 and r9 even chosen if they require an extra byte to represent? Shouldn't you make *all* the 6 registers use one less byte to represent if possible? — rayaantaneja, Oct 21 '22 at 12:02
@rayaantaneja: Most functions don't have 6 args, so it's a tradeoff. There are only 7 registers (other than the stack pointer) which don't need a REX prefix to access. Using 2 of those (RBP and RBX) as call-preserved is better, to allow smaller code in loops that call functions, if any of the loop variables don't need 64-bit operand-size (which would require a REX prefix for any register). Leaving RAX unused for arg-passing also makes some sense, so it can be a scratch register immediately in non-variadic functions. (Variadic have to check AL before saving XMM args if there might be any). — Peter Cordes, Oct 21 '22 at 12:13
If I understand correctly, it's because there aren't any non-REX prefix registers left to use because they're being used for another particular/special purpose. Am I right? Also, now I'm wondering why they chose 6 arguments to be passed into registers and not 7 or 8. Was the choice of 6 random or was there a reason for it being the "optimal" choice? — rayaantaneja, Oct 21 '22 at 12:39
@rayaantaneja: Right, the only call-clobbered register that doesn't have any use in the calling convention is R11. The rest are either call-preserved (and it's very useful to have some of those to avoid spill/reload in non-leaf functions that make multiple calls, e.g. in loops), or already used for arg-passing. A couple stack args aren't a disaster. It wasn't random, though; Jan Hubicka did some static analysis (instruction counting) on GCC output in 2001 or so when designing it, see the link in this answer. (My answer and comments on that other Q&A mentions efficiency). — Peter Cordes, Oct 21 '22 at 12:48
@rayaantaneja: See also [Why not store function parameters in XMM vector registers?](https://stackoverflow.com/q/33707228) discussing the tradeoffs of a calling convention. Also [What are callee and caller saved registers?](https://stackoverflow.com/a/56178078) - you want a mix of both in a good convention. And re: R11, it's useful for there to be a register that a "trampoline" or wrapper function can definitely use as a scratch without interfering with args or return values. — Peter Cordes, Oct 21 '22 at 12:50
