How do processors know the order of registers' values?

Question

In assembly one can pass values via less volatile registers or volatile registers. I can, for instance, pass arguments to printf using edi and esi. I can also use ebx and ecx instead. This example is a very simple, contrived one. I'm more curious about how this works in much more intricate programs that call multiple functions from libc.

For instance, in Return Oriented Programming (ROP) attacks, an attacker can use gadgets to pop new values from the stack into the same registers a previous function used, and then return to another libc function that uses the same register(s). With write and read, for example, one could use a pop rsi gadget for either function once the global offset table has been leaked. My overall question could be asked this way:

If an attacker inherits registers from a previous call to read like so:

    0x00005555555552d0 <+107>:   lea    rcx,[rbp-0xd0] <- Memory address of buffer "msg"
    0x00005555555552d7 <+114>:   mov    eax,DWORD PTR [rbp-0xe4] <- contains client fd 0x4
    0x00005555555552dd <+120>:   mov    edx,0x400 <- 1024 (size of bytes to write to memory location/buffer)
    0x00005555555552e2 <+125>:   mov    rsi,rcx
    0x00005555555552e5 <+128>:   mov    edi,eax
    0x00005555555552e7 <+130>:   call   0x5555555550d0 <read@plt>

How does the processor know which arguments to supply to write if the registers passed to write are different:

    0x00005555555552b1 <+76>:    call   0x555555555080 <strlen@plt>
    0x00005555555552b6 <+81>:    mov    rdx,rax <- store return value from strlen into rdx
    0x00005555555552b9 <+84>:    lea    rcx,[rbp-0xe0] <- message to write
    0x00005555555552c0 <+91>:    mov    eax,DWORD PTR [rbp-0xe4] <- client file descriptor
    0x00005555555552c6 <+97>:    mov    rsi,rcx
    0x00005555555552c9 <+100>:   mov    edi,eax
    0x00005555555552cb <+102>:   call   0x555555555060 <write@plt>

Clearly read does not use rdx and write does not use edx, so how does the processor know which to choose, for example if an attacker only used a gadget that pops a value into rsi?

I can't seem to understand how the processor knows which registers to choose from (rdx or edx). How do processors select the values to pass to libc functions, or to functions/routines in general?

It's not the processor's choice. It's defined by the calling convention. Note that since `read` and `write` take the same number and type of arguments, they use the same registers. On x86-64 linux those are `rdi`, `rsi` and `rdx` for the `fd`, `buf` and `count` arguments respectively. It's unclear why you think they are different. — Jester, Dec 21 '20 at 02:11
In particular for your `write` example, the `mov eax,DWORD PTR [rbp-0xe4]` can not be the count, since it is transferred to `edi`. It's clearly the file descriptor. The count is already put into `rdx` by earlier code that you did not show. — Jester, Dec 21 '20 at 02:14
So because I overlooked `eax` being moved into `edi` I failed to see that `rcx` is an argument also. This still doesn't explain the difference between `rcx` and `edx`. Both are different registers, so how does the processor know which one to use? — asd40732, Dec 21 '20 at 02:29
`rcx` isn't an argument either (in this case). It's moved to `rsi`. Neither `rax` nor `rcx` are used to pass arguments. They are just temporaries in the code shown, they are moved into the correct argument registers before the function call. — Jester, Dec 21 '20 at 02:30
Yes, the order matters. The order in which they map to arguments, not the order in which you load the values. See [calling convention documentation](https://en.wikipedia.org/wiki/X86_calling_conventions#System_V_AMD64_ABI) — Jester, Dec 21 '20 at 02:34
@Jester so how does the processor know to choose between `rdx` and `edx`? — asd40732, Dec 21 '20 at 02:57

score 1 · Accepted Answer · answered Dec 21 '20 at 02:51

The processor doesn't know anything; the registers aren't indexable, and the only order they have as far as the CPU is concerned is the register numbers used in machine code (and in save-multiple-register instructions like legacy 32-bit mode pusha / popa, or xsave to save the FPU / SIMD state).
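To make "register numbers used in machine code" concrete, here is a rough C sketch that decodes the 3-bit reg field of a ModRM byte. The helper name `modrm_reg_name` is made up for illustration, and it only handles the register-direct form with registers 0-7 (it ignores REX.R, so r8-r15 are out of scope). The point is that edx and rdx share register number 2; only the operand-size bit (REX.W) differs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Register numbers 0-7 as encoded in x86 machine code. */
static const char *const names32[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};
static const char *const names64[8] = {
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"
};

/* Hypothetical helper: name the register selected by the ModRM "reg"
 * field. `rex` is the REX prefix byte, or 0 if the instruction has none. */
const char *modrm_reg_name(uint8_t rex, uint8_t modrm)
{
    assert((modrm >> 6) == 3);              /* sketch: register-direct only */
    unsigned reg = (modrm >> 3) & 7;        /* bits 5..3 = register number  */
    int is_rex = (rex & 0xF0) == 0x40;      /* REX prefixes are 0x40..0x4F  */
    int wide = is_rex && (rex & 0x08);      /* REX.W selects 64-bit size    */
    return wide ? names64[reg] : names32[reg];
}
```

For example, `add eax, edx` encodes as `01 D0` and `add rax, rdx` as `48 01 D0`: the ModRM byte `D0` is identical, and the helper returns "edx" for the first and "rdx" for the second.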

What looks for args in certain places in the called function is... more code (software), generated by a compiler that compiled a function with its args declared a certain way. Remember, printf is just more software, not built-in to the CPU.

The compiler knows the standard calling convention for the target platform (defined in the x86-64 System V ABI in this case), so having both caller and callee agree on a calling convention results in calling code that will put args in the places that callees look for them.
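As a minimal sketch of that agreement, assume an x86-64 System V target (e.g. Linux with GCC or Clang). The function `send_like_write` and the buffer `sink` are hypothetical; they only mirror the `write(fd, buf, count)` parameter list. The register names in the comments come from the ABI, and it is the compiler, not the CPU, that applies them on both sides of the call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static char sink[64];   /* stand-in destination, purely for illustration */

/* Hypothetical callee with the same parameter list as write(2).
 * Compiled for x86-64 System V, its generated code looks for:
 *   fd    in edi (low half of rdi)
 *   buf   in rsi
 *   count in rdx                                                  */
long send_like_write(int fd, const void *buf, size_t count)
{
    assert(buf != NULL);            /* precondition of this sketch */
    if (fd < 0)
        return -1;
    if (count > sizeof sink)
        count = sizeof sink;
    memcpy(sink, buf, count);
    return (long)count;
}

long caller(void)
{
    const char *msg = "hello";
    /* The compiler may compute these values in any temporaries it likes
     * (rax, rcx, ...); before the call instruction it moves them into
     * edi, rsi and rdx because the prototype and the ABI say so.       */
    return send_like_write(4, msg, strlen(msg));
}
```

Compiling this with `-S` and reading the assembly for `caller` should show the same pattern as the disassembly in the question: temporaries first, then moves into the argument registers right before the call.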

Standardizing this calling convention is how we can link together code from different compilers into one program, and make calls into libraries.

BTW, the same goes for making system calls; you put a call number into a certain register and run an instruction that switches to kernel mode (e.g. syscall). Now the kernel is running, and can look at the values still in registers. It uses the call number to index a table of function pointers, calling it with the other args in the standard arg-passing registers. (Or wherever they need to go according to the C calling convention, which is typically different from the system-call calling convention.)
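The dispatch step can be imitated in user space. The sketch below is not kernel code: `struct regs`, the two handlers, and their call numbers are invented for illustration, and real Linux passes up to six arguments (rdi, rsi, rdx, r10, r8, r9) rather than three. It only shows the idea of using the call number to index a table of function pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot of the registers the "kernel" sees on entry. */
struct regs {
    long rax;             /* call number */
    long rdi, rsi, rdx;   /* first three arguments */
};

typedef long (*handler_fn)(long, long, long);

/* Made-up handlers; real system calls look nothing like these. */
static long sys_add(long a, long b, long c)  { (void)c; return a + b; }
static long sys_mul3(long a, long b, long c) { return a * b * c; }

static const handler_fn table[] = { sys_add, sys_mul3 };

long dispatch(const struct regs *r)
{
    assert(r != NULL);
    size_t n = sizeof table / sizeof table[0];
    if (r->rax < 0 || (size_t)r->rax >= n)
        return -38;       /* same value as Linux's -ENOSYS */
    /* Index the table by call number, then hand the saved register
     * values to the handler as ordinary C arguments.               */
    return table[r->rax](r->rdi, r->rsi, r->rdx);
}
```

Nothing in `dispatch` is special to the hardware: it is ordinary software reading values that the caller agreed to leave in particular places.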

What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64

I just don't understand how the processor knows to choose `rdx` or `edx`. If you can answer that I will accept your answer. — asd40732, Dec 21 '20 at 02:55
@asd40732: Which choice when are you talking about? When decoding machine code, the difference between `add eax, edx` and `add rax, rdx` is 1 bit in the REX prefix that selects 64-bit operand size (REX.W=1). The dx / edx / rdx have the same register number in machine code, the size is set by the operand-size attribute of the instruction (prefixes and opcode). https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix. — Peter Cordes, Dec 21 '20 at 03:09
@asd40732: If you mean how does the compiler decide which operand-size to use, usually it matches the type width of the C variable. (Writing a 32-bit register implicitly zero-extends to 64-bit, so `mov edx, 1234` is how a compiler would pass a `size_t` or `uint64_t` arg, like the 3rd arg of `write`; [Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?](https://stackoverflow.com/q/11177137)). But that's the compiler making the decision, *not* the processor, as I and old_timer have explained in both our answers. — Peter Cordes, Dec 21 '20 at 03:09
also related: [The advantages of using 32bit registers/instructions in x86-64](https://stackoverflow.com/q/38303333) — Peter Cordes, Dec 21 '20 at 03:11

score 0 · Answer 2 · answered Dec 21 '20 at 02:41

The processor is very dumb; it knows nothing. It literally only does what the instructions say to do, and the instructions are ultimately written by the programmer, directly or indirectly (via compilation). The compiler knows because of the calling convention decided on by the compiler authors. They are free to choose the convention they want for that target; they do not have to conform to a specific previously defined convention. If they happen to, it is their free choice to do so. At the end of the day, the compiler authors know and build the compiler around that. The processor does only what it is told; it cannot think for itself.
