If I move a value from a 32-bit register, say:

    movq %rdx,%rax
    movl %edx,%eax

Will the value stored in %rax get clobbered?

Michael Petch
  • Yes, although not for the reason you think. You could have just tried it, too. – Jester Apr 26 '17 at 00:26
  • Yes, of course. It is the same register, no matter if you address it using 8/16/32/64 bit mode. Here's a nice diagram: http://stackoverflow.com/a/228367/14955 – Thilo Apr 26 '17 at 00:26
  • @Jester "not for the reason you think." Please elaborate. – Thilo Apr 26 '17 at 00:29
  • @Thilo He has moved `rdx` into `rax`, then `edx` into `eax`. That would normally not change anything if it weren't for the fact that 32 bit ops zero out the top bits. – Jester Apr 26 '17 at 00:53
  • See http://stackoverflow.com/questions/11177137/why-do-most-x64-instructions-zero-the-upper-part-of-a-32-bit-register – Cody Gray - on strike Apr 26 '17 at 08:01

1 Answer

Yes.

Your code:

    mov rax,rdx
    mov eax,edx

will perform the following actions:

    rax <= rdx
    high 32 bits of rax <= 0, low 32 bits of rax <= edx

Writing to a 32-bit register zeroes the upper 32 bits of the corresponding 64-bit register.
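
To make the effect concrete, here is a minimal sketch that models the two instructions with plain integers. It is only a model of the architectural result, not real assembly; the helper name `write32` and the sample value in `rdx` are assumptions chosen for illustration.

```python
MASK32 = (1 << 32) - 1

def write32(old_reg64: int, value: int) -> int:
    """Model a write to a 32-bit register such as eax.

    The old 64-bit contents are ignored: the result is the low
    32 bits of the value, with the upper 32 bits zeroed.
    """
    return value & MASK32

rdx = 0x1122334455667788      # hypothetical sample value
rax = rdx                     # mov rax, rdx
rax = write32(rax, rdx)       # mov eax, edx
print(hex(rax))               # prints 0x55667788
```

The upper half `0x11223344` that the first `mov` copied into `rax` is gone after the second one.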

Contrast this with:

    mov rax,rdx   ;  rax <= rdx
    mov ax,dx     ;  high 48 bits of rax are unchanged; low 16 bits of rax <= dx

The same goes for the byte registers: writing al or ah leaves the rest of rax unchanged.
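
The merging behaviour of 16-bit and 8-bit writes can be sketched the same way. Again this is only an integer model of the architectural result; the helper names (`write16`, `write8_low`, `write8_high`) and the sample values are assumptions for illustration.

```python
MASK64 = (1 << 64) - 1

def write16(old_reg64: int, value: int) -> int:
    """Model a write to ax: only bits 0-15 change."""
    return (old_reg64 & (MASK64 ^ 0xFFFF)) | (value & 0xFFFF)

def write8_low(old_reg64: int, value: int) -> int:
    """Model a write to al: only bits 0-7 change."""
    return (old_reg64 & (MASK64 ^ 0xFF)) | (value & 0xFF)

def write8_high(old_reg64: int, value: int) -> int:
    """Model a write to ah: only bits 8-15 change."""
    return (old_reg64 & (MASK64 ^ 0xFF00)) | ((value & 0xFF) << 8)

rax = MASK64                    # mov rax, -1
rax = write16(rax, 0x1234)      # mov ax, 0x1234 -> 0xFFFFFFFFFFFF1234
after_ax = rax
rax = write8_low(rax, 0x56)     # mov al, 0x56   -> 0xFFFFFFFFFFFF1256
after_al = rax
rax = write8_high(rax, 0x78)    # mov ah, 0x78   -> 0xFFFFFFFFFFFF7856
after_ah = rax
```

In every case the new value depends on the old contents of `rax`, which is exactly what a 32-bit write avoids.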

32-bit writes zero the upper half of the 64-bit register in order to avoid partial register updates. A partial update makes the new register value depend on the old one, and that dependency causes delays in the instruction pipeline.

Using 16-bit operations in 32- or 64-bit mode causes delays in a scenario like the following (the cycle counts are illustrative and vary by microarchitecture):

    mov rax,-1       ; 1 cycle
    mov ax,dx        ; 1 cycle
                     ; stall: the upper bits of rax and the new ax need to be combined
    mov r8,rax       ; 2 cycles

A better option would be:

    mov rax,-1       ; 1 cycle
    movzx eax,dx     ; can run concurrently with the previous instruction, 0 cycles
    mov r8,rax       ; 1 cycle

    ; Total 2 cycles, twice as fast.

This code is not equivalent to the sample above it, but that's the whole point: the movzx overwrites all of rax, so the result no longer depends on the earlier mov rax,-1. You should avoid partial register updates when possible. Also note that movzx eax,dx is equivalent to movzx rax,dx for the reasons stated above. On x64 the eax form is one byte shorter because it needs no REX prefix, and it is therefore the preferred form.
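
The difference in results between the two sequences can be checked with a small integer model. This is a sketch of the architectural outcome only and says nothing about timing; the function names and the sample `dx` value are assumptions for illustration.

```python
MASK64 = (1 << 64) - 1

def mov_ax(old_rax: int, dx: int) -> int:
    """Model mov ax,dx: merge the low 16 bits into the old rax."""
    return (old_rax & (MASK64 ^ 0xFFFF)) | (dx & 0xFFFF)

def movzx_eax(old_rax: int, dx: int) -> int:
    """Model movzx eax,dx: zero-extend dx; the old rax is discarded."""
    return dx & 0xFFFF

def movzx_rax(old_rax: int, dx: int) -> int:
    """Model movzx rax,dx: same architectural result as movzx eax,dx."""
    return dx & 0xFFFF

dx = 0xBEEF                          # hypothetical sample value
merged = mov_ax(MASK64, dx)          # 0xFFFFFFFFFFFFBEEF
zero_ext = movzx_eax(MASK64, dx)     # 0x000000000000BEEF
same = movzx_rax(MASK64, dx) == zero_ext
```

Here `merged` keeps the upper 48 bits set by `mov rax,-1`, while `zero_ext` does not, so the faster sequence is only a valid replacement when the upper bits are not needed.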

Note that I'm using Intel syntax (destination operand first) rather than AT&T syntax, as a matter of principle.

Johan
  • The "better option" is indeed faster, but not equivalent to the original delay-causing code. The `movzx` makes the initial `mov` completely redundant. Also, that `movzx` could be more efficiently written as `movzx eax, dx` for the reasons given in the first half of this answer. – Cody Gray - on strike Apr 26 '17 at 08:03