
I have stored a one-byte value of 8 and I'd like to move that into the rax register. I'm currently doing this with movzx to zero-extend the byte:

.globl main
main:
    push %rbp
    mov %rsp, %rbp
    movb $8, -1(%rbp)
    movzx -1(%rbp), %rax    # <-- here
    ...

How does the movzx instruction 'know' that the value at -1(%rbp) is only one byte long? The reference I'm reading says, if I understand it properly, that movzx can take either a byte or a word as its source, but how would it know which? For example, if I added a two-byte value at -2(%rbp), how would it know to grab the two-byte value? Is there another instruction that lets me grab a one-, two-, or four-byte value at an address and insert it into a 64-bit register?

I suppose another way to do it would be to zero out the register first and then move the value into its 8-bit (or however many bits) component, such as:

mov $0, %rax
mov -1(%rbp), %al

Is one way preferred over the other?

Peter Cordes
samuelbrody1249

2 Answers


It's ambiguous and relies on an assembler default; you shouldn't write code like that.

That's why AT&T syntax has movzb and movzw instructions (typically used as movzbl -1(%rbp), %eax), for the two different source sizes of the Intel-syntax movzx mnemonic. See Are x86 Assembly Mnemonic standarized? (no, AT&T makes up new names.)

And yes, you could xor %eax,%eax / mov -1(%rbp), %al to merge into the low byte, but that's pointlessly inefficient. x86-64 guarantees the availability of 386 instructions like movzx.
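
To make the difference between the two approaches concrete, here is a rough Python model of the register writes involved. It is only a sketch, not assembler output: the helper names `write_al`, `write_eax` and `movzbl` are hypothetical, and RAX is modelled as a plain 64-bit integer. It illustrates that a byte-register write merges into the old value, while a 32-bit register write clears the upper half of the 64-bit register.

```python
MASK64 = (1 << 64) - 1

def write_al(rax: int, byte: int) -> int:
    """Model `mov mem, %al`: replace bits 7:0, keep bits 63:8 unchanged."""
    return ((rax & ~0xFF) | (byte & 0xFF)) & MASK64

def write_eax(rax: int, value32: int) -> int:
    """Model any write to %eax: bits 63:32 of %rax are cleared, old value is ignored."""
    return value32 & 0xFFFFFFFF

def movzbl(rax: int, byte: int) -> int:
    """Model `movzbl mem, %eax`: zero-extend the byte to 32 bits, then write %eax."""
    return write_eax(rax, byte & 0xFF)

stale = 0x1122334455667788  # whatever happened to be in %rax before
stored = 8                  # the byte at -1(%rbp)

print(hex(write_al(stale, stored)))                # 0x1122334455667708 (merge only)
print(hex(write_al(write_eax(stale, 0), stored)))  # 0x8 (xor %eax,%eax then mov to %al)
print(hex(movzbl(stale, stored)))                  # 0x8 (single movzbl)
```

Both the two-instruction sequence and the single `movzbl` end with the same value; the model only shows why the plain byte move on its own is not enough.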

Surprisingly, movzx -1(%rbp), %rax does assemble. If you assemble it, then disassemble back into AT&T syntax with objdump -d foo.o, you get movzbq (byte to quad), including a useless REX prefix instead of letting implicit zero-extension do the job after writing EAX.

48 0f b6 45 ff          movzbq -0x1(%rbp),%rax

Or disassemble into Intel syntax with objdump -drwC -Mintel:

48 0f b6 45 ff          movzx  rax,BYTE PTR [rbp-0x1]

Fun fact: GAS can't infer movzb vs. movzw if you write just movz, because movz isn't an instruction mnemonic. Unlike operand-size suffixes that can be inferred from the operands, the b and w are treated as part of the mnemonic. But you can write movzx and then it will infer both sizes from register operands, just like in Intel-syntax mode.

   5:   0f b6 c0                movzbl %al,%eax         # source: movzx %al, %eax
   8:   0f b7 c0                movzwl %ax,%eax         # source: movzx %ax, %eax
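
The machine-code bytes in these listings show where the source size actually lives: the second opcode byte is `b6` for a byte source and `b7` for a word source, and the `48` (REX.W) prefix selects a 64-bit destination. The following Python sketch maps an encoding back to the AT&T mnemonic. It is a simplified illustration that assumes 64-bit mode and handles only an optional `66` prefix and an optional REX prefix; `att_movz_mnemonic` is a hypothetical helper, not part of any real disassembler.

```python
SUFFIX = {8: "b", 16: "w", 32: "l", 64: "q"}
SOURCE_BITS = {0xB6: 8, 0xB7: 16}  # second opcode byte selects the source size

def att_movz_mnemonic(encoding: bytes) -> str:
    """Return the AT&T movzXY mnemonic for a MOVZX encoding (64-bit mode assumed)."""
    pos = 0
    dest_bits = 32                      # default operand size
    if encoding[pos] == 0x66:           # operand-size prefix: 16-bit destination
        dest_bits = 16
        pos += 1
    if 0x40 <= encoding[pos] <= 0x4F:   # REX prefix
        if encoding[pos] & 0x08:        # REX.W: 64-bit destination
            dest_bits = 64
        pos += 1
    if encoding[pos] != 0x0F or encoding[pos + 1] not in SOURCE_BITS:
        raise ValueError("not a MOVZX encoding")
    source_bits = SOURCE_BITS[encoding[pos + 1]]
    return "movz" + SUFFIX[source_bits] + SUFFIX[dest_bits]

for hex_bytes in ("48 0f b6 45 ff", "0f b6 c0", "0f b7 c0"):
    print(hex_bytes, "->", att_movz_mnemonic(bytes.fromhex(hex_bytes)))
# 48 0f b6 45 ff -> movzbq
# 0f b6 c0 -> movzbl
# 0f b7 c0 -> movzwl
```

So the CPU never has to guess: the assembler picks one of the two opcodes, either from an explicit suffix, from a register operand, or (for a bare memory operand) from its own default.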

movzw and movzb act like instruction mnemonics in their own right (that can infer a size suffix from the destination register). Semi-related: What does the MOVZBL instruction do in IA-32 AT&T syntax?

Also related: a table of cdq and similar instructions, with their movsx and AT&T equivalents: What does cltq do in assembly?

Also related: MOVZX missing 32 bit register to 64 bit register - because that's implicit in writing a 32-bit register.

Peter Cordes

How does the movzx instruction 'know' that the value at -1(%rbp) is only one byte long?

There are two (or even three) instructions:

movzxb (-1(%rbp) is one byte long) and movzxw (-1(%rbp) is one 16-bit word long).

My assembler interprets movzx as movzxb; however, you should not rely on that!

It is better to use the instruction name that includes the source size (movzxb or movzxw) to ensure that the assembler uses the correct instruction.

Martin Rosenau
  • great, thank you for the clarification. Is there a third instruction that is 32 bits long? – samuelbrody1249 Aug 13 '20 at 06:24
  • @samuelbrody1249 For the `movsx` instruction there is `movsxl` that uses a 32-bit source. `movzxl` **seems** not to exist because any operation writing to the `%eax` register will implicitly set the high 32 bits of `%rax` to zero. So you can simply do a `mov -1(%rbp), %eax` instruction. – Martin Rosenau Aug 13 '20 at 06:27
  • `movzxb` is a mutant hybrid of Intel and AT&T syntax. I wouldn't recommend it, compilers never use it and I've never seen it mentioned anywhere. It does apparently work as equivalent to standard AT&T `movzb`, though. I was surprised, I expected it to be a destination size override for `movzx` and fail. (Possibly I'm totally mistaken and this is documented or standardized somewhere, but I've never seen it.) – Peter Cordes Aug 13 '20 at 06:38
  • Note that the standard AT&T mnemonics for these are `movzbl` and `movzwl`. It's always `movzXY` for a zero-extending move from size X to size Y and `movsXY` for a sign-extending move. – fuz Aug 13 '20 at 15:55