Moving a value of a lesser size into a register

Question

I have stored a one-byte value of 8 and I'd like to move that into the rax register. I'm currently doing this with movzx to zero-extend the byte:

.globl main
main:
    push %rbp
    mov %rsp, %rbp
    movb $8, -1(%rbp)
    movzx -1(%rbp), %rax    # <-- here
    ...

How does the movzx instruction 'know' that the value at -1(%rbp) is only one byte long? From here it says, if I'm reading it properly, that it can work on both a byte and a word, but how would it know which? For example, if I stored a two-byte value at -2(%rbp), how would it know to grab the two-byte value? Is there another instruction that lets me grab a one-, two-, or four-byte value at an address and put it into a 64-bit register?

I suppose another way to do it would be to zero out the register first and then move the value into its 8-bit (or however many bits) component, such as:

mov $0, %rax
mov -1(%rbp), %al

Is one way preferred over the other?

score 4 · Answer 1 · answered Aug 13 '20 at 06:36

It's ambiguous and relies on some default; you shouldn't write code like that.

That's why AT&T syntax has movzb and movzw instructions (typically used as movzbl -1(%rbp), %eax), for the two different source sizes of the Intel-syntax movzx mnemonic. See Are x86 Assembly Mnemonic standarized? (no, AT&T makes up new names.)

And yes, you could xor %eax,%eax / mov -1(%rbp), %al to merge into the low byte, but that's pointlessly inefficient. x86-64 guarantees the availability of 386 instructions like movzx.
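As a rough sketch of the explicit-size forms, the following loads a one-, two-, and four-byte value into 64-bit registers. The stack offsets -4 and -8 and the choice of %ecx and %edx are assumptions made for illustration; like the question's code, it stores below %rsp without adjusting it, which relies on the System V red zone in a leaf function.

```asm
    .text
    .globl main
main:
    push   %rbp
    mov    %rsp, %rbp
    movb   $8, -1(%rbp)        # 1-byte value
    movw   $8, -4(%rbp)        # 2-byte value (hypothetical offset)
    movl   $8, -8(%rbp)        # 4-byte value (hypothetical offset)
    movzbl -1(%rbp), %eax      # byte  -> EAX, zero-extended; upper half of RAX is cleared too
    movzwl -4(%rbp), %ecx      # word  -> ECX, zero-extended; upper half of RCX is cleared too
    movl   -8(%rbp), %edx      # dword -> EDX; a plain 32-bit mov already zeroes the rest of RDX
    xor    %eax, %eax          # return 0
    pop    %rbp
    ret
```

Each mnemonic names the source size explicitly, so nothing depends on an assembler default.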

Surprisingly, movzx -1(%rbp), %rax does assemble. If you assemble it, then disassemble back into AT&T syntax with objdump -d foo.o, you get movzbq (byte to quad), including a useless REX prefix instead of letting implicit zero-extension do the job after writing EAX.

48 0f b6 45 ff          movzbq -0x1(%rbp),%rax

Or disassemble into Intel syntax with objdump -drwC -Mintel:

48 0f b6 45 ff          movzx  rax,BYTE PTR [rbp-0x1]

Fun fact: GAS can't infer movzb vs. movzw if you write just movz, because movz isn't an instruction mnemonic. Unlike operand-size suffixes that can be inferred from the operands, the b and w are treated as part of the mnemonic. But you can write movzx and then it will infer both sizes from register operands, just like in Intel-syntax mode.

   5:   0f b6 c0                movzbl %al,%eax         # source: movzx %al, %eax
   8:   0f b7 c0                movzwl %ax,%eax         # source: movzx %ax, %eax

movzw and movzb act like instruction mnemonics in their own right (that can infer a size suffix from the destination register). Semi-related: What does the MOVZBL instruction do in IA-32 AT&T syntax?
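To illustrate that inference, here is a small sketch in which only the source size is spelled out and the destination size comes from the register operand. The offset -2(%rbp) and the registers are arbitrary choices, not taken from the question.

```asm
    .text
    .globl main
main:
    push   %rbp
    mov    %rsp, %rbp
    movw   $8, -2(%rbp)        # 2-byte value at -2(%rbp)
    movzb  -2(%rbp), %eax      # byte source, size of %eax inferred: same as movzbl
    movzw  -2(%rbp), %ecx      # word source, size of %ecx inferred: same as movzwl
    movzb  -2(%rbp), %rdx      # byte source, 64-bit destination: same as movzbq
    xor    %eax, %eax          # return 0
    pop    %rbp
    ret
```

Running objdump -d on the resulting object should show the fully suffixed mnemonics.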

Also related: a table of cdq-style instructions, their equivalents in terms of movsx, and their AT&T names: What does cltq do in assembly?

Also related: MOVZX missing 32 bit register to 64 bit register - because that's implicit in writing a 32-bit register.
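A minimal way to observe that implicit zero-extension, assuming a 4-byte value at the hypothetical offset -4(%rbp): fill RAX with all ones, do a 32-bit load into EAX, then shift the upper half down and return it.

```asm
    .text
    .globl main
main:
    push   %rbp
    mov    %rsp, %rbp
    movl   $8, -4(%rbp)        # 4-byte value on the stack (red zone, leaf function)
    mov    $-1, %rax           # RAX = 0xffffffffffffffff
    mov    -4(%rbp), %eax      # 32-bit load into EAX; upper 32 bits of RAX become zero
    shr    $32, %rax           # keep only the former upper half of RAX
    pop    %rbp
    ret                        # returns 0 because the upper half was cleared
```

Replacing the 32-bit load with mov -4(%rbp), %ax or %al would leave the upper bits untouched, which is why the narrower sizes need movzw or movzb.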

score 2 · Accepted Answer · answered Aug 13 '20 at 06:23

How does the movzx instruction 'know' that the value at -1(%rbp) is only one byte long?

There are two (or even three) instructions:

movzxb (-1(%rbp) is one byte long) and movzxw (-1(%rbp) is one 16-bit word long).

My assembler interprets movzx as movzxb; however, you should not rely on that!

It is better to use the instruction name that includes the source size (movzxb or movzxw) to ensure that the assembler picks the correct instruction.

answered Aug 13 '20 at 06:23

Martin Rosenau


great, thank you for the clarification. Is there a third instruction that is 32 bits long? – samuelbrody1249 Aug 13 '20 at 06:24
@samuelbrody1249 For the `movsx` instruction there is `movsxl` that uses a 32-bit source. `movzxl` **seems** not to exist because any operation writing to the `%eax` register will implicitly set the high 32 bits of `%rax` to zero. So you can simply do a `mov -1(%rbp), %eax` instruction. – Martin Rosenau Aug 13 '20 at 06:27
`movzxb` is a mutant hybrid of Intel and AT&T syntax. I wouldn't recommend it, compilers never use it and I've never seen it mentioned anywhere. It does apparently work as equivalent to standard AT&T `movzb`, though. I was surprised, I expected it to be a destination size override for `movzx` and fail. (Possibly I'm totally mistaken and this is documented or standardized somewhere, but I've never seen it.) – Peter Cordes Aug 13 '20 at 06:38
Note that the standard AT&T mnemonics for these are `movzbl` and `movzwl`. It's always `movzXY` for a zero-extending move from size X to size Y and `movsXY` for a sign extending move. – fuz Aug 13 '20 at 15:55
