2

I have a small question about how registers are used during some basic operations. I have already looked at the assembly code generated for operations such as XOR or AND, and those are easy to understand. If we consider a = b & c, it is translated in 3 steps:

  1. b is moved to %rax
  2. %rax = %rax & c
  3. %rax is moved to a

Note that a, b and c are unsigned long variables. The same translation applies if the AND is replaced by a XOR or an OR. I then checked whether this was also the case for shifts and found something odd: a = b << c is translated as follows:

  1. b is moved to %rax
  2. c is moved to %rcx
  3. %rax = %rax <<(shlq) %cl
  4. %rax is moved to a

I'm not sure I really understand the second and third steps. I suppose this is because %rax (b) cannot be shifted by more than 63, since otherwise the result is obviously 0. It seems that %cl is an 8-bit register, so I think this is a fast way to select only the useful bits rather than all 64 bits of %rcx. Is that correct?
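For reference, here is a minimal C sketch of the kind of source that leads to the two sequences above. The function names `and_op` and `shl_op` are made up for illustration, and the exact instructions depend on the compiler and optimization level.

```c
/* Hypothetical helpers; names are illustrative only. */

/* a = b & c: unoptimized x86-64 output typically loads b into %rax,
 * ANDs c into %rax, then stores %rax to a. */
unsigned long and_op(unsigned long b, unsigned long c) {
    return b & c;
}

/* a = b << c: a variable count has to be placed in %cl first,
 * which is why c is loaded into %rcx before the shlq.
 * In C the behavior is undefined if c >= the width of unsigned long. */
unsigned long shl_op(unsigned long b, unsigned long c) {
    return b << c;
}
```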

Thank you

Peter Cordes
user1382272

3 Answers

4

That's simply how shl works.
From Intel manual 2B:

Shifts the bits in the first operand (destination operand) to the left or right by the number of bits specified in the second operand (count operand). Bits shifted beyond the destination operand boundary are first shifted into the CF flag, then discarded. At the end of the shift operation, the CF flag contains the last bit shifted out of the destination operand.

The destination operand can be a register or a memory location. The count operand can be an immediate value or the CL register. The count is masked to 5 bits (or 6 bits if in 64-bit mode and REX.W is used). The count range is limited to 0 to 31 (or 63 if 64-bit mode and REX.W is used). A special opcode encoding is provided for a count of 1.

Variable-count shifts with shl must use cl.
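The count masking described in the manual excerpt can be modelled in C. This is only a sketch of the result value (it ignores the flags), and the helper names `shl64_cl` and `shl32_cl` are assumptions made for illustration.

```c
#include <stdint.h>

/* Models the result of `shl r/m64, cl`: only the low 6 bits of CL are used. */
uint64_t shl64_cl(uint64_t value, uint8_t cl) {
    return value << (cl & 63);
}

/* Models the result of `shl r/m32, cl`: only the low 5 bits of CL are used. */
uint32_t shl32_cl(uint32_t value, uint8_t cl) {
    return value << (cl & 31);
}
```

With this model a count of 64 behaves like a count of 0 for the 64-bit form, so the value comes back unchanged rather than zeroed.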

Margaret Bloom
  • Thank you for your answer. Is it right that %cl contains the 6 least significant bits of %rcx? – user1382272 May 29 '16 at 13:09
  • 1
    `rcx` -> full 64 bit register, `ecx` -> lower 32 bits of the full register, `cx` -> lower 16 bits of the full register, `ch` -> upper 8 bits of `cx`, `cl` -> lower 8 bits of full register. – Margaret Bloom May 29 '16 at 13:13
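The sub-register layout from the comment above can be sketched in C with casts and shifts. The helper names are hypothetical; they only mirror which bits of a 64-bit value each register name refers to.

```c
#include <stdint.h>

/* Hypothetical helpers mirroring the x86-64 sub-register views of rcx. */
uint32_t ecx_of(uint64_t rcx) { return (uint32_t)rcx; }        /* bits 0..31 */
uint16_t cx_of(uint64_t rcx)  { return (uint16_t)rcx; }        /* bits 0..15 */
uint8_t  ch_of(uint64_t rcx)  { return (uint8_t)(rcx >> 8); }  /* bits 8..15 */
uint8_t  cl_of(uint64_t rcx)  { return (uint8_t)rcx; }         /* bits 0..7  */
```

So `cl` holds 8 bits, and a 64-bit `shl` then uses only the low 6 of them.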
3

I suppose this is due to the fact that %rax (b) cannot be shifted more than 63, otherwise, the result is obviously 0.

If it worked like that, it probably would have used rcx as the operand (or however wide makes the most sense in context, i.e. the operand size of the instruction), to check whether any upper bits are set (and set the result to zero if any are).

But it doesn't: the shift amount is taken modulo the operand size, so any upper bits are completely irrelevant. So it can read just the low 8 bits, and that is what it does, though that decision probably made more sense in the 16-bit days (when ch would actually be used) than it does now. The newer shrx family reads a "full" register (as wide as the operand size) and then just ignores the extra bits.
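To make the contrast concrete, here is a small C sketch with two made-up functions: `shl_saturating` implements the behavior the question guessed at, and `shl_masked` models what a 64-bit `shl` computes. Neither name comes from any library.

```c
#include <stdint.h>

/* Hypothetical semantics from the question: any count >= 64 yields 0.
 * Implementing this would require looking at every bit of the count. */
uint64_t shl_saturating(uint64_t value, uint64_t count) {
    return count >= 64 ? 0 : value << count;
}

/* Model of the 64-bit shl result: the count is reduced modulo 64,
 * so bits above the low 6 never influence the result. */
uint64_t shl_masked(uint64_t value, uint64_t count) {
    return value << (count % 64);
}
```

Under the masked model, counts of 3 and 0x103 give the same result, which is why reading only `cl` loses nothing.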

harold
  • 1
    Fun fact: 8 and 16bit shifts still mask the shift count with `0x1F` (31), so they *can* shift all the bits out and leave the destination register zeroed regardless of previous contents. – Peter Cordes May 30 '16 at 01:45
  • @PeterCordes oh hey, that's fun. I hadn't really noticed tbh, I never actually use 8 and 16 bit shifts. Do you think there's a good reason for it or is it just another "lol, Intel" thing? – harold May 30 '16 at 08:30
  • Happened to see this again just now: when they introduced masking in 186, some existing 8086 code may have depended on being able to shift out all the bits (of a 16-bit register, which was the widest that existed in 8086). But when widening registers in 386, masking was already defined as part of the ISA, so they could keep the barrel shifter narrower. That's my theory on what the motivation was anyway. (Like I wrote in [Why any modern x86 masks shift count to the 5 low bits in CL](https://stackoverflow.com/a/61779201)) – Peter Cordes Dec 08 '21 at 03:37
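The 8- and 16-bit case from these comments can be modelled the same way. This is a sketch of the result value only, with hypothetical helper names; the count is masked with `0x1F` even though the destination is narrower than 32 bits.

```c
#include <stdint.h>

/* Models the result of `shl r/m16, cl`: the count is masked to 5 bits,
 * so counts 16..31 shift every bit out of the 16-bit destination. */
uint16_t shl16_cl(uint16_t value, uint8_t cl) {
    unsigned count = cl & 0x1F;
    return (uint16_t)((uint32_t)value << count);
}

/* Models the result of `shl r/m8, cl` with the same 5-bit count mask. */
uint8_t shl8_cl(uint8_t value, uint8_t cl) {
    unsigned count = cl & 0x1F;
    return (uint8_t)((uint32_t)value << count);
}
```

In this model a count of 16 clears a 16-bit value, while a count of 32 wraps around to 0 and leaves it unchanged.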
1

Well, CL is an 8-bit register with possible values of 0..255. So moving some value to %RCX is only partly relevant, because only the lowest 8 bits (CL) would count. A 64-bit destination register like %RAX can only be shifted 63 bits to the left without overflowing. Shifting it 64 bits or more (up to 255 = max of CL) to the left would always result in 0 (zero). So your assumption is correct.

An explanation of the relevant SHL OpCode can be found there.

REX.W + D3 /4     SHL r/m64, CL     Multiply r/m64 by 2, CL times.

zx485
  • x86 masks the shift count. See the Operation section of the reference manual you linked. This is why [the safe idiom for `rol` in C can use `&` operations without any `AND` instruction showing up in the compiler output.](http://stackoverflow.com/questions/776508/best-practices-for-circular-shift-rotate-operations-in-c) – Peter Cordes May 30 '16 at 01:47
  • 1
    @PeterCordes: Thanks for the addition: I would rather have explained it by referring to [the definition of SHL: `(tempCOUNT ← (COUNT AND countMASK);`](http://www.felixcloutier.com/x86/SAL:SAR:SHL:SHR.html), but yours is interesting, too. – zx485 May 30 '16 at 10:17
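The rotate idiom mentioned in the first comment can be sketched as follows. The function name `rotl32` is an assumption for illustration; the point is that the `&` masks keep the C well-defined for every count, and because the hardware masks the count anyway, compilers that recognize the pattern typically do not need a separate `AND` instruction.

```c
#include <stdint.h>

/* Rotate left without undefined behavior for any n.
 * Both shift counts are masked into the range 0..31. */
uint32_t rotl32(uint32_t x, unsigned n) {
    const unsigned mask = 31;
    n &= mask;
    /* -n is computed in unsigned arithmetic, so (-n & mask) is (32 - n) mod 32. */
    return (x << n) | (x >> (-n & mask));
}
```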