4

I'm programming a JIT compiler and I've been surprised to discover that so many of the x86-64 registers are nonvolatile (callee-preserved) in the Win64 calling convention. It seems to me that nonvolatile registers just amount to more work in all functions that could use these registers. This seems especially true in the case of numeric computations where you'd want to use many registers in a leaf function, say some kind of highly optimized matrix multiplication. However, only 6 of the 16 SSE registers are volatile, for example, so you'd have a lot of spilling to do if you need to use more than that.

So yeah, I don't get it. What's the tradeoff here?

Peter Cordes
Trillian
  • That's not the way it works, you generate your own machine code so you set your own rules. You only need to observe the x64 abi when you call external code. Which requires a custom marshaller anyway. – Hans Passant May 01 '12 at 04:00
  • @HansPassant Hum yeah, I didn't think about that. This is my first such project and I want to call into external code, so it's just simpler for me to use Win64 everywhere. But I understand I could do otherwise. – Trillian May 01 '12 at 13:00
  • @HansPassant Does that mean that, say, within the Linux kernel, it can choose to override these rules as long as it sticks to whatever it has chosen to do internally (as they are not cast in stone or enforced by hardware in any way)? – AjB Dec 25 '13 at 09:16

4 Answers

5

If registers are caller-saves, then the caller always has to save or reload those registers around a function call. But if registers are callee-saves, then the callee only has to save the registers that it uses, and only when it knows they're going to be used (i.e. maybe not at all in an early-exit scenario). The disadvantage of this convention is that the callee doesn't have knowledge of the caller, so it might be saving registers that are dead anyway, but I guess that's seen as a smaller concern.
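The early-exit case can be sketched in C++. This is only an illustration: `twice` and `apply_both` are hypothetical names that do not come from the question, and whether a given compiler actually delays the save/restore until after the early return (an optimization often called shrink-wrapping) depends on the compiler and its settings.

```cpp
// Hypothetical callee, used only so the example is self-contained.
long twice(long v) { return v * 2; }

// Hypothetical non-leaf function with an early exit.
long apply_both(long (*f)(long), long x, long y) {
    // Early exit: no call is made on this path, so no value has to survive
    // a call and no call-preserved register is needed here.
    if (x == 0) return y;

    // Slow path: f and y must survive the first call, and fx must survive
    // the second, so this is where call-preserved registers (or stack
    // slots) become useful.
    long fx = f(x);
    long fy = f(y);
    return fx + fy;
}
```

With `twice` as the callee, `apply_both(twice, 3, 4)` takes the slow path, while `apply_both(twice, 0, 7)` returns without making any call.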

hobbs
  • So as soon as a single one of your callees uses no nonvolatile register, you've saved some spilling/loading. I guess that makes sense. Thanks. – Trillian May 01 '12 at 12:56
  • @Trillian: You always want a mix of call-preserved and call-clobbered registers, so leaf functions can get stuff done without any save/restore, and for scratch space in non-leaf functions for values that don't have to survive function calls. But yes you always want some call-preserved regs so a few values can stay in regs across calls. – Peter Cordes May 09 '22 at 18:22
2

The Windows x86-64 calling convention with only 6 call-clobbered xmm registers is not a very good design, you're right. Most SIMD (and many scalar FP) loops don't contain any function calls, so they gain nothing from having their data in call-preserved registers. The save/restore is pure downside because it's rare that any of their callers are making use of this non-volatile state.

In x86-64 System V, all the vector registers are call-clobbered, which is maybe too far the other way. Having 1 or 2 call-preserved would be nice in many cases, especially for code that makes some math library function calls. (Use gcc -fno-math-errno to let simple ones inline better; sometimes the only reason they don't is that they need to set errno on NaN.)

Related: how the x86-64 SysV calling convention was chosen: looking at code size and instruction count for gcc compiling SPECint/SPECfp.


For integer regs, having some of each is definitely good, and all "normal" calling conventions (for all architectures, not just x86) do in fact have a mix. This reduces the total amount of work done spilling/restoring in callers and callees combined.

Forcing the caller to spill/reload everything around every function call is not good for code-size or performance. Saving / restoring some call-preserved regs at the start/end of the function lets non-leaf functions keep some things live in registers across calls.

Consider some code that calculates a couple things and then does cout << "result: " << a << "foo" << b*c << '\n'; That's 5 calls to std::ostream operator<< overloads, and they generally don't inline. Keeping the address of cout and the locals you just computed in non-volatile registers means you only need some cheap mov reg,reg instructions to set up the args for the next call (or push, in a stack-args calling convention).
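As a rough, compilable version of that example (the function names `print_result` and `format_result` are made up for illustration, and the stream is passed in rather than hard-coding cout so the output can be checked):

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: several chained operator<< calls on one stream.
// The stream reference and the not-yet-printed values must survive each
// earlier call, so a compiler will typically keep them in call-preserved
// registers (or spill them to the stack) between the calls.
void print_result(std::ostream& os, int a, int b, int c) {
    os << "result: " << a << "foo" << b * c << '\n';
}

// Convenience wrapper that captures the output as a string.
std::string format_result(int a, int b, int c) {
    std::ostringstream oss;
    print_result(oss, a, b, c);
    return oss.str();
}
```

Compiling `print_result` with optimization and reading the generated assembly is one way to see which values a particular compiler keeps in call-preserved registers across the calls; the exact choice varies by compiler and target.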

But having some call-clobbered registers that can be used without saving is also very important. Functions that don't need all the architectural registers can just use the call-clobbered registers as temporaries. This avoids introducing a spill/reload into the critical path for the caller's dependency chains (for very small callees), as well as saving instructions.

Sometimes a complex function will save/restore some call-preserved registers just to get more total registers (like you're seeing with XMM for number crunching). This is generally worth it; saving/restoring the caller's non-volatile registers once is usually better than spilling/reloading your own local variables to the stack, especially if you would otherwise have to do that inside a loop.
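For a JIT targeting Win64, deciding which XMM registers a generated function has to save can be sketched as a bitmask computation. The split used here (XMM0-XMM5 volatile, XMM6-XMM15 nonvolatile) is the Win64 one; the names `kWin64NonvolatileXmm`, `xmm_to_save`, and `count_regs` are hypothetical and only illustrate the idea.

```cpp
#include <cstdint>

// Bit i represents XMMi. In the Win64 convention XMM0-XMM5 are volatile
// (call-clobbered) and XMM6-XMM15 are nonvolatile (call-preserved).
constexpr std::uint16_t kWin64NonvolatileXmm = 0xFFC0;  // bits 6..15

// Hypothetical helper: given the set of XMM registers a function body
// writes, return the subset that the prologue must save and the epilogue
// must restore.
inline std::uint16_t xmm_to_save(std::uint16_t written) {
    return static_cast<std::uint16_t>(written & kWin64NonvolatileXmm);
}

// Count the registers in a mask by clearing the lowest set bit each step.
inline int count_regs(std::uint16_t mask) {
    int n = 0;
    while (mask != 0) {
        mask = static_cast<std::uint16_t>(mask & (mask - 1));
        ++n;
    }
    return n;
}
```

A kernel that writes XMM0-XMM7 (mask `0x00FF`) would need to save only XMM6 and XMM7, done once in the prologue rather than on every loop iteration.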


Another reason for call-clobbered registers is that usually some of your values are "dead" after a function call: you only needed them as args to the function. Computing them in call-clobbered registers means you don't have to save/restore anything to free up those registers, and your callee can freely use them too. This is even better in calling conventions that pass args in registers: you can compute your inputs directly in the arg-passing registers. (And copy any to call-preserved regs or spill them to stack memory if you also need them after the function.)

(I like the terms call-preserved vs. call-clobbered, rather than caller-saved vs. callee-saved. The latter terms imply that someone must save the registers, instead of just letting dead values die. volatile / non-volatile is not bad, but those terms also have other technical meanings as C keywords, or in terms of flash vs. DRAM.)

Peter Cordes
0

A callee only needs to save/restore callee saved (nonvolatile, call-preserved) registers that it needs to change the value of momentarily (some of which might not be used by any caller in the stack chain / stacktrace, but the callee doesn't know this), and a caller only needs to save/restore caller saved (volatile, call-clobbered) registers that it needs after the call (which a callee in the future stack chain might not actually modify anyway, but the caller doesn't know this).

Typically, at least with the Microsoft x64 calling convention, you will see a lot of explicitly saved nonvolatile registers on the stack but no explicitly saved volatile registers. I think the idea is that compilers never get to the stage where a caller needs to explicitly save a register right before a call, particularly for an expression that isn't itself a variable in the program. Instead, the compiler can plan ahead and do one of the following: avoid using those registers entirely; use the register but not optimise the variable's backing store off of the stack; use the registers for parameters passed to a callee that are dead after the call because they aren't defined as variables in the program; or use a nonvolatile register.

In the function prologue, a callee explicitly pushes to the stack any nonvolatile register whose modified value it needs to keep across a call that it makes, and it restores them in the epilogue. It can instead save the old values in volatile registers, but if the callee makes a call itself it must first restore them to the nonvolatile registers or save them to the stack (in which case the store is called a spill). It can't save a value in another nonvolatile register, because then that register would need to be saved as well.

I agree that "caller saved" implies that the register needs to be saved regardless of whether the caller uses it or not. This isn't true: the caller may not have to save the register even if it does use it, because it knows that it does not need the value after the call, or because it might not make a call at all.

It is good to have an even balance. It is only a disadvantage to have all of one and none of the other, but sometimes it might be optimal to have a bias towards one type, for example nonvolatile, where that register might be predominantly used in a callee function and not in a caller function, like Peter suggested with xmm registers.

I think having only nonvolatile registers would hurt more than having only volatile registers. With only nonvolatile registers, you'd be preserving parameters that might be dead in the caller after the call (which is why parameter registers are volatile). Furthermore, preserving the return value register is impossible, so you'd need at least one volatile register for that, or you'd have to return values on the stack, which is slower. You also would not be able to modify a register momentarily without saving its value to the stack. If instead all registers were volatile, you'd be able to keep values in registers until a call is made, or throughout the function if there is no call at all. There will always be a caller function (unless it's the base frame), but there are far more leaf functions than base frames. The base frame would have to deviate from the calling convention in order to optimise out the saving of nonvolatile registers, and would not optimise it out if adhering to the convention strictly, whereas a leaf function not saving volatile registers is defined in the calling convention.

Having only volatile registers would still be a disadvantage, because nonvolatile registers put the burden on the callee function, which might be in some separately compiled library, and that makes compiling your own application easier. Furthermore, since all volatile registers but not the nonvolatile ones are saved when making a trap frame (this is the case with the Microsoft x64 calling convention at least, unless there is an exception or a context switch), regular system calls would incur a larger time and space penalty if all the registers were volatile.

Lewis Kelsey
-1

The advantage of having nonvolatile registers is: performance.

The less data is moved, the more efficient a CPU is.

The more volatile registers there are, the more energy the CPU needs.

ncomputers
  • If this were true in general, calling conventions would have no volatile registers for perfect efficiency. You need some of each to minimize instruction count and energy use, otherwise you need to move data around to free up some scratch registers for functions to use. – Peter Cordes Sep 06 '22 at 13:47