
What advantages and disadvantages does a CISC instruction set have relative to a RISC instruction set with respect to the ease of developing a functioning compiler and the ease of developing a highly optimizing compiler?

Mario

1 Answer


CISC vs. RISC is just the wrong distinction. There are some instruction set design decisions that matter to the compiler writer and others that don't, but this simply isn't one of them.

Variable instruction length vs. fixed instruction length pretty much doesn't matter. Writing an assembler for variable-length instructions requires a little more bookkeeping, but once it's done you never have to worry about it again.
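To make that bookkeeping concrete, here is a minimal sketch (Python, with a made-up toy instruction set and invented encodings) of the usual two-pass approach: pass one assigns a byte offset to every label while summing per-instruction sizes, pass two emits with targets resolved. With fixed-length instructions the size lookup degenerates to a constant, and that is essentially the whole difference.

```python
# Toy two-pass assembler sketch: the only extra work a variable-length
# encoding adds is tracking how many bytes each instruction occupies.
# Mnemonics and sizes below are invented for illustration.

SIZES = {"nop": 1, "jmp": 3, "load_imm32": 5}   # hypothetical encodings

def assemble(program):
    # Pass 1: assign a byte offset to every label.
    labels, offset = {}, 0
    for item in program:
        if item[0] == "label":
            labels[item[1]] = offset
        else:
            offset += SIZES[item[0]]

    # Pass 2: emit (offset, mnemonic, resolved operand) records.
    out, offset = [], 0
    for item in program:
        if item[0] == "label":
            continue
        operand = None
        if len(item) > 1:
            operand = labels.get(item[1], item[1])  # labels resolve to offsets
        out.append((offset, item[0], operand))
        offset += SIZES[item[0]]
    return out

prog = [("label", "top"), ("load_imm32", 42), ("nop",), ("jmp", "top")]
print(assemble(prog))   # [(0, 'load_imm32', 42), (5, 'nop', None), (6, 'jmp', 0)]
```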

You either want a lot of general-purpose registers or just one (e.g., an accumulator). Register allocation is hard. So hard that it is difficult even to pin down what you are optimizing: you would like to minimize the number of spills at runtime, but you usually don't know how often any given instruction will execute. If you have only one register, then where you have to spill is pretty obvious, and if you have more registers than you need, then a greedy heuristic will get you something close to optimal. It is when you have only half or a third as many registers as you need that good heuristics start to matter a lot.
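To illustrate why spare registers make the easy case easy, here is a hedged sketch (Python, toy live ranges, not any production algorithm) of a greedy, linear-scan-flavoured allocator: while registers are free it just hands them out, and only when the pool runs dry does it have to pick a spill victim, which is exactly where the hard heuristics live.

```python
# Greedy register allocation sketch over live ranges (name, start, end).
# Register names and the spill heuristic are toy choices for illustration.

def allocate(live_ranges, num_regs):
    live_ranges = sorted(live_ranges, key=lambda r: r[1])   # by start point
    active = []                                   # (end, name, reg) holding a register
    free = [f"r{i}" for i in range(num_regs)]
    assignment, spills = {}, []

    for name, start, end in live_ranges:
        # Expire ranges that ended before this one starts, freeing their registers.
        for rng in [a for a in active if a[0] <= start]:
            active.remove(rng)
            free.append(rng[2])

        if free:                                  # easy case: registers to spare
            reg = free.pop()
        else:                                     # hard case: someone has to spill
            victim = max(active, key=lambda a: a[0])   # crude heuristic: furthest end
            if victim[0] > end:                   # victim lives longer, spill it instead
                active.remove(victim)
                spills.append(victim[1])
                assignment[victim[1]] = "stack"
                reg = victim[2]
            else:                                 # spill the new range itself
                spills.append(name)
                assignment[name] = "stack"
                continue
        assignment[name] = reg
        active.append((end, name, reg))
    return assignment, spills

ranges = [("a", 0, 10), ("b", 1, 4), ("c", 2, 12), ("d", 3, 6)]
print(allocate(ranges, 2))
```

The crude furthest-endpoint heuristic is the slot where a real compiler would plug in loop-nesting depth, estimated execution frequency, and so on.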

Registers that are special-purpose (used implicitly by a few instructions) are fine, and registers that are general-purpose (usable for different things in lots of different instructions) are fine, but trying to make a special-purpose register also serve as a general-purpose one is hard for the compiler. For example, the MIPS architecture's jump-and-link instruction uses R31 as the implicit place to store the return address. R31 can also be used as a general-purpose register, but the compiler has to "special case" it, so that's usually an optimization that takes a few years to get into the compiler.
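Inside the allocator, that special case boils down to whether R31 is in the allocatable pool at all. A hypothetical Python sketch of the decision (the register names follow MIPS convention, but the function and the "toy subset" are invented):

```python
# Sketch: deciding which registers the allocator may hand out on a MIPS-like target.

GENERAL_PURPOSE = [f"r{i}" for i in range(8, 26)]   # toy subset of allocatable regs

def allocatable_registers(function_makes_calls, return_address_saved):
    regs = list(GENERAL_PURPOSE)
    # r31 is implicitly written by jump-and-link, so it is only safe to reuse
    # when no call will clobber it (a leaf function) or it has been spilled.
    if not function_makes_calls or return_address_saved:
        regs.append("r31")
    return regs

print(len(allocatable_registers(True, False)))    # 18: r31 stays reserved
print(len(allocatable_registers(False, False)))   # 19: leaf function gets one extra
```

A leaf function can safely treat R31 as one more scratch register, which is precisely the kind of special-case knowledge that trickles into compilers slowly.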

Having more than one way to do the same thing is generally not helpful to the compiler writer. The internal representation inside the compiler is usually a graph where the nodes correspond to operations and the edges correspond to operands. If there are multiple ways to do something (for example, lots of different addressing modes), then instruction selection becomes a graph-tiling problem, which is hard. The result is that most compiler writers just ignore any addressing modes beyond the simple ones. Given the way modern microarchitectures work, the main cost of not using the more complex addressing modes is that the instruction sequence takes up more space than it would in the optimal case. There's a classic story from the 1980s about the VAX. If I remember correctly, the VAX had a special call instruction that would also push arguments onto the stack, and it turned out that the longer sequence that pushed the arguments individually and then did a plain call was faster.
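To give a feel for the tiling problem, here is a hedged sketch (Python, with an invented mini-ISA) of maximal-munch selection over an expression tree for `load(base + 4*index)`: one big tile covers the whole address computation with a scaled-index load, and the fallback path covers it with simple tiles only. A real selector has to decide, for every subtree, which covering to use.

```python
# Maximal-munch instruction selection sketch over a tiny expression tree.
# The mnemonics and addressing modes are invented for illustration.

_counter = 0
def fresh():
    global _counter
    _counter += 1
    return f"v{_counter}"

def select(tree, out):
    """Return a virtual register holding `tree`, appending instructions to `out`."""
    op = tree[0]
    if op == "const":
        reg = fresh()
        out.append(f"li   {reg}, {tree[1]}")
        return reg
    if op == "reg":
        return tree[1]
    if op == "load":
        addr = tree[1]
        # Big tile: recognize load(base + const*index) as one scaled-index load.
        if addr[0] == "add" and addr[2][0] == "mul" and addr[2][1][0] == "const":
            base = select(addr[1], out)
            index = select(addr[2][2], out)
            reg = fresh()
            out.append(f"ldsx {reg}, [{base} + {addr[2][1][1]}*{index}]")
            return reg
        # Fallback: small tiles only (explicit address arithmetic, then a plain load).
        a = select(addr, out)
        reg = fresh()
        out.append(f"ld   {reg}, [{a}]")
        return reg
    if op in ("add", "mul"):
        a, b = select(tree[1], out), select(tree[2], out)
        reg = fresh()
        out.append(f"{op}  {reg}, {a}, {b}")
        return reg
    raise ValueError(op)

tree = ("load", ("add", ("reg", "rbase"), ("mul", ("const", 4), ("reg", "ridx"))))
code = []
select(tree, code)
print("\n".join(code))   # one ldsx instruction covers the whole tree
```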

Large constants are a pain, so this is a place where fixed-length instructions hurt rather than help. SPARC had something like 12 bits for constants; MIPS has 16, which is better, because then you can construct any 32-bit constant with at most two instructions. The larger the available constant fields, the less often you have to emit multi-instruction constant-construction sequences. (This is even more painful for PC-relative jumps and branches: usually the relative offset fits in the constant field, but when it doesn't you need to save the PC in a register and manually add and subtract the appropriate constant. Ick.)
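Concretely, on a machine with 16-bit immediate fields (MIPS-style) a constant that doesn't fit gets split into a load-upper-immediate followed by an OR-immediate. A sketch of that decision in Python (the emitted text is illustrative, not real assembler output):

```python
# Sketch: building a 32-bit constant on a machine with 16-bit immediates.

def load_constant(reg, value):
    value &= 0xFFFFFFFF
    upper, lower = value >> 16, value & 0xFFFF
    if upper == 0:                        # fits in a single immediate
        return [f"ori  {reg}, $zero, {hex(lower)}"]
    if lower == 0:                        # only the upper half is needed
        return [f"lui  {reg}, {hex(upper)}"]
    return [f"lui  {reg}, {hex(upper)}",  # two-instruction sequence
            f"ori  {reg}, {reg}, {hex(lower)}"]

print(load_constant("$t0", 0x1234))       # one instruction
print(load_constant("$t0", 0xDEADBEEF))   # two instructions
```

With a 12-bit field the same game works, but far more constants fall into the multi-instruction case.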

Alignment can be an issue sometimes. For example, some early SIMD instruction sets required source memory operands to be aligned at cache-line boundaries, which is almost impossible for the compiler to guarantee without inserting a bunch of extra code. To be useful, a SIMD instruction set needs to allow cache-line-unaligned loads at least (aligning stores is easier when the loads can be unaligned, so that's less important).
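When the hardware does insist on aligned vector loads, the usual compiler workaround is to peel scalar iterations until the pointer reaches the required alignment and only then enter the vector loop. A sketch of the iteration-count arithmetic (Python; the 64-byte alignment and 8-wide vector are just example numbers):

```python
# Sketch: splitting a loop into scalar-peel / vector / scalar-tail parts
# so the vector body only ever sees aligned addresses.
# Assumes the base address is at least element-aligned.

def split_iterations(base_addr, n_elems, elem_size=4, align=64, vec_elems=8):
    misalign = base_addr % align
    # Scalar iterations needed before the address becomes aligned.
    peel = 0 if misalign == 0 else (align - misalign) // elem_size
    peel = min(peel, n_elems)
    remaining = n_elems - peel
    vector_iters = remaining // vec_elems        # full vector-width chunks
    tail = remaining % vec_elems                 # leftover scalar iterations
    return peel, vector_iters, tail

print(split_iterations(base_addr=0x1004, n_elems=100))   # (15, 10, 5)
```

The extra peel and tail loops are exactly the "bunch of extra code" mentioned above, and they are pure overhead when the trip count is small.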

Wandering Logic