
I don't know much about the internal workings of the CPU, and my understanding of SSE is equally basic: it works through additional wide registers that pack several data elements, so you can perform a single operation on all of them (in parallel) with a single instruction.

Great, but why isn't every register and every operation like that by default? If I want to add two integers, why would I need to place each one in a separate register and do the operation through multiple instructions, when I could just do it through SSE? Does it interfere with concurrency somehow? Is it a hardware limitation?
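For concreteness, here is roughly what I have in mind (a minimal, unbenchmarked sketch using SSE2 intrinsics; the variable names are just for illustration):

```c
/* Scalar add vs. packed SSE add. Compile with e.g. gcc -O2 -msse2. */
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    /* Scalar: one add on one pair of integers. */
    int a = 3, b = 4;
    int sum = a + b;

    /* SIMD: one paddd instruction adds four pairs in parallel. */
    __m128i va   = _mm_setr_epi32(1, 2, 3, 4);
    __m128i vb   = _mm_setr_epi32(10, 20, 30, 40);
    __m128i vsum = _mm_add_epi32(va, vb);       /* {11, 22, 33, 44} */

    int out[4];
    _mm_storeu_si128((__m128i *)out, vsum);
    printf("%d | %d %d %d %d\n", sum, out[0], out[1], out[2], out[3]);
    return 0;
}
```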

Thanks! If there are any reasonably easy-to-follow sources as well, I would greatly appreciate them.

  • If you only have two numbers to add, putting them into the same SIMD register wouldn't be useful. `phaddd xmm0,xmm0` exists in SSSE3, but is slow (it costs the same as 2 shuffles plus a normal [vertical `paddd`](https://www.felixcloutier.com/x86/paddb:paddw:paddd:paddq); see https://uops.info/). – Peter Cordes Jun 10 '21 at 01:19
  • Also, SIMD vector registers on most ISAs can't be used as memory addresses (except for gather / scatter instructions), or for compare/branch directly. Of course, that's only because there are separate integer registers instead of having only one kind of register. Your question really boils down to why that's the case. (Note that current CPUs usually combine scalar floating point with SIMD; the answers on the linked duplicate Q&As do discuss SIMD.) A major consideration is power efficiency for code that can't effectively make use of SIMD. Scalar integer regs and execution units are smaller. – Peter Cordes Jun 10 '21 at 01:21
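To make the first comment's point concrete, here is a rough sketch (my own illustration, not from the comments; the helper name `hsum2` is made up) of what adding just two integers through SIMD looks like:

```c
/* Adding two integers that sit in the same XMM register needs a horizontal
   add (phaddd, SSSE3), which decodes like shuffles and costs more than a
   plain scalar add r32, r32. Compile with e.g. gcc -O2 -mssse3. */
#include <immintrin.h>
#include <stdio.h>

static int hsum2(int x, int y) {
    __m128i v = _mm_setr_epi32(x, y, 0, 0);  /* pack both ints into one xmm */
    __m128i h = _mm_hadd_epi32(v, v);        /* phaddd: lane 0 = x + y */
    return _mm_cvtsi128_si32(h);             /* read back the low lane */
}

int main(void) {
    printf("%d\n", hsum2(3, 4));  /* 7, via a slower path than a scalar add */
    return 0;
}
```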

0 Answers