Is vzeroupper necessary if only XMM registers are used with AVX?

Question

AVX's VEX encoding adds very convenient three-operand versions of many SSE instructions. If I use these exclusively with `xmm0`-`xmm15` and never touch the high halves of the ymm registers, is it still advisable to use `vzeroupper`?

A VEX-encoded xmm instruction zeroes the high half of its destination ymm register, whereas the legacy SSE encoding leaves that half unchanged. Does that mean I should use `vzeroupper`?

The non-SIMD parts of my project are not compiled with /arch:AVX or -mavx, as non-AVX CPUs are still supported.

`vzeroupper` isn't necessary in that case. Look for the "avx 128-bit" arrows on the state-transition diagrams in [Why is this SSE code 6 times slower without VZEROUPPER on Skylake?](https://stackoverflow.com/q/41303780) - if the CPU was in a "clean upper" state, it stays there on executing an instruction like `vpxor xmm15, xmm0, xmm0`. — Peter Cordes, Aug 17 '22 at 22:38
See also [Is it useful to use VZEROUPPER if your program+libraries contain no SSE instructions?](https://stackoverflow.com/q/49019614) / [First use of AVX 256-bit vectors slows down 128-bit vector and AVX scalar ops](https://stackoverflow.com/q/66874161) - one stray 256-bit instruction could reduce performance for the rest of your program if your code doesn't ever use `vzeroupper`. But if there aren't any you should be fine. And context switches should restore the SIMD state to clean-uppers. — Peter Cordes, Aug 17 '22 at 22:49
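
To make the situation in the question concrete, here is a minimal sketch of 128-bit-only AVX code living inside an otherwise non-AVX build. It assumes GCC or Clang on x86-64 (the `target("avx")` function attribute and `__builtin_cpu_supports` are compiler-specific and not available in MSVC), and the function names `add_f32`, `add_f32_avx128` and `add_f32_scalar` are hypothetical. Following the comments above, the AVX path issues no `vzeroupper` because it never writes a ymm register.

```c
#include <stddef.h>

#if defined(__x86_64__)
#include <immintrin.h>

/* Hypothetical helper. The target attribute asks the compiler to use VEX
   encodings (e.g. vaddps xmm, xmm, xmm) inside this function only, so the
   rest of the translation unit stays non-AVX. Only full groups of four
   floats are handled here; the return value is the number processed. */
__attribute__((target("avx")))
static size_t add_f32_avx128(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* 128-bit load  */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
    /* No _mm256_zeroupper() here: no ymm register was written, so
       clean uppers stay clean. */
    return i;
}
#endif

/* Plain C fallback, also used for the tail that does not fill a vector. */
static void add_f32_scalar(float *dst, const float *a, const float *b,
                           size_t start, size_t n)
{
    for (size_t i = start; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Runtime dispatch so that CPUs without AVX remain supported. */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t done = 0;
#if defined(__x86_64__)
    if (__builtin_cpu_supports("avx"))
        done = add_f32_avx128(dst, a, b, n);
#endif
    add_f32_scalar(dst, a, b, done, n);
}
```

If a 256-bit instruction is ever added to the AVX path, the reasoning changes: compilers normally insert `vzeroupper` automatically at the end of functions that use ymm registers, but hand-written assembly would need it explicitly.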

