I have a __mmask64 mask register and I need to chop it into 4 __mmask16 mask registers.
I (incorrectly) assumed that the following line of code would have done the trick:
__mmask16 mask_16 = static_cast<__mmask16>(mask_64 >> 16);
But this is what I get (Intel C++ Compiler 18.0):
kmovq r14,k1
shr r15,10h
kmovw k2,r15d
Since the Intel Intrinsics Guide does not have anything like _mm512_kshift(k, imm8), and the definition of, for example, _mm512_kand is just:
#define _mm512_kand(k1, k2) ((__mmask16) ((k1) & (k2)))
I assumed that a plain shift would have given me a KSHIFTRW.
Question: How can I generate a KSHIFTRW from C++?
Edit: I just found a related question with a sufficient answer: Missing AVX-512 intrinsics for masks?
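For reference, here is a minimal sketch of the four-way split, not a definitive solution. It assumes the compiler's immintrin.h provides _kshiftri_mask64 (an AVX512BW intrinsic that newer compilers ship; I have not checked whether Intel 18.0 has it). The helper names mask_lane and split_mask64 are made up for illustration. Since the source is a 64-bit mask, the matching instruction is KSHIFTRQ rather than KSHIFTRW, and the compiler is still free to route the shift through a general-purpose register. Without AVX512BW enabled, the sketch falls back to the plain integer shift from above.

```cpp
#include <array>
#include <cstdint>

#if defined(__AVX512BW__)
#include <immintrin.h>
#endif

// Hypothetical helper: returns 16-bit lane `Lane` (0 = lowest) of a 64-bit mask.
template <unsigned Lane>
inline std::uint16_t mask_lane(std::uint64_t mask_64) {
    static_assert(Lane < 4, "a 64-bit mask holds four 16-bit lanes");
#if defined(__AVX512BW__)
    // Assumption: _kshiftri_mask64 is available in this compiler's headers.
    // The shift count must be a compile-time constant, hence the template.
    const __mmask64 shifted =
        _kshiftri_mask64(static_cast<__mmask64>(mask_64), Lane * 16);
    return static_cast<std::uint16_t>(shifted);
#else
    // Portable fallback: ordinary integer shift, then truncate to 16 bits.
    return static_cast<std::uint16_t>(mask_64 >> (Lane * 16));
#endif
}

// Hypothetical helper: chops a 64-bit mask into its four 16-bit lanes.
inline std::array<std::uint16_t, 4> split_mask64(std::uint64_t mask_64) {
    return {{mask_lane<0>(mask_64), mask_lane<1>(mask_64),
             mask_lane<2>(mask_64), mask_lane<3>(mask_64)}};
}
```

Usage would look like `auto parts = split_mask64(mask_64);`, with parts[1] corresponding to the `mask_64 >> 16` case above. Whether a k-register shift is actually emitted has to be checked in the generated assembly.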