I have a __m128i register that holds the result of a SIMD comparison. It looks something like this:
0, 0, -1, -1, 0, 0, 0, 0 // in shorts
0, -1, 0, 0 // in ints
What is the fastest/cheapest way to get the position of the int whose bits are set? Exactly one int inside the __m128i is set to -1 (all bits set); the others are 0.
Example (lanes shown as shorts, result is the index of the int):
-1, -1, 0, 0, 0, 0, 0, 0 -> 0
0, 0, -1, -1, 0, 0, 0, 0 -> 1
0, 0, 0, 0, -1, -1, 0, 0 -> 2
0, 0, 0, 0, 0, 0, -1, -1 -> 3
One additional note: I only have AVX and lower available, so no AVX2 or AVX-512. I'm using C++ and Intel intrinsics.
Edit: This is my current code:
__m128i comparableLow = _mm_set1_epi32(key - 1);  // broadcast key - 1 to all four lanes
__m128i comparableHigh = _mm_set1_epi32(key + 1); // broadcast key + 1 to all four lanes
__m128i mData = _mm_loadu_si128((__m128i*)(arr));
__m128i l1 = _mm_cmpgt_epi32(mData, comparableLow);
__m128i u1 = _mm_cmplt_epi32(mData, comparableHigh);
__m128i r1 = _mm_and_si128(u1, l1);
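To make the goal concrete, here is a self-contained sketch of the code above together with one possible way to extract the lane index. It is not necessarily the fastest option. The function name find_lane is hypothetical, and the sketch assumes SSE2 and a GCC/Clang compiler (for __builtin_ctz; MSVC would need _BitScanForward instead). It uses _mm_movemask_ps to collect the sign bit of each 32-bit lane into a 4-bit integer and then counts trailing zeros.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 intrinsics (also pulls in the SSE header)

// Hypothetical helper: returns the index (0..3) of the 32-bit lane of
// arr[0..3] that equals key, or -1 if no lane matches.
// Assumes key - 1 and key + 1 do not overflow, and that at most one lane matches.
inline int find_lane(const int* arr, int key) {
    const __m128i comparableLow  = _mm_set1_epi32(key - 1);
    const __m128i comparableHigh = _mm_set1_epi32(key + 1);
    const __m128i mData = _mm_loadu_si128(reinterpret_cast<const __m128i*>(arr));

    // A lane is all ones (-1) where key - 1 < data < key + 1, i.e. data == key.
    const __m128i l1 = _mm_cmpgt_epi32(mData, comparableLow);
    const __m128i u1 = _mm_cmplt_epi32(mData, comparableHigh);
    const __m128i r1 = _mm_and_si128(u1, l1);

    // Gather the top bit of each 32-bit lane into bits 0..3 of an int.
    const int mask = _mm_movemask_ps(_mm_castsi128_ps(r1));
    if (mask == 0) return -1;            // no lane matched

    assert((mask & (mask - 1)) == 0);    // the question assumes a single match
    return __builtin_ctz(mask);          // position of the lowest set bit
}
```

With arr = {10, 20, 30, 40}, calling find_lane(arr, 30) should yield 2, matching the third row of the example table above.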