I use a variation of a 5-cross median filter on image data on a small embedded system, i.e.
      x
    x x x
      x
The algorithm is really simple: read 5 unsigned integer values, get the highest 2, do some calculations on those and write back the unsigned integer result.
What is nice is that the 5 integer input values are all in the range 0-20. The calculated integer values are also in the 0-20 range!
Through profiling, I have figured out that getting the largest two numbers is the bottleneck so I want to speed this part up. What is the fastest way to perform this selection?
The current algorithm uses a 32-bit mask with a 1 in each bit position given by the 5 numbers, together with a HW-supported CLZ (count leading zeros) function.
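To illustrate the idea, a minimal Python sketch of a mask-plus-CLZ selection could look like the following. The names `clz32` and `top_two_distinct` are hypothetical, and `int.bit_length()` stands in for the hardware CLZ instruction. The sketch assumes that repeated values collapse into a single bit, so it returns the two largest distinct values; the real code may treat duplicates differently.

```python
def clz32(v):
    """Count leading zeros of a 32-bit value (software stand-in for HW CLZ)."""
    return 32 - v.bit_length()


def top_two_distinct(values):
    """Return the two largest distinct values among inputs in 0..20."""
    mask = 0
    for v in values:
        mask |= 1 << v            # set bit v; duplicates collapse into one bit
    first = 31 - clz32(mask)      # index of the highest set bit
    mask &= ~(1 << first)         # clear that bit
    # Assumption: if all inputs were equal the mask is now empty,
    # so the second value falls back to the first.
    second = 31 - clz32(mask) if mask else first
    return first, second
```

With this sketch, `top_two_distinct([5, 0, 0, 0, 5])` yields `(5, 0)` rather than `(5, 5)`, which is exactly the duplicate-handling question a bitmask approach has to answer.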
I should say that the CPU is a proprietary one, not available outside of my company. My compiler is GCC, but tailor-made for this CPU.
I have tried to figure out whether I can use a lookup table, but I have failed to generate a key that I can use.
I have $21^5$ combinations for the input but order isn't important, i.e. [5,0,0,0,5] is the same as [5,5,0,0,0].
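As a quick check on the numbers, the following Python sketch compares the count of ordered inputs with the count of unordered ones (multisets of 5 values drawn from 21). The constant names are hypothetical; `math.comb` requires Python 3.8 or later.

```python
from math import comb

VALUES = 21   # each input is in 0..20
INPUTS = 5    # five pixels in the cross

ordered = VALUES ** INPUTS                      # order matters
unordered = comb(VALUES + INPUTS - 1, INPUTS)   # order ignored (multisets)

print(ordered)    # 4084101
print(unordered)  # 53130
```

So ignoring order shrinks the input space from about 4 million cases to roughly 53 thousand, provided a key can be computed that is the same for every permutation of the inputs.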
It happens that the hash-function below produces a perfect hash without collisions!
    def hash(x):
        # Treat the 5 inputs as the digits of a base-33 number.
        h = 0
        for i in x:
            h = 33*h + i
        return h
But the range of hash values is huge, and there is simply not enough memory for a table of that size.
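To put a number on "huge", here is a small Python sketch that evaluates the same base-33 polynomial on the largest possible input. The name `key33` is hypothetical, and the table size assumes a direct-indexed table with one slot per possible key.

```python
def key33(x):
    """Same base-33 polynomial as the hash above, under a hypothetical name."""
    h = 0
    for i in x:
        h = 33 * h + i
    return h


largest_key = key33([20] * 5)      # all five inputs at their maximum
table_entries = largest_key + 1    # assumption: one slot per key, 0..largest_key

print(largest_key)    # 24459620
print(table_entries)  # 24459621
```

Because every input is smaller than 33, the key is simply a 5-digit base-33 number, which is why no two ordered inputs collide. Only $21^5$ of those slots would ever be used, though.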
Is there a better algorithm that I can use? Is it possible to solve my problem using a lookup-table and generating a key?