I'm using the cub::DeviceScan functions. The sample code snippet has a parameter, temp_storage_bytes, which it uses to allocate temporary device memory (which, incidentally, the snippet never frees).
The snippet first calls the cub::DeviceScan function with a null pointer for the temporary storage. This makes the function compute the amount of temporary device memory it needs, write that size to temp_storage_bytes, and return without doing the scan. The snippet then allocates that much memory with cudaMalloc and repeats the call, this time passing the allocated pointer. The temporary memory is then freed with cudaFree (or at least it probably should be).
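For reference, the pattern I mean looks roughly like the sketch below. It uses cub::DeviceScan::ExclusiveSum as the example scan. The helper name exclusive_sum_once is hypothetical (my own wrapper, not part of CUB), and the cudaFree at the end is my addition, not something the documentation snippet does.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUB): exclusive prefix sum of num_items
// floats already resident on the device. Returns the first CUDA error seen.
cudaError_t exclusive_sum_once(const float* d_in, float* d_out, int num_items)
{
    void*  d_temp_storage     = nullptr;
    size_t temp_storage_bytes = 0;

    // Call 1: d_temp_storage is null, so CUB only writes the required size
    // into temp_storage_bytes and returns without scanning.
    cudaError_t err = cub::DeviceScan::ExclusiveSum(
        d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);
    if (err != cudaSuccess) return err;

    err = cudaMalloc(&d_temp_storage, temp_storage_bytes);
    if (err != cudaSuccess) return err;

    // Call 2: same arguments, now with real temporary storage. This runs the scan.
    err = cub::DeviceScan::ExclusiveSum(
        d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);

    // My addition: release the temporary storage before returning.
    cudaError_t free_err = cudaFree(d_temp_storage);
    return (err != cudaSuccess) ? err : free_err;
}
```

Called once per array, this does one cudaMalloc and one cudaFree per scan, which is the overhead I would like to avoid.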
I'm running the device scan many times on different float arrays, but every array has the same length.
My question is: can I assume that temp_storage_bytes will always have the same value across these calls? If so, I could do a single cudaMalloc and a single cudaFree for all of the function calls.
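Concretely, what I would like to write is something like the following sketch. It assumes, without my having confirmed it, that the required size is the same for every array of the same length and element type. As a safety net it re-runs the size query before each scan and bails out if the requirement ever exceeds what was allocated. The helper name exclusive_sum_many and the choice of cudaErrorInvalidValue as the failure code are my own, not anything from CUB.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUB): scans num_arrays device arrays of
// num_items floats each, reusing a single temporary buffer. d_in and d_out
// are host arrays holding device pointers.
cudaError_t exclusive_sum_many(float* const* d_in, float* const* d_out,
                               int num_arrays, int num_items)
{
    if (num_arrays <= 0) return cudaSuccess;

    void*  d_temp     = nullptr;
    size_t temp_bytes = 0;

    // Size query using the first array; allocate once based on the result.
    cudaError_t err = cub::DeviceScan::ExclusiveSum(
        d_temp, temp_bytes, d_in[0], d_out[0], num_items);
    if (err != cudaSuccess) return err;

    err = cudaMalloc(&d_temp, temp_bytes);
    if (err != cudaSuccess) return err;

    for (int i = 0; i < num_arrays; ++i) {
        // Defensive check (my assumption may be wrong): repeat the size
        // query and stop if this call would need more than was allocated.
        size_t needed = 0;
        err = cub::DeviceScan::ExclusiveSum(
            nullptr, needed, d_in[i], d_out[i], num_items);
        if (err != cudaSuccess) break;
        if (needed > temp_bytes) { err = cudaErrorInvalidValue; break; }

        // Actual scan, reusing the one temporary buffer.
        size_t bytes = temp_bytes;
        err = cub::DeviceScan::ExclusiveSum(
            d_temp, bytes, d_in[i], d_out[i], num_items);
        if (err != cudaSuccess) break;
    }

    // Single free for all of the scans.
    cudaError_t free_err = cudaFree(d_temp);
    return (err != cudaSuccess) ? err : free_err;
}
```

If the size really is fixed for a given length, the defensive query inside the loop could be dropped entirely.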
The example is unclear on how the required memory is determined and whether it can change for a given array of a given length.