Why do benchmark results vary at all?

Question

Benchmarking typically consists of reading the current CPU time, executing the test code a large number of times, and then subtracting the starting time from the new CPU time. However, when you benchmark the same code multiple times, the results vary from run to run, anywhere from slightly to significantly. You will very likely not get the exact same number each time, even for the same code and the same number of iterations.

However, CPUs have a fixed clock rate (do they not?), and instructions typically take the same amount of time to execute for the same conditions (add, mov, or, etc. typically take a fixed integer number of clock cycles or do they not?), and the same instructions are being executed the same number of times under seemingly the same conditions.

So in theory every benchmark should return the exact same number, yet clearly they do not. What goes on behind the scenes in a CPU that could cause variations in benchmark results for the exact same benchmark?

score 6 · Answer 1 · answered Mar 24 '24 at 07:55

Your assumption that the clock rate is constant is wrong. My computer at home can at any time switch each core to one of 15 different clock speeds.
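To see the clock rate change for yourself, here is a minimal sketch that assumes a Linux machine exposing the cpufreq sysfs interface. The helper name `read_cur_freq_khz` is made up for illustration, and on systems without that interface it simply returns `None`.

```python
from pathlib import Path


def read_cur_freq_khz(cpu=0):
    """Return the current frequency of one core in kHz, or None if unavailable.

    Assumption: Linux with the cpufreq sysfs interface. Other platforms,
    containers, or VMs may not expose this file at all.
    """
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_cur_freq")
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # File missing, unreadable, or not a plain integer.
        return None


if __name__ == "__main__":
    import time

    # Sample the frequency a few times; on most laptops the values differ.
    for _ in range(5):
        print(read_cur_freq_khz(0))
        time.sleep(0.2)
```

If the printed values differ between samples, the core is being rescaled while you watch, and a benchmark running at the same time would be affected by it.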

Your assumption that the same operation always takes the same time is wrong. The total processor state is extremely complicated and not exactly reproducible. The benchmark runs on a computer that does other things that can and will affect the benchmark times.

Rinkesh P · Accepted Answer · 2025-06-01T03:08:42.020

Simply put, a benchmark would return exactly the same number only if the CPU were in the same state every time the code was benchmarked.

What does CPU state mean? It means the context of the CPU at that instant, which includes registers, cache, pipeline and much more. On top of that, the OS needs context switches, paging operations and more to keep running. It is practically impossible to reproduce the same scenario for each run. Together these make up what I would like to call the logical state of the CPU. The benchmarks won't give the same number, but they will follow a distribution.
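A quick way to look at that distribution is to time the same workload many times and summarize the samples. The sketch below is only an illustration: `workload`, `time_once` and `collect_samples` are made-up names, and it uses `time.perf_counter_ns` (a monotonic wall-clock timer) rather than CPU time.

```python
import statistics
import time


def workload(n=100_000):
    """Arbitrary fixed amount of work: sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_once(func):
    """Time a single call of func in nanoseconds."""
    start = time.perf_counter_ns()
    func()
    return time.perf_counter_ns() - start


def collect_samples(func, runs=30):
    """Run the same benchmark repeatedly and keep every timing."""
    return [time_once(func) for _ in range(runs)]


if __name__ == "__main__":
    samples = collect_samples(workload)
    print("min    (ns):", min(samples))
    print("median (ns):", statistics.median(samples))
    print("max    (ns):", max(samples))
    print("stdev  (ns):", statistics.stdev(samples))
```

The code and iteration count are identical on every run, yet the minimum, median and maximum will normally differ. This is why benchmark results are usually reported as a minimum or a median over many runs rather than as a single number.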

The CPU, being a semiconductor device, is ultimately bound by the laws of physics, and hence its actual physical state and environment matter. CPUs are engineered to prolong their life, and one such feature is dynamic frequency scaling, which changes the clock speed in response to load and temperature.

Just as food for thought: benchmarking vacuum-tube-based or punch-card-based systems (from back in the day) would have a higher chance of giving similar benchmark results from run to run (why?).
