Microbenchmarking A Sorting Algorithm #
While micro-benchmarking a sorting algorithm, I wanted to reduce variation to a minimum. First, I needed to prepare several arrays that could then be sorted one after another.
Let u(k, s) be the k-th number generated by a uniform pseudo-random number generator that starts with seed s. The test data is

datasets = [[u(k, s(l)) for k in range(n_samples)] for l in range(n_retries)]
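A minimal sketch of this setup in Python, using `random.Random` as the PRNG u and picking s(l) = l; the values of `n_samples` and `n_retries` are hypothetical:

```python
import random

n_samples = 1000  # hypothetical array length
n_retries = 50    # hypothetical number of repetitions

def u_stream(seed, n):
    """First n numbers from a uniform PRNG started with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

# One freshly generated array per retry, each with its own seed s(l) = l.
datasets = [u_stream(l, n_samples) for l in range(n_retries)]
```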
Now if s(l) is different for each retry, then the sequences are (with high probability) unique. Clearly, that's "bad", because it adds variation. One could just sort the same data many times by making s(l) some constant. Clever!
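Making s(l) a constant could look like this (a sketch; the seed value 42 and the sizes are arbitrary placeholders):

```python
import random

n_samples = 1000  # hypothetical array length
n_retries = 50    # hypothetical number of repetitions
SEED = 42         # any fixed constant works

def u_stream(seed, n):
    """First n numbers from a uniform PRNG started with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

# Every retry now sorts a copy of the *same* random array.
datasets = [u_stream(SEED, n_samples) for _ in range(n_retries)]
```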
The interesting twist comes when one repeats the measurements for arrays of different lengths: suddenly, the sorting algorithm is much faster than before, but only for small datasets.
The drop seemed too large to be explained by different memory access patterns alone. Since the data is random, the branching pattern is random too. I have no clue how the branch predictor in my laptop works, but what if it really can memorize roughly a thousand random choices?
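A rough way to observe the effect from Python, comparing repeated sorts of one fixed array against sorts of a fresh array every time. This is only a sketch: timings are machine-dependent, and CPython's interpreter overhead may mask the branch-predictor effect that a compiled sorting routine would show.

```python
import random
import timeit

n = 1_000      # small enough that ~n random branch outcomes might be memorized
retries = 200  # hypothetical repetition count

# Variant 1: the same array every retry; comparison branches repeat exactly.
rng = random.Random(0)
fixed = [rng.random() for _ in range(n)]
t_same = timeit.timeit(lambda: sorted(fixed), number=retries)

# Variant 2: a fresh array every retry; the branch pattern never repeats.
fresh = []
for l in range(retries):
    rng = random.Random(l + 1)
    fresh.append([rng.random() for _ in range(n)])
it = iter(fresh)
t_fresh = timeit.timeit(lambda: sorted(next(it)), number=retries)

print(f"same array:  {t_same:.6f} s")
print(f"fresh array: {t_fresh:.6f} s")
```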
Luckily, this is something we can measure using hardware counters. There are numerous options; today LIKWID wins, since it compiles and has a nice snippet covering exactly our use case.
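With LIKWID installed, counting branch events can look roughly like this (a sketch: the `BRANCH` performance group and core pinning via `-C` follow LIKWID's documented usage, its availability depends on the CPU, and `./sort_bench` is a hypothetical benchmark binary):

```shell
# Pin the benchmark to core 0 and read the BRANCH event group,
# which reports branch instructions, mispredictions, and their ratio.
likwid-perfctr -C 0 -g BRANCH ./sort_bench
```

Comparing the misprediction ratio between the constant-seed and varying-seed runs is what reveals whether the predictor has memorized the comparison outcomes.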

So, yes, it seems like my laptop can memorize up to about a thousand random choices.