PRNGs and the Birthday paradox
Ever since seeing Casey Muratori use xorshift32 to speed up his ray-casting code I’ve used that algorithm to generate random numbers while working in C.
As the name suggests, the internal state is a 32-bit number, and the result of the function is just its internal state.
From a security perspective, or for anything exposed to a wider audience where the cracks can be exploited for gain, the algorithm is sub-par.
I’ve stuck with this algorithm because I don’t need anything more and it has multiple niceties.
The code size is minimal: we are talking 8 lines.
Performance is really high: computing a random number takes in the vicinity of 10 instructions.
Comparing that to rand() from libc, it’s multiple times if not orders of magnitude faster.
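For reference, the 8-line function looks roughly like this: a sketch of Marsaglia’s classic xorshift32, with the 13/17/5 shift constants from his paper.

```c
#include <stdint.h>

/* Marsaglia's xorshift32: the state is the output. The seed must be
 * non-zero, otherwise the generator is stuck at zero forever. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}
```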
Surfing the web (as the cool kids used to say) one weekend, I found this blog post by Fabien Sanglard.
It talks about generating a unique stream of numbers that never repeats in a given range, which Wolfenstein 3D used for its death animation.
Code running on a CPU from 1992 can’t afford to waste cycles, so generating a lot of high-quality random data and tracking whether a number had already been picked was out of the question.
The solution used is a Linear Feedback Shift Register (LFSR).
The driving idea is that you have some state, you advance the state using the current state and output some number.
What’s neat is that the way state is advanced looks “random”.
If you construct your LFSR in the correct way it can go through all the possible values of the register before it cycles.
For example, a 32-bit state lets you generate all the values from 1 to 4294967295.
Zero is excluded: advancing a state equal to zero results in zero, creating an infinite loop.
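As a sketch, here is the 16-bit Galois LFSR from the Wikipedia article (taps 16, 14, 13, 11, i.e. mask 0xB400). Started from any non-zero seed it walks through all 65535 non-zero values before the cycle closes.

```c
#include <stdint.h>

/* 16-bit maximal-period Galois LFSR (taps 16,14,13,11 -> mask 0xB400).
 * The state must be non-zero; one cycle visits every non-zero value. */
static uint16_t lfsr16(uint16_t *state)
{
    uint16_t lsb = *state & 1u;
    *state >>= 1;
    if (lsb)
        *state ^= 0xB400u;
    return *state;
}
```

Stepping it in a loop until the seed reappears counts exactly 65535 states, which is the “goes through all possible values before it cycles” property the death animation relies on.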
What struck me while reading Wikipedia is that xorshift is just a variation of the LFSR. It follows that xorshift also has to cycle. It might sound stupid, but up until this point I’d never thought about that fact. I understood that the space is only 32 bits big, but only after reading about LFSRs did I internalise that once all numbers were generated, the sequence was going to repeat in exactly the same order. To make it more obvious, let me state a fact: all1 computer instructions are deterministic. A function $f(x)$ that returns what looks like random numbers is still deterministic, because the instructions that make it up are deterministic. Putting the function into a loop like $x^{\prime} = f(x)$ will therefore always create a cycle. The simplest example is $f(x) = x + 1$: $x$ will cycle once it hits the biggest number representable by the register. The number of steps required to cycle is anything from 1 to the number of unique values the register can hold.
Combining these two facts we can observe a problem: incorrectly used PRNGs break the birthday paradox. Let’s introduce a new PRNG that cycles after 10 values and returns values in the range 1 to 10. Running it to generate 10 numbers we get: 5, 8, 9, 2, 4, 1, 10, 7, 3, 6. See anything weird about it? There are no repeats! We would expect around 4.5 colliding pairs! You can verify that with a really high quality random source like random.org: you almost always get some numbers repeating. On my first run I got: 2, 3, 4, 3, 8, 6, 3, 6, 3, 8. There are more repeats than unique numbers.
You can of course rightly point out that if we increase the range it won’t be as drastic.
Let’s put xorshift32, the algorithm that I’ve used so much, under the microscope.
It generates $2^{32} - 1$ values before it cycles, with output in the range $1$ to $2^{32} - 1$, so $n = 2^{32} - 1$.
Using the following formula: $$p = \frac{m(m - 1)}{2n}$$
We can calculate that while generating $m = 100000$ numbers we should expect to see at least one repeat. Again, the cyclic nature does not allow that to happen. If you’re even more astute you might point out that most of the time you don’t take the entire output of the random number generator. You want it in some range, so you take the result modulo something. This observation is crucial to solving the problem of no repeats.
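To make the formula concrete, here is a tiny helper (my own, not from the post) that evaluates it. It reproduces both the roughly 4.5 colliding pairs for 10 draws from 10 values and a value above 1 for $m = 100000$ draws from xorshift32’s full range.

```c
/* Approximate expected number of colliding pairs when drawing m values
 * uniformly at random from n equally likely possibilities:
 * p = m * (m - 1) / (2 * n). */
static double expected_pairs(double m, double n)
{
    return m * (m - 1.0) / (2.0 * n);
}
```

For $m = 100000$ and $n = 2^{32} - 1$ this comes out to about 1.16, so at least one collision should normally occur.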
Generators have an internal state of a given bit size, so let’s imagine a prng8 with 8 bits. Calling prng8() will produce a value between 1 and 255. Calling it again will generate a new value, but it won’t be a repeat of the value from the first call. Cycling will happen after advancing the state 255 times.
Let’s do this again, but now take the result modulo 128, which is the same as shadowing the top bit. The result will be in the range 0 to 127, but most importantly it will still cycle after 255 states. We have twice as many possible states2 that need to map to our result domain. Now repeats are possible! There are two possible states that generate any given value, like 42.
Let’s go deeper and see if that solves the problem. Imagine that we generate the following run of numbers: 42, 12, 42, 42. Is it possible from this generator? Since only two states map to 42 but we see it three times, the answer is no.
Why is this a problem? Because the probability of generating 42 depends on what was generated before! An ideal PRNG should look stateless from the outside.
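You can check the two-states-per-output claim by brute force. As a stand-in for the hypothetical prng8 (the post doesn’t define one), the sketch below uses a full-period 8-bit LCG, which by the Hull–Dobell theorem visits all 256 states. Unlike an LFSR it includes zero, but the counting argument is the same: every output value modulo 128 is produced by exactly two states per cycle.

```c
#include <stdint.h>

/* Stand-in for the hypothetical prng8: a full-period 8-bit LCG.
 * a = 5 and c = 3 satisfy the Hull-Dobell conditions for m = 256,
 * so the state walks through all 256 values before cycling. */
static uint8_t prng8_state = 0;

static uint8_t prng8(void)
{
    prng8_state = (uint8_t)(5u * prng8_state + 3u);
    return prng8_state;
}
```

Counting over one full cycle, `prng8() % 128 == 42` happens exactly twice (for states 42 and 170), which is what makes a repeated 42 possible but a third 42 within one cycle impossible.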
Let’s approach this from a different angle, increase the internal state size, and create a prng9() with 9 bits of internal state. If we continue taking the result modulo 128, this now shadows the top 2 bits. As a result we have four times3 as many possible states that map to the same domain, twice as many as prng8 had. A linear increase in shadow bits exponentially grows the number of states that map to the same output value.
Exploiting exponential growth is really easy and solves a lot of our problems. A generator like prng24_8 (24 bits of state, 8 bits of output) cycles through 16777215 internal states but maps them to only 256 values. Since 65536 states generate a value of 42, we might as well assume that the generator is stateless. We can have repeats on repeats that seem to obey the birthday paradox.
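A quick sketch of the same idea built on xorshift32 (the name prng32_8 is mine, not a standard generator): keep only the low 8 bits of the output, so roughly $2^{24}$ internal states stand behind every output byte.

```c
#include <stdint.h>

/* prng32_8: 32 bits of xorshift32 state, 8 bits of output.
 * Each output byte is produced by about 2^24 distinct states,
 * so repeats behave almost like those of a stateless source. */
static uint32_t prng32_8_state = 1u; /* must be non-zero */

static uint8_t prng32_8(void)
{
    uint32_t x = prng32_8_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    prng32_8_state = x;
    return (uint8_t)x; /* shadow the top 24 bits */
}
```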
Math blabber
There is of course a caveat: this is just an approximation. Say the PRNG’s internal state is $s$ and we advance it as $s^{\prime} = R(s)$. The new state generates an output value $y = M(s^{\prime})$, decreasing the number of states that still map to $y$ by one. The generator is now unbalanced; within this cycle it favours generating values different from $y$. With $\gamma$ shadow bits we have $2^{\gamma}$ states that produce $y$. The chance to generate $y$ from a fresh state depends on the $\lambda$ output bits: ideally it should be $\frac{1}{2^{\lambda}}$, but it is actually $\frac{2^{\gamma} - K}{2^{\gamma + \lambda}}$, where $K$ is the number of times $y$ has been generated previously. Notice how increasing $\gamma$, the number of shadow bits, pushes this error as close to zero as you like. Practically speaking, that is; mathematically no number is too big, but in the real world generating $2^{64}$ of anything is next to impossible.
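Plugging in illustrative numbers (my own, not from the post): with $\gamma = 16$ shadow bits, $\lambda = 8$ output bits, and $K = 3$ previous occurrences of $y$ we get
$$\frac{2^{\gamma} - K}{2^{\gamma + \lambda}} = \frac{2^{16} - 3}{2^{24}} \approx 0.0039061 \quad\text{vs.}\quad \frac{1}{2^{\lambda}} = \frac{1}{256} = 0.0039062$$
a relative error of only $K / 2^{\gamma} \approx 0.005\%$.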
What to use?
Personally, I’m going to switch to xorshift64.
I know that there are better PRNGs out there, like xoshiro or variants of PCG.
With all my heart I recommend reading up on them here and here.
For me it’s enough to simply switch to the 64-bit version of the PRNG I’ve already been using. There is no performance penalty: all the programming I do is on 64-bit machines, so it’s basically free.
If I were doing something where taking the internal state and applying a modulo is not enough, I’d look into variants like xorshift+ or xorshift*.