Quake III & the reciprocal square root
A marvelous algorithm for fast computation of the reciprocal square root
In the C source code of the famous game “Quake III — Arena” you can find an incredibly efficient piece of code that at first sight seems totally insane and incomprehensible! It computes the reciprocal of the square root of a 32-bit floating point number x. It is usually known as the Fast Inverse Square Root, but I personally prefer the term reciprocal, as it is mathematically more precise.
Before going into the details of the algorithm and why it is so important, let me write it down immediately, exactly as you can find it in the open source repository here.
float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}
Why such a function?
Quake III — Arena was one of the first videogames to make heavy use of 3D computer graphics, where a very common operation is normalising a 3-dimensional vector:

$$\hat{v} = \frac{v}{\|v\|}, \qquad \|v\| = \sqrt{v_x^2 + v_y^2 + v_z^2},$$

$\|v\|$ being the Euclidean norm of the vector. Normalising therefore boils down to multiplying each component by the reciprocal square root of the squared length.
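Just to make this concrete, here is a little sketch of how Q_rsqrt could be used for normalisation (the vec3 type and the function name are mine, not from the Quake source):

typedef struct { float x, y, z; } vec3;

/* Normalise v in place: multiply by 1/|v| instead of dividing by |v|. */
void vec3_normalize(vec3 *v) {
    float len2 = v->x * v->x + v->y * v->y + v->z * v->z;
    float inv  = Q_rsqrt(len2);  /* approximate 1/sqrt(len2) */
    v->x *= inv;
    v->y *= inv;
    v->z *= inv;
}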
The above code became publicly known in the early 2000s (the full Quake III source was released as open source in 2005), and it was originally attributed to John Carmack (the father of Quake, Doom and other famous videogames), but he denied being the author. We still do not know exactly who invented it, but an interesting 1997 paper by J. F. Blinn called “Floating-Point Tricks” [1] contains many overlapping ideas. Probably his version of the function was refined and specialised by other programmers into what is known today.
IMPORTANT NOTE: nowadays processors have built-in instructions for the reciprocal square root, so this fast reciprocal square root is both slower and less accurate than what a modern CPU can provide!
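For example, on x86 the SSE instruction rsqrtss computes the same approximation in a single instruction; a minimal sketch using the standard intrinsics:

#include <xmmintrin.h>  /* SSE intrinsics */

/* Hardware reciprocal square root: compiles to a single rsqrtss instruction. */
float rsqrt_sse(float x) {
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

Its result is accurate to roughly 12 bits and can be refined with exactly the same Newton step we will see below.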
IEEE-754 reminder
Because the algorithm is deeply based on the binary representation of floating point numbers, we will review it as described in the IEEE-754 standard. We will refer to 32-bit numbers, but everything generalises easily to 64 bits.
The idea behind IEEE-754 for representing floating point numbers is the same as the one behind scientific notation: we can write any number in normalized notation in a base $b$ as

$$x = \pm\, d_0.d_1 d_2 d_3 \ldots \times b^e, \qquad d_0 \neq 0,$$

with the digits $d_i$ running from 0 to $b-1$. In a binary system ($b = 2$) the leading digit of a normalized number is always 1, so we have

$$x = \pm\, 1.m_1 m_2 m_3 \ldots \times 2^e.$$
The 32 bits representing $x$ are laid out as follows:

S | EEEEEEEE | MMMMMMMMMMMMMMMMMMMMMMM

So 1 bit is used for the sign, 8 bits for the exponent and 23 bits for the mantissa (keep in mind we do not need any bit to store the leading 1 of the formula above). We can write $x$ in base 10 as:

$$x = (-1)^S \left(1 + \frac{M}{2^{23}}\right) 2^{E - 127}.$$
Note that the exponent is biased, i.e. an offset of 127 is applied: the field $E$ is an unsigned integer, and the bias lets it represent both tiny ($E < 127$) and huge ($E > 127$) values.
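To make the layout concrete, here is a little snippet (mine, not from the article's original code) that extracts the three fields; note it uses memcpy, the well-defined way to reinterpret bits in modern C:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    float x = 0.15625f;  /* 1.25 * 2^-3 */
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret the float bits */

    uint32_t S = bits >> 31;          /* sign: 1 bit */
    uint32_t E = (bits >> 23) & 0xFF; /* exponent: 8 bits, biased by 127 */
    uint32_t M = bits & 0x7FFFFF;     /* mantissa: 23 bits, implicit leading 1 */

    printf("S = %u, E = %u (e = %d), M = 0x%06X\n", S, E, (int)E - 127, M);
    return 0;
}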
A useful approximation trick
Let’s recall the following useful approximation, valid for small values of $x$:

$$\log_2(1 + x) \approx x + \mu,$$

where $\mu$ is a constant chosen so that the average error over the $[0,1]$ range is minimized; if $\mu = 0$ the approximation is exact at $x = 0$ and $x = 1$ (taking the logarithm in base 2), while the error grows in the middle of the interval.
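As a quick numerical check (a throwaway sketch, with an illustrative value of μ), we can print the approximation error at a few points of the interval:

#include <stdio.h>
#include <math.h>

int main(void) {
    const double mu = 0.043;  /* illustrative; mu = 0 is exact at x = 0 and x = 1 */
    for (double x = 0.0; x <= 1.0; x += 0.25) {
        printf("x = %.2f  (x + mu) - log2(1 + x) = %+.4f\n",
               x, (x + mu) - log2(1.0 + x));
    }
    return 0;
}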
Let’s forget from now on about the sign bit, because we will deal only with positive numbers. If we interpret the bits of our floating point number as an integer, we read the value

$$I_x = 2^{23} E + M,$$

and, applying the approximation above to $\log_2 x$,

$$\log_2 x = E - 127 + \log_2\!\left(1 + \frac{M}{2^{23}}\right) \approx \frac{I_x}{2^{23}} + \mu - 127.$$
So our goal of finding an approximation for

$$y = \frac{1}{\sqrt{x}}$$

can be translated, taking base-2 logarithms, into

$$\log_2 y = -\frac{1}{2} \log_2 x.$$
Using the same approximation for the lhs as well as the rhs, in terms of the integer representations $I_y$ and $I_x$ we can write:

$$\frac{I_y}{2^{23}} + \mu - 127 = -\frac{1}{2} \left( \frac{I_x}{2^{23}} + \mu - 127 \right),$$

which, solved for $I_y$, gives

$$I_y = \frac{3}{2}\, 2^{23}\, (127 - \mu) - \frac{I_x}{2}.$$
And considering the value

$$\mu = 0.0450466,$$

we get $\tfrac{3}{2}\, 2^{23}(127 - \mu) \approx 1597463007$, i.e. the magic number 0x5F3759DF, for a 32-bit floating point system. If you are curious about the derivation of this particular value, have a look at the thesis of Matthew Robertson [2].
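We can sanity-check the arithmetic with a couple of lines of C (not part of the original code):

#include <stdio.h>
#include <math.h>

int main(void) {
    const double mu = 0.0450466;
    /* K = (3/2) * 2^23 * (127 - mu) */
    long K = lround(1.5 * exp2(23) * (127.0 - mu));
    printf("K = %ld = 0x%lX\n", K, (unsigned long)K);  /* 1597463007 = 0x5F3759DF */
    return 0;
}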
So the integer representation of the approximate value we are after is related to the integer representation of the input number by:

$$I_y \approx \texttt{0x5F3759DF} - \frac{I_x}{2}.$$
Now it is possible to understand the following lines of code:
y  = number;
i  = * ( long * ) &y;                       // evil floating point bit level hacking
i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
y  = * ( float * ) &i;
The bits of the input number are reinterpreted as an integer, the right shift divides that integer by 2, the subtraction from the magic number implements the relation above, and finally the resulting integer is reinterpreted back as a float.
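A word of caution: the pointer casts rely on type punning that is undefined behaviour in standard C, and on LP64 systems long is 64 bits, so the cast no longer matches the 32-bit float. Here is a sketch of the same seed computation in well-defined modern C (the function name is mine):

#include <stdint.h>
#include <string.h>

/* The seed approximation alone, before any Newton refinement:
   already within a few percent of 1/sqrt(x). */
float rsqrt_seed(float x) {
    uint32_t i;
    memcpy(&i, &x, sizeof i);   /* reinterpret the float bits as an integer */
    i = 0x5f3759df - (i >> 1);  /* I_y = K - I_x / 2 */
    float y;
    memcpy(&y, &i, sizeof y);   /* reinterpret back as a float */
    return y;
}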
Newton’s Iteration
To fully understand the code snippet above, we have to recall Newton’s iteration method for finding a zero of a function $f(x)$.
Finding a zero of a function $f(x)$ is equivalent to finding where the curve $y = f(x)$ intersects the $x$ axis. Let’s assume we have an approximation $x_n$ of the zero we are looking for, so we can write the tangent line at that point as:

$$y = f(x_n) + f'(x_n)\,(x - x_n).$$

The next approximation of the root is taken as the point where the tangent line intersects the $x$ axis, i.e.

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$
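As a classic textbook example (not from the original article), applying the iteration to $f(x) = x^2 - 2$ gives

$$x_{n+1} = \frac{x_n}{2} + \frac{1}{x_n},$$

and starting from $x_0 = 1$ it produces $1 \to 1.5 \to 1.4167 \to 1.414216\ldots$, converging very quickly to $\sqrt{2} \approx 1.414214$.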
So if we want to refine the approximation obtained thanks to the log trick, we can perform at least one Newton iteration for the function

$$f(y) = \frac{1}{y^2} - x,$$

whose zero is exactly $y = 1/\sqrt{x}$ and whose derivative is given by

$$f'(y) = -\frac{2}{y^3}.$$

So the Newton iteration is:

$$y_{n+1} = y_n - \frac{f(y_n)}{f'(y_n)},$$

and, making the corresponding substitutions and algebraic rearrangements, we have:

$$y_{n+1} = y_n \left( \frac{3}{2} - \frac{x}{2}\, y_n^2 \right).$$
So we can now understand the rest of the Quake code snippet:

const float threehalfs = 1.5F;
x2 = number * 0.5F;
y  = y * ( threehalfs - ( x2 * y * y ) );
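Putting everything together, here is a small test harness (mine, assuming the Q_rsqrt listing above is in the same file) comparing the approximation with one Newton iteration against the library value:

#include <stdio.h>
#include <math.h>

/* Q_rsqrt(...) as listed at the top of the article is assumed to be defined here. */

int main(void) {
    const float tests[] = { 0.25f, 1.0f, 2.0f, 10.0f, 12345.0f };
    for (int k = 0; k < 5; ++k) {
        float x      = tests[k];
        float approx = Q_rsqrt(x);
        float exact  = 1.0f / sqrtf(x);
        printf("x = %10.2f  Q_rsqrt = %.6f  exact = %.6f  rel. err = %.2e\n",
               x, approx, exact, fabsf(approx - exact) / exact);
    }
    return 0;
}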
Behind this little function lies a remarkable concentration of computer science and mathematics!
I hope you enjoyed it as much as I did.
References
[1] — J. F. Blinn, Floating-point tricks, IEEE Computer Graphics and Applications 17 (1997), no. 4.
[2] — M. Robertson, A Brief History of InvSqrt (April 24, 2012). https://cs.uwaterloo.ca/~m32rober/rsqrt.pdf