Multiply two integers. Isn’t it simple?

--

When repeating multiplications with your child leads you to discover many interesting algorithms

Having an 8-year-old child it is natural to ask him what he is learning in school. As the new school year has just begun, the math teacher is repeating the four basic operations for the benefit of the whole class. So far so good until I see that there is a full page of multiplication in several digits, and immediately I wondered myself how I could save time by solving them as quickly as possible. this initiates the discovery of a compelling story about increasingly efficient algorithms for multiplying two integers. Who would have thought that I would continue to learn anything advanced from simple multiplication? Never underestimate even simple problems, they are a source of wonderful surprises!

Let’s start by showing a simplified historical timeline which I’ll use to show the several algorithms trying to explain how they work.

In the above picture you can see the main milestones at which a crucial step in improving the performances of multiplying two integers was achieved. The Big O notation is used to provide a uniform framework of comparison for the performance of the different algorithms.

Let’s start with the method all of us learn as pupils at the primary school. Indeed it is called grad-school method or carrying method given the mechanics of the operations. If we have two integers having n digits, what we do is multiplying each digit of a number for all the digits of the other number (it requires O(n²) multiplications), then we sum the n resulting numbers properly shifted doing O(n²) additions. This is the reason for which the grad-school method needs O(n²) operations.

In the 1960 at the Moscow University the great mathematician Kolgomorov conjectured during a seminar that the complexity of the grad-school multiplication algorithm is asymptotically optimal. But Karatsuba, a 23-year-old student attending the seminar, came back few days later with an algorithm that disproved the conjecture and performed better! His result was published two years later in the 1962. [1]

The key idea of Karatsuba was to apply a technique today known in computer science as Divide-and-Conquer, i.e. dividing a problem in smaller similar sub-problems and recursively iterate in similar way until reaching a trivial step allowing to solve the overall problem. For the multiplication of two integers X and Y, we can write them splitting their digits in a given base B to a power m, as:

And that means we can write

Until this stage we didn’t gain nothing, but here the smart Karatsuba entered the game, showing that we can write XY using only 3 and not 4 multiplications paying a bit of extra cost of addittions and subtractions:

So applying recursively this approach to the three new multiplications, with numbers having less digits, we get the result.

The complexity of the algorithm, as shown in the title of the section, can be derived easily applying the master theorem for divide and conquer recurrences considering that the time complexity can be written by the following recursive expression:

In 1963 Toom published its result about computing two large integers improving on Karatsuba algorithm [2], and few years later in 1966 Cook refined the algorithm in his PhD thesis [3].

The Toom-Cook algorithm is similar to the Karatsuba one, indeed it is also a Divide-and-Conquer strategy. But instead of splitting the original number in 2 parts, it splits the number in k parts, so we speak about Toom-Cook-k. Usually the most used variant is the Toom-Cook-3. The key idea of the algorithm is to write two polynomials p(x) and q(x) both having degree (k-1) and with coefficients the k parts the two given numbers M, and N. The splitting is done in a given base B, so we can write for the case k=3 (the generalization is trivial):

and it implies that p(B)=M and q(B)=N. Based on that we can compute r(x)=p(x)q(x), so r(B)=p(B)q(B)=M*N, i.e. what we were looking for. The schoolbook multiplication of the two polynomials will require 9 multiplications, while the Toom-Cook-3 only 5, and this is the root of the better performance, indeed the recurvise expression of the time complexity can be written in a gerenal way as:

Thanks again to the master theorem, we find that the Toom-Cook algorithm performance is

Now let’s go deeper into the algorithm to understand how it is possible to compute r(x), i.e. how to compute all the (k-1)+(k-1)+1 coefficients of r(x), based on the coefficients of p(x) and q(x). From now on I’ll refer to the Toom-Cook-3 case, so k=3. The main idea to keep in mind is that a polynomial of degree d is totally defined once we have its value in d+1 points, so in our example being r(x) of degree 4, we will need 5 points. We chose x = {-2, -1, 0, 1, infinity} and compute p(x) and q(x) for each of them:

Note that these two sets of relations can be written in matrix form as following, where each row of the matrix contains powers of one of the evaluation points, and the vector contains the coefficients of the polynomial:

In that way for computing the 5 values for p(x) we need 6 additions and 2 multiplications, but Bodrato [4] found a way to reduce a bit the number of operations to 5 additions and 1 multiplication by doing the following chain of operations:

At this point we can compute the polynomial r(x) in the same 5 points as:

At this step we entered the recursive procedure 5 times, and finally we have to merge the results. This is the most tricky step because we have to do the reverse of the operations we did in computing p(x) and q(x) at the chosen 5 points, i.e. we have to solve the following linear system:

so by inverting the matrix we get:

And again it is possible to minimize the number of operations here by carefully chaining a sequence of operations as the ones defined by Bodrato.

Once the coefficients of the polynomial r(x) are found we can immediately compute the product of the initial numbers by summing them as:

A real performance breakthrough in computing big integer multiplication happened in 1971 thanks to Schönhage and Strassen [5]. Altough their algorithm (abbreviated as SSA) is no longer the most efficient, it has been the champion one for more than 30 years, and it is still the most used one simply because its computer implementation is much easier than the more performant algorithms described below. As a general point it is also worth to keep in mind that the SSA algorithm starts outperforming Karatsuba and Toom-Cook algorithms only when the two integers have more than 10.000 digits!

The key idea of the SSA is the usage of the Fast Fourier Transform. Let’s have a look to the steps of the algorithm without entering in the details for which you can refer to [5] and [6]. We start remembering that any integer number can be written in positional systems as a proper sum with respect a given base, usually we work in base 10 but we can use whatever base Q. It means we can write two integers a and b as:

We are working assuming, without losing any generality, that both a and b have K digits. Now we define two polynomials A(x) and B(x) having as coefficients the digits of a and b in the base Q:

If we evaluate such polynomials for x = Q we get exactly the two numbers a and b, so their product can be written as ab = A(Q)B(Q). Such product is equal to the one we can get after doing a modulo operation like the following:

If we now multiply the two polynomials we get a new one of degree 2K-2:

So, setting N = MK with M>2/K, we can write our desidered multiplication in the following way:

Our goal is to efficiently find all the coefficients of C(x). Simply doing the naive multiplication of the two polynomials we have:

But looking carefully to such coefficients we realize that their differences can be written in a more compact way as:

And here the magic! Such special multiplication between a and b is exactly what is called the convolution of the two vectors (more precisely we should speak about circular or cyclic convolution). Thanks to the Fourier theory we know by the Convolution Theorem that the Fourier transform of a convolution of two signals is the pointwise product of their Fourier transforms.

And thanks to such theorem we can immediately compute the needed convolution of the two given vectors:

And finally we arrived to the point where we can lower the time complexity because the Fourier transform and its inverse can be efficiently evaluated in O(N logN) thanks to the Fast Fourier Transform (FFT) algorithm of Cooley and Tukey. The SSA algorithm contains much more details for obtaining the performance stated by the authors, in particular in splitting the original numbers into K smaller numbers of size M it is crucial to optimally select such value. Moreover it is important to switch to Karatsuba or Toom-Cook algorithm in the recursive calls when the numbers are small enough, and to use other tricks implemented for example into the GMP open source library.

Schönhage and Strassen conjectured a lower bound complexity of O(N logN) for the multiplication of two N digits numbers, and the first result that improves the SSA algorithm was found by Fürer in 2007 which still did not reach the conjectured complexity. Fürer showed that the asymptotic complexity can be decreased to:

Where the log-star function at the exponent is called iterated logarithm and it represents the number of times the logarithm must be applied to the given number in order to get a value equal or less than one. Such log-star function is an increasing function, but much slower than loglogN of the SSA algorithm. In subsequent years other reseachers make explicit the constant factor of the exponent until reaching the best result in 2018 as:

For any practical usage the SSA algorithm is still the right choice as carefully benchmarked in [8].

A few years after Fürer’s work, the two mathematicians David Harvey and Joris van der Hoeven proved in 2019 that the asymptotic complexity can actually be lowered to

Is this the end of the story? It seems hard to beat that result, but today there is no real proof that this is the final answer!

I would like to thank Quanta Magazine for introducing me to such a fantastic story that I have tried to dig deeper for my better understanding.

References

[1] — A. Karatsuba and Yu. Ofman (1962). “Multiplication of Many-Digital Numbers by Automatic Computers”. Proceedings of the USSR Academy of Sciences. 145: 293–294. Translation in the academic journal Physics-Doklady, 7 (1963), pp. 595–596

[2] — A. Toom. The complexity of a scheme of functional elements realizing the multiplication of integers. In Soviet Mathematics-Doklady, volume 7, pages 714–716, 1963

[3] — S. A. Cook. On the Minimum Computation Time of Functions. PhD thesis, Harvard University, 1966. pp. 51–77

[4] — Marco Bodrato, “Towards Optimal Toom-Cook Multiplication for Univariate and Multivariate Polynomials in Characteristic 2 and 0”; “WAIFI’07 proceedings” (C.Carlet and B.Sunar, eds.) LNCS#4547, Springer, Madrid, Spain, June 2007, pp. 116–133

[5] — A. Schonhage and V. Strassen, “Schnelle Multiplikation großer Zahlen”, Computing 7 (1971)

[6] — Crandall, R., and Fagin, B. “Discrete weighted transforms and large-integer arithmetic”. Mathematics of Computation 62, 205 (1994), 305–324

[7] — M. Fürer (2007). “Faster Integer MultiplicationProceedings of the 39th annual ACM Symposium on Theory of Computing(STOC), 55–67, San Diego, CA, June 11–13, 2007, and SIAM Journal on Computing, Vol. 39 Issue 3, 979–1005, 2009.

[8] — Lüders, Christoph (2015). Implementation of the DKSS Algorithm for Multiplication of Large Numbers. International Symposium on Symbolic and Algebraic Computation. pp. 267–274

[9] — Harvey, David; Van Der Hoeven, Joris (2019–04–12). Integer multiplication in time O(n log n)