At the end of Chapter 3, the math had backed us into a corner.

For a board of height $N = 16$, our compressed state space contained exactly 6,563 valid edge profiles. This meant our Transfer Matrix ($T$) was a $6563 \times 6563$ dense grid. To find the number of ways to tile a billion-column hallway, we needed to calculate $T^{1,000,000,000}$.

Because matrix multiplication scales cubically—$O(S^3)$—squaring that matrix requires hundreds of billions of operations. Doing it 30 times for binary exponentiation requires over 8.4 trillion operations.

We know the matrix holds the answer, but the matrix is simply too massive to compute.

To break through this wall, we have to pull off the ultimate algorithmic heist. We are going to abandon matrix multiplication entirely.

The Invisible Guarantee

To execute this bypass, we rely on an invisible mathematical permission slip called the Cayley-Hamilton Theorem.

In linear algebra, this theorem states that every square matrix satisfies its own characteristic polynomial. But for our purposes, it guarantees a much more powerful corollary: because the matrix satisfies a polynomial, the sequence of answers it generates must also satisfy a linear recurrence relation.

If our state space has $S$ states, the Cayley-Hamilton theorem guarantees that the number of ways to tile a board of length $M$ is just a linear combination of the answers for the previous $S$ columns.

We know that an equation exists that looks like this:

$$Answer_M = c_1(Answer_{M-1}) + c_2(Answer_{M-2}) + \dots + c_S(Answer_{M-S})$$

We don’t need the matrix. If we can just find those $c$ coefficients, we can skip the linear algebra entirely.

The Berlekamp-Massey Black Box

So how do we find the coefficients of a 6,563-degree polynomial without analyzing the $6563 \times 6563$ matrix?

We work backwards. Instead of trying to derive the equation top-down, we look at the answers and build the equation bottom-up.

This is where we introduce the Berlekamp-Massey (BM) algorithm. You can think of BM as a mathematical black box. You feed it a sequence of numbers, and it reverse-engineers the shortest possible linear recurrence that generates them.

Here is the heist step-by-step:

Step 1: Run the standard DP. We resurrect the simple, fast bitmask Dynamic Programming loop from Chapter 2. We don’t build a 2D matrix; we just take our flat 1D array of 6,563 states and step it forward column by column. Because this is just vector multiplication, it takes less than a millisecond per column.

Step 2: Collect the sequence. We run this fast loop for exactly $2 \times S$ columns. For $N=16$, that is about 13,126 columns. Every time we finish a column, we record the answer (the value at index 0 for a perfectly flat edge) and save it to an array.

Step 3: Shake out the recurrence. We feed our array of 13,126 numbers into the Berlekamp-Massey algorithm. In a fraction of a second, the algorithm analyzes the sequence and spits out the exact $c$ coefficients that define our recurrence.

The Berlekamp-Massey algorithm bypasses the Transfer Matrix entirely, generating the characteristic polynomial directly from the sequence of answers.

The Berlekamp-Massey algorithm bypasses the Transfer Matrix entirely, generating the characteristic polynomial directly from the sequence of answers.

Polynomial Exponentiation (Fiduccia’s Algorithm)

Once BM hands us the coefficients $[c_1, c_2, \dots, c_d]$, we convert that recurrence into a characteristic polynomial $P(x)$:

$$P(x) = x^d - c_1 x^{d-1} - c_2 x^{d-2} - \dots - c_d$$

Now, we want the answer for $M = 1,000,000,000$.

Instead of multiplying matrices, we calculate $x^M$ modulo our polynomial $P(x)$. We use the exact same binary exponentiation trick (square and multiply) that we used for matrices, but we apply it to polynomials:

$$R(x) = x^M \pmod{P(x)}$$

Because we are taking the modulo of $P(x)$ (which has degree $d$), our remainder $R(x)$ is guaranteed to be a polynomial of degree strictly less than $d$. It will look something like this:

$$R(x) = r_0 + r_1 x + r_2 x^2 + \dots + r_{d-1} x^{d-1}$$

These $r$ coefficients are the exact weights we need. To find the final answer for the billionth column, we just take the dot product of these weights and our original base cases (the first $d$ terms of the sequence we generated in Step 1):

$$Total = r_0(S_0) + r_1(S_1) + r_2(S_2) + \dots + r_{d-1}(S_{d-1})$$

The Anatomy of the Speedup

Why did we go through all this polynomial gymnastics? Look at the Big-O notation.

Multiplying matrices takes $O(S^3)$. For $N=16$, squaring the matrix took 282 billion operations. Multiplying polynomials takes $O(S^2)$. For $N=16$, squaring the polynomial takes 43 million operations.

By abandoning the matrix, we sped up the math by a factor of 6,500. We can now solve $N=16$ for $M=10^{18}$ in the blink of an eye. The heist is a complete success.

The Final Wall ($N = 64$)

We have stretched computer science to its absolute breaking point. Bitmask DP, Transfer Matrices, and the Berlekamp-Massey algorithm allow us to solve astonishingly long boards.

But what if the board is perfectly square, and simply very large? What if we want to tile a $64 \times 64$ checkerboard?

Let’s look at the bitmask. For $N = 64$, the rightmost edge has $2^{64}$ possible profiles. That is 18.4 quintillion states.

We can’t use a transfer matrix; it would be larger than the hard drives of every computer on Earth combined. We can’t use the Berlekamp-Massey heist either. To use BM, we have to run the DP loop for $2S$ columns. But with 18.4 quintillion states, you cannot even allocate the memory for the 1D array, let me alone iterate it once.

Computer science has officially failed us.

To solve the $64 \times 64$ board, we have to leave computer science behind entirely. We must step into the realm of abstract algebra, graph theory, and statistical mechanics.


The Code: Generation 2

To see how we completely bypassed the deep DP matrix using linear recurrences, you can check out the Berlekamp-Massey implementation in C. It uses GMP to handle the massive integers and can calculate long “hallway” grids up to about $20 \times 1000$.

💻 View the code on GitHub: domino-bm.c