At the end of Chapter 2, we hit a wall.

Our bitmask Dynamic Programming solution is incredibly fast for a standard room, but what if we need to tile a hallway where $N = 8$, but $M = 1,000,000,000$?

Even with a highly optimized DP loop, your CPU has to iterate a billion times. If $M = 10^{18}$, the sun will swallow the Earth before your loop finishes. We need a way to fast-forward time. We need to jump straight to the end of the hallway without actually walking down it.

To do that, we have to look closely at our DP rules.

The Static Rules of Tiling

When we mapped out the bitmask for the jagged edge of the board, we defined a set of transition rules. If you have an edge profile of 01001011, there is a strictly defined, finite set of new profiles it can turn into when you add the next column of dominoes.

Crucially, these rules never change. The transition from column 3 to column 4 follows the exact same geometric logic as the transition from column 999,999 to 1,000,000. It is a perfectly static system.

In mathematics, when you have a static system of linear transitions, you don’t write a for loop. You write a Matrix.

Building the Transfer Matrix

Let’s scale down to a very small board to see how this works: $N = 3$.

If we ignore impossible states and utilize the natural symmetry of the board (an edge missing the top square is geometrically identical to an edge missing the bottom square), the state space for $N = 3$ compresses down to just 4 distinct edge profiles.

Because there are only 4 states, we can map every possible transition between them into a $4 \times 4$ grid. We call this the Transfer Matrix ($T$).

If we represent our current column’s edge profiles as a 1D vector $V_i$, finding the profiles for the next column is no longer an algorithmic loop. It is a single operation of linear algebra:

$$V_{i+1} = T \cdot V_i$$

If we start at column 0 with vector $V_0$, we can find column 1 by multiplying by $T$. To find column 2, we multiply by $T$ again: $V_2 = T \cdot T \cdot V_0$, or $V_2 = T^2 \cdot V_0$.

By extension, to find the number of ways to tile a board of length $M$, the formula is breathtakingly simple:

$$V_M = T^M \cdot V_0$$

The Magic of Fast Exponentiation

At first glance, this looks like we haven’t solved anything. Multiplying a matrix by itself a billion times is just as slow as running a for loop a billion times.

But matrix multiplication is associative. This means we don’t have to multiply them one by one. We can use Binary Matrix Exponentiation.

If we want to calculate $T^8$, we don’t do $T \cdot T \cdot T \cdot T \cdot T \cdot T \cdot T \cdot T$. Instead, we calculate $T^2$. Then we multiply $T^2 \cdot T^2$ to get $T^4$. Then we multiply $T^4 \cdot T^4$ to get $T^8$.

We reached the 8th power in just 3 operations.

If we want to reach $M = 1,000,000,000$, we only need about 30 matrix multiplications. We have taken an algorithm that runs in $O(M)$ time and crushed it down to $O(\log M)$ time. A calculation that would have taken seconds now takes microseconds.

Folding the Matrix (Symmetry)

Before we start aggressively squaring matrices, we have to respect the cardinal rule of linear algebra: matrix multiplication is computationally expensive.

The time it takes to multiply two matrices scales cubically with the size of the matrix, or $O(S^3)$, where $S$ is the number of states. If we double the size of the matrix, the math takes eight times as long. We want our transfer matrices to be as small as absolutely possible.

Luckily, the geometry of a rectangular grid gives us a massive discount: Symmetry.

Look at a $3 \times M$ board. An edge profile that is missing exactly the top square behaves identically to an edge profile that is missing exactly the bottom square. The mathematical transitions are a perfect mirror image.

Because geometric reflections yield identical transition rules, we can fold symmetric profiles together to shrink the Transfer Matrix.

Because geometric reflections yield identical transition rules, we can fold symmetric profiles together to shrink the Transfer Matrix.

Instead of tracking both of those states individually, we can fold them together into a single state representing “missing one outer corner.”

By eliminating mathematically unreachable states and folding symmetric states together, the raw $2^N$ bitmask compresses beautifully.

  • For $N = 3$, the $8 \times 8$ matrix collapses into a tiny $4 \times 4$ matrix.
  • For $N = 4$, the $16 \times 16$ matrix compresses down to an elegant $5 \times 5$ matrix.

A $5 \times 5$ matrix has 25 elements. Multiplying it cubically takes about 125 operations. Your computer can do that in its sleep. For small values of $N$, the Transfer Matrix is an unstoppable force. It solves the billion-column hallway instantly.

The Dimensional Wall ($N = 16$)

So, the Transfer Matrix works perfectly. But remember the theme of this problem: every time you think you have found the ultimate shortcut, the grid gets larger.

What happens if we want to tile a hallway that is a billion columns long, but $N = 16$?

First, we define the bitmask. For $N = 16$, our rightmost edge has $2^{16}$ possible profiles, which is $65,536$ states. We run our symmetry optimization. We fold the mirror images and prune the dead ends. The state space compresses down, but the math is unforgiving. We are left with exactly 6,563 valid states.

Our transfer matrix $T$ is now a $6563 \times 6563$ dense grid of coefficients. It contains over 43 million elements.

To reach the billionth column, we just need to use our binary exponentiation trick: $T^{1,000,000,000}$. That is only about 30 matrix multiplications!

But remember the cubical scaling rule: $O(S^3)$.

To multiply two $6563 \times 6563$ matrices together just once, your CPU has to perform $6563 \times 6563 \times 6563$ operations. That is roughly 282 billion operations.

To do binary exponentiation, you have to do that 30 times. That is over 8.4 trillion operations.

Your laptop’s fans will spin up to maximum velocity, the chassis will get hot enough to fry an egg, and you will be sitting there for hours waiting for a simple tiling calculation.

We are completely trapped. We know the matrix holds the answer. The linear algebra is flawless. But the matrix is simply too massive to multiply.

To break through the $N = 16$ wall, we have to pull off the ultimate mathematical heist. We need the answer that $T^M$ provides, but we have to find a way to get it without ever actually multiplying a single matrix.