The $2 \times M$ board tricked a lot of 14-year-olds in 1994. By perfectly mirroring the Fibonacci sequence, it gave the illusion that every grid height $N$ would generate a simple, linear recurrence relation. All you had to do was find the pattern, write a quick for loop, and claim your Olympiad medal.

And then they tried to calculate a $3 \times M$ board.

The Odd Reminder

Before we start placing tiles, we have to remember the golden rule from Chapter 1: if the total area is odd, the board is impossible to tile.

Because $N = 3$ is odd, $M$ must be even. A $3 \times 3$ or $3 \times 5$ board will always leave an empty square. Therefore, we only care about even column widths: $3 \times 2$, $3 \times 4$, $3 \times 6$, and so on.

The $3 \times 2$ Base Case

Let’s look at the smallest valid grid, the $3 \times 2$ board. The area is 6. If we sketch out the permutations, we find exactly 3 ways to tile it:

  1. Three horizontal dominoes stacked on top of each other.
  2. One horizontal domino at the top, and two vertical dominoes at the bottom.
  3. Two vertical dominoes at the top, and one horizontal domino at the bottom.
The 3 distinct ways to tile a 3x2 board.

The 3 distinct ways to tile a 3x2 board.

So, $f(2) = 3$. Easy enough.

The Interlocking Trap ($3 \times 4$)

Here is where the intuition breaks down.

If we want to solve a $3 \times 4$ board, the naive approach is to just glue two $3 \times 2$ boards together. Since there are 3 ways to tile the left half, and 3 ways to tile the right half, the answer should be $3 \times 3 = 9$, right?

Wrong. The actual answer is 11.

Where do those two extra layouts come from? They come from the jagged edge.

When we assume the board cleanly splits down the middle into two $3 \times 2$ blocks, we are ignoring the fact that dominoes can cross the boundary. You can lay horizontal dominoes right across the fault line, interlocking the left and right sides together like brickwork.

The Interlocking Trap: Dominoes crossing the boundary create layouts not found by simply multiplying smaller blocks together.

The Interlocking Trap: Dominoes crossing the boundary create layouts not found by simply multiplying smaller blocks together.

To calculate the exact number of ways to tile a $3 \times M$ board, we cannot just look at perfect rectangles anymore. We have to look at the shapes created when a domino crosses the boundary.

The “Aha!” Moment: Coupled Recurrence

To solve this algorithmically, we have to define two completely different states for our board:

  • State $f(M)$: A perfect, mathematically complete $3 \times M$ rectangle.
  • State $g(M)$: A $3 \times M$ rectangle that is missing exactly one corner square.
To solve the 3xM board, we must track both the perfect block f(M) and the jagged block g(M).

To solve the 3xM board, we must track both the perfect block f(M) and the jagged block g(M).

This is the cognitive leap the Olympiad judges were looking for. The perfect board $f(M)$ and the jagged board $g(M)$ are mathematically linked. They rely on each other to grow.

To build a perfect $f(M)$ board, you have two choices for the rightmost edge:

  1. Cap it with a perfect $3 \times 2$ block. Since we know a $3 \times 2$ block has 3 variations, this gives us $3 \times f(M-2)$.
  2. Cap it by filling in the missing corners of two jagged $g(M-1)$ boards (one missing the top corner, one missing the bottom corner).

This creates a beautiful, coupled set of equations:

$$f(M) = f(M-2) + 2g(M-1)$$

$$g(M) = f(M-1) + g(M-2)$$

By tracking both the perfect edge and the jagged edge simultaneously, we can calculate any $3 \times M$ board with perfect accuracy. We have officially graduated from simple sequences to Dynamic Programming.

But in 1994, the problem didn’t ask for $N = 3$. It asked for $N = 8$.

Scaling to 8x20: The Olympiad Solution

For a board of height 3, we only had to track two distinct right-edge profiles: a perfect straight edge, or an edge missing exactly one square.

But the judges in 1994 asked for $N = 8$.

When the height is 8, the jagged edge can take on vastly more shapes. Any of the 8 squares on the rightmost boundary could be filled by a horizontal domino poking out, or left empty.

To a 14-year-old in the 90s writing Pascal, a sequence of 8 “filled or empty” states looks exactly like one thing: an 8-bit byte.

We can represent the entire jagged edge of the board as a bitmask. A 0 means the square is empty, and a 1 means a horizontal domino is sticking out into it.

  • 00000000 (Decimal 0) is a perfectly flat edge.
  • 11111111 (Decimal 255) is an edge where every single row has a domino poking out.

Because an 8-bit integer maxes out at 255, there are exactly $2^8 = 256$ possible edge profiles.

This is how the Olympiad was won. You don’t write a recursive function that blindly places dominoes and hopes for the best. You create an array of 256 integers. Each index in the array represents one of those edge profiles, and the value stores “how many ways we have found to create this specific edge.”

You start at column 1, then step forward to column 2, then column 3, iterating all the way to $M = 20$. At each column, you look at your 256 profiles and calculate how placing vertical and horizontal dominoes transitions them into new profiles for the next column.

By the time you reach the 20th column, you just print the value sitting at index 0 (the perfectly flat edge, 00000000).

It is a brutally efficient, elegant piece of Dynamic Programming. A 16-MHz 286 processor can crunch through an $8 \times 20$ DP matrix in a fraction of a millisecond.

Pushing the Limits

For the 1994 Olympiad, $8 \times 20$ was the ceiling. But on modern hardware, how far can we stretch this bitmask DP?

If we increase the height to $N = 20$, our edge profile becomes a 20-bit integer. That means $2^{20}$, or just over 1 million states. A modern CPU will chew through a million states per column without breaking a sweat.

If we push it to $N = 24$, we are tracking $2^{24}$ (about 16.7 million) states. It gets memory-hungry and a bit sluggish, but if you are patient, a standard laptop can still brute-force it.

But what if we flip the dimensions?

What if $N$ remains relatively small—say, $N = 8$—but the room is a phenomenally long hallway where $M = 1,000,000,000$?

Our bitmask is tiny (only 256 states), but our for loop now has to iterate a billion times. If $M = 10^{18}$, that loop will take eons. We cannot iterate column by column anymore.

We need a way to skip the loop entirely. And to do that, we have to recognize that the rules governing how one jagged edge transitions to the next never change. It is a perfectly static set of rules.

Because the rules are static, we can map all 256 transitions into a Transfer Matrix. And once we have a matrix, we unlock the ultimate cheat code of linear algebra.


The Code: Generation 1

If you want to see the pure computer science approach in action, I have open-sourced the C++ code for this bitmask Dynamic Programming engine. It will happily calculate the exact number of tilings for any board up to about $20 \times 20$ before the state transitions overwhelm the CPU.

💻 View the code on GitHub: domino-dp.cpp