It started in 1994 in Russia.
I was 14 years old, hacking away on an Intel 80286 processor. Some friends of mine had just come back from the All-Russian Olympiad in Informatics and showed me one of the problems from the competition. It was called the “Parquet” problem, and it asked a beautifully simple question: How many ways can you tile an $8 \times 20$ grid with $2 \times 1$ dominoes?
I booted up Turbo Pascal and wrote a naive backtracking algorithm. It choked.
I wasn’t tormented by it, but it became one of those classic puzzles I kept in my back pocket. Every few years, as I learned new programming concepts, I would go back to the dominoes and see how much further I could push the grid.
What started as a simple lesson in bitmasking and Dynamic Programming eventually dragged me through statistical physics, complex eigenvalues, cyclotomic polynomials, the Chinese Remainder Theorem, and Lucas sequences. We didn’t just solve the $8 \times 20$ board. We built an engine capable of calculating a massive $10000 \times 10000$ grid in a reasonable amount of time (136 seconds on my MacBook Pro).
The Code Arsenal
If you want to run these engines yourself, I have open-sourced all five major evolutionary steps of the code. You can find the entire collection in my Domino Tilings GitHub Repository.
Here is the breakdown of what we built, and the limits of what each approach can handle:
| Generation | File | Algorithm / Approach | The Limit |
|---|---|---|---|
| Gen 1 | domino-dp.cpp |
Dynamic Programming with Bitmask Memoization | $\approx 20 \times 20$ |
| Gen 2 | domino-bm.c |
Bitmask DP + Berlekamp-Massey Algorithm | $\approx 20 \times 1000$ |
| Gen 3 | domino-poly.c |
2D Kasteleyn with Cyclotomic Polynomials | $\approx 512 \times 512$ |
| Gen 4 | domino-2d-ff.c |
2D Kasteleyn inside Finite Fields + CRT + OpenMP | $\approx 4096 \times 4096$ |
| Gen 5 | domino-1d-ff.c |
1D Lucas Sequences + CRT + OpenMP (The Ultimate Engine) | $\approx 10000 \times 10000$ |
(Note: Generations 3, 4, and 5 require the GNU Multiple Precision (GMP) library to handle the massive integers, and Gens 4 and 5 require OpenMP for multi-core CPU threading).
The Ultimate Lesson: Math vs. Hardware
Looking back at this progression, there is a profound truth about computer science hidden in the timeline.
In 1994, my Intel 80286 was likely running at 12 or 16 MHz. Today, my Apple MacBook Pro has billions of transistors, runs at roughly 3.5 GHz per core, and possesses an exponentially wider memory bus.
But here is the catch: If you run my Generation 1 Dynamic Programming algorithm on my modern MacBook Pro, it still cannot solve a $64 \times 64$ board. The time complexity of the DP approach involves $2^{64}$ states per column. It doesn’t matter if your CPU is from 1994, 2026, or 2050. It doesn’t matter if you harness the combined computing power of every server farm on Earth. An exponential time complexity will eat your hardware alive.
Hardware upgrades scale linearly, or maybe quadratically if you are lucky with parallelism. But algorithmic bottlenecks scale exponentially.
To bridge the gap between $8 \times 20$ and $10000 \times 10000$, we couldn’t rely on hardware manufacturers to build faster chips. We had to abandon standard computer science and ask a physicist for a matrix. We had to ask abstract algebra for a cyclotomic polynomial. We had to ask number theory for a remainder theorem, and we had to ask a Russian textbook for a Lucas sequence.
You cannot out-compute bad math. But with the right math, you can force a computer to bend the universe to its will.
Closing the IDE
It is a fun milestone to finally crush this problem. The “Parquet” grid is definitively solved, and my old Turbo Pascal compiler can rest easy.
Thank you for following along on this incredibly deep, incredibly nerdy rabbit hole. If you clone the repository and manage to calculate a $100,000 \times 100,000$ board before your RAM catches fire, let me know.
Until then, I’m going to go look at a blank chessboard and try not to see bipartite graphs.