At the end of Chapter 8, we hit a physical wall. We successfully banished floating-point inaccuracies by transmuting Kasteleyn’s formula into an algebraic polynomial.
But for a massive $4096 \times 4096$ board, the coefficients of that polynomial grow into integers with millions of digits. Holding them in memory requires gigabytes of RAM, and multiplying them grinds the CPU to a halt.
We need the absolute precision of algebraic mathematics, but we need it to fit inside the tiny, lightning-fast 64-bit registers of our processor.
To pull this off, we have to stop doing math in the normal universe and move our calculations into a Finite Field.
The Magic of Modular Arithmetic
Imagine doing all of our Kasteleyn multiplications, but every time a number threatens to get larger than a specific limit, we wrap it back around to zero.
This is modular arithmetic (often called clock math). If we choose a prime number $p$ as our limit, we create a mathematical sandbox called a Finite Field, written as $\mathbb{F}_p$.
Inside $\mathbb{F}_p$, the rules of algebra still work perfectly. You can add, subtract, multiply, and perfectly divide. But the beautiful part? No number ever exceeds $p$. If we choose a prime number that just barely fits inside a standard 64-bit integer—like $p = 4611686018427387847$—then every single piece of math we do will natively fit inside the CPU cache. No dynamic memory allocation. No GNU Multiple Precision (GMP) overhead in the inner loop. Just raw, blistering hardware speed.
Integer Trigonometry (Roots in $\mathbb{F}_p$)
But wait. Kasteleyn’s formula requires calculating roots of unity—points perfectly spaced around a circle, usually represented by complex numbers with imaginary $i$.
How do you calculate a complex root of unity inside a sandbox of whole integers?
This is where number theory hands us a cheat code. In a finite field, if you pick the right prime number, you don’t need complex numbers at all. The roots of unity natively exist as simple, whole integers!
Here is the rule: if we need to find a $d$-th root of unity (where $d$ is related to our board dimensions), we just need to pick a prime $p$ such that:
$$p \equiv 1 \pmod d$$Because of Dirichlet’s theorem on arithmetic progressions, we know there are infinitely many primes that satisfy this exact condition.
If we find one of these primes, we can calculate a specific integer $r$ inside our field where $r^d \equiv 1 \pmod p$. This integer $r$ behaves exactly like $e^{2\pi i / d}$. We literally swapped a complex, irrational fraction for a standard 64-bit integer.
Hardware Sympathy & Chebyshev
Now we rewrite our algorithm. We aren’t building massive polynomials anymore. We are just multiplying 64-bit integers modulo $p$.
But we can optimize it even further. We know the inner terms of Kasteleyn’s formula behave like cosines. In mathematics, the sequence of cosines can be generated using Chebyshev polynomials, which follow a very simple linear recurrence relation:
$$T_n(x) = 2x T_{n-1}(x) - T_{n-2}(x)$$Because we are inside our finite field, $x$ is just a 64-bit integer. This means the innermost loop of our program—the core engine that calculates the Kasteleyn product for a $4096 \times 4096$ board—compresses down to a single integer multiplication, a single subtraction, and a single modulo operation.
It is the holy grail of computer science: Hardware Sympathy. The algorithm perfectly matches the physical architecture of the CPU.
The Cliffhanger
Let’s compile the C code and run it for the $4096 \times 4096$ board.
In Chapter 8, the polynomial method crashed the computer. Now, with our finite field integer loop, the calculation finishes in a few milliseconds. The speed is absolutely breathtaking.
The program spits out an answer:
14892374982374
Wait.
The number of ways to tile a $4096 \times 4096$ board is an integer with nearly 5 million decimal digits. But our program just printed a tiny 14-digit number.
Why? Because we did all of our math modulo $p$.
We didn’t calculate the actual, massive number of tilings. We only calculated what the remainder would be if you took that massive number and divided it by our prime $p$.
We just built the fastest Kasteleyn engine on earth, but it only gives us about 60 bits of the 15-million-bit answer. We are staring through a tiny keyhole at a massive universe.
To get the full 5-million-digit number, we are going to need more keyholes. We are going to have to stitch the multiverse together.