Chapter 8: Banishing Floats

At the end of Chapter 7, we were holding a mathematically perfect equation that our computers were physically incapable of executing.

Because Kasteleyn’s formula relies on $\pi$ and $\cos$, a standard programming language evaluates it using 64-bit floating-point arithmetic. For an 8x8 board, the float drift is a minor annoyance. For a 16x16 board, the exact 36-digit answer is completely corrupted by garbage decimals.

If we want to calculate the exact, pristine integer for a 64x64 board—a number with over 500 digits—we cannot use decimals. We cannot use approximations.

We must find a way to calculate a massive trigonometric product without ever actually calculating a cosine.

The Escape Hatch: Euler’s Formula

How do you calculate trigonometry without floats? You go back to where trigonometry was born: the circle.

If you have ever watched a nerdy math video on YouTube, you have probably seen the “most beautiful equation in mathematics”: $e^{i\pi} = -1$. This famous identity is actually just a specific case of Euler’s formula, which bridges the gap between exponential growth and circular geometry.

Euler’s formula mapped on the complex plane

Here is what the full Euler’s formula looks like for any angle $\theta$:

$$e^{i\theta} = \cos \theta + i \sin \theta$$

If we plug in a negative angle ($-\theta$), the cosine stays the same because it is symmetrical, but the sine wave flips:

$$e^{-i\theta} = \cos \theta - i \sin \theta$$

Now, let’s do some basic middle-school addition. If we add those two equations together, the positive and negative imaginary sine terms perfectly cancel each other out, leaving us with:

$$2 \cos \theta = e^{i\theta} + e^{-i\theta}$$

This is our escape hatch. We just figured out how to express a cosine wave using purely exponents.

Enter the Roots of Unity

Let’s look at the specific angles Kasteleyn asks us to evaluate: $\frac{\pi j}{M+1}$ and $\frac{\pi k}{N+1}$. These aren’t random angles; they are perfect, rational fractions of a circle.

To see how we can exploit this geometry, let’s look at Kasteleyn’s formula again:

$$\prod_{j=1}^{\lceil M/2 \rceil} \prod_{k=1}^{\lceil N/2 \rceil} \left( 4 \cos^2 \frac{\pi j}{M+1} + 4 \cos^2 \frac{\pi k}{N+1} \right)$$

Each term looks like this

$$4 \cos^2 \frac{\pi j}{M+1} + 4 \cos^2 \frac{\pi k}{N+1}$$

Given that $4 \cos^2 \theta = (2 \cos \theta)^2 = (e^{i\theta} + e^{-i\theta})^2 = 2 + e^{2 i \theta} + e^{-2 i \theta}$ let’s convert each term into exponential form

$$4 + e^{\frac{2 \pi i j}{M+1}} + e^{-\frac{2 \pi i j}{M+1}} + e^{\frac{2 \pi i k}{N+1}} + e^{-\frac{2 \pi i k}{N+1}}$$

Let $z = e^{\frac{2 \pi i}{(M+1)(N+1)}}$. In other words z is the $(M+1)(N+1)$-th root of unity.

Now each term becomes a polynomial in z

$$4 + z^{j (N+1)} + z^{-j (N+1)} + z^{k (M+1)} + z^{-k (M+1)}$$

By converting the continuous cosines into algebra, we stop multiplying floating-point numbers. We start multiplying polynomials with integer coefficients.

A Concrete Example: The 4x6 Board

To see exactly how the computer does this without decimals, let’s trace Kasteleyn’s double-product formula for an oblong 4x6 board ($M=4, N=6$).

We define our unified variable $z$ as:

$$z = e^{\frac{2 \pi i}{(4+1)(6+1)}} = e^{\frac{2 \pi i}{35}}$$

Because our base slice is $2 \pi / 35$, it takes exactly 35 steps to complete a full $2\pi$ circle. This gives us our golden wrap-around rule: $z^{35} = 1$.

The Kasteleyn double product requires us to evaluate $j$ from 1 to 2, and $k$ from 1 to 3. That means there are 6 distinct terms we need to evaluate.

To make this computer-friendly, we don’t want negative exponents. Because we know $z^{35} = 1$, we can mathematically rewrite a negative exponent like $z^{-7}$ as $z^{28}$.

Let’s write out the 6 polynomials we need to multiply. For each term, we will write out raw polynomial first, and then sort our exponents from lowest (free term) to highest:

$j=1, k=1$: $4 + z^{7} + z^{28} + z^{5} + z^{30} = 4 + z^{5} + z^{7} + z^{28} + z^{30}$
$j=1, k=2$: $4 + z^{7} + z^{28} + z^{10} + z^{25} = 4 + z^{7} + z^{10} + z^{25} + z^{28}$
$j=1, k=3$: $4 + z^{7} + z^{28} + z^{15} + z^{20} = 4 + z^{7} + z^{15} + z^{20} + z^{28}$
$j=2, k=1$: $4 + z^{14} + z^{21} + z^{5} + z^{30} = 4 + z^{5} + z^{14} + z^{21} + z^{30}$
$j=2, k=2$: $4 + z^{14} + z^{21} + z^{10} + z^{25} = 4 + z^{10} + z^{14} + z^{21} + z^{25}$
$j=2, k=3$: $4 + z^{14} + z^{21} + z^{15} + z^{20} = 4 + z^{14} + z^{15} + z^{20} + z^{21}$

The Massive Multiplication

If we just tell the computer to multiply all 6 of these polynomials together, we get 162-th degree polynomial. We are not going to write it out here.

But our computer knows the golden rule: $z^{35} = 1$. Because $z$ is a root of unity, any exponent larger than 35 just wraps back around the circle. We can fold those massive exponents down using simple modulo 35 arithmetic!

$z^{58}$ becomes $z^{35} \cdot z^{23} = 1 \cdot z^{23} =$ $z^{23}$
$z^{53}$ becomes $z^{35} \cdot z^{18} = 1 \cdot z^{18} =$ $z^{18}$

Because we automatically modulo the exponents by 35 at every single multiplication step, the polynomial array in our computer’s memory will never exceed 35 elements.

When the multiplication loop finishes, we are left with a beautifully bounded polynomial. For our particular example it looks like this:

$$ \begin{aligned} P(z) ={} & 15366 + 5985z + 5985z^2 + 5985z^3 + 5985z^4 + 9429z^5 \cr & + 5985z^6 + 11641z^7 + 5985z^8 + 5985z^9 + 9429z^{10} \cr & + 5985z^{11} + 5985z^{12} + 5985z^{13} + 11641z^{14} + 9429z^{15} \cr & + 5985z^{16} + 5985z^{17} + 5985z^{18} + 5985z^{19} + 9429z^{20} \cr & + 11641z^{21} + 5985z^{22} + 5985z^{23} + 5985z^{24} + 9429z^{25} \cr & + 5985z^{26} + 5985z^{27} + 11641z^{28} + 5985z^{29} + 9429z^{30} \cr & + 5985z^{31} + 5985z^{32} + 5985z^{33} + 5985z^{34} \end{aligned} $$

The Floating-Point Trap, Part 2

We did it. We successfully calculated Kasteleyn’s double product using nothing but exact integer arrays.

To get the exact number of domino tilings for our 4x6 board, all we have to do is take that final polynomial $P(z)$ and plug in our variable $z$.

But wait. What was $z$ again?

$$z = e^{\frac{2 i \pi}{35}}$$

To plug $z$ into our polynomial, we have to calculate $e^{2 \pi i/35}$. Which means we have to calculate $\cos(2 \pi/35) + i \sin(2 \pi/35)$.

Which means we are right back to calculating floating-point decimals. And now they are complex floating-point decimals.

We just did all that brilliant, massive integer algebra, and we are still trapped by the limitations of floating-point trigonometry. It is a complete mathematical catastrophe.

Enter the Cyclotomic Polynomial

We need a way to extract the true integer from our polynomial $P(z)$ without ever evaluating $z$.

To do this, we use a cheat code from abstract algebra involving Cyclotomic Polynomials ($\Phi_d(z)$).

The equation $z^{35} - 1 = 0$ contains the roots of unity we need, but it also contains junk roots we don’t. A cyclotomic polynomial is the smallest possible, unbreakable algebraic equation that contains only the specific primitive roots we care about. And miraculously, its coefficients are always pure, exact integers.

Specifically for $d = 35$

$$ \begin{aligned} \Phi_{35}(z) ={} & 1 - z + z^5 - z^6 + z^7 - z^8 + z^{10} - z^{11} + z^{12} \cr & - z^{13} + z^{14} - z^{16} + z^{17} - z^{18} + z^{19} - z^{23} + z^{24} \end{aligned} $$

Because we know the true mathematical answer to Kasteleyn’s formula is a real integer, polynomial algebra dictates a beautiful relationship. If we take our messy degree 34 polynomial $P(z)$ and perform polynomial long division by the degree 24 cyclotomic polynomial $\Phi_{35}(z)$, all the $z$ variables are mathematically forced to annihilate each other.

The remainder of dividing that 34th-degree monster by the 24th-degree cyclotomic polynomial perfectly collapses down into a degree-zero constant integer: 281. Exactly 281 ways to tile a 4x6 board. No cosines, no floats, just pure integer algebra.

The Cost of Exactness

Let’s look at what we’ve built.

We swapped floating-point math for pure integer polynomials. To handle coefficients that grow into thousands of digits, we use the GNU Multiple Precision (GMP) library in C. GMP dynamically allocates memory to handle integers of practically infinite size.

We loop through our grid, multiply the polynomials using GMP, wrap the exponents, divide by the cyclotomic polynomial, and extract the remainder. It works flawlessly. For the 64x64 board, the algorithm spits out the exact 500-digit integer.

But what if we try an oblong, incredibly massive board, like 4096x4097?

A board this size has nearly 16.8 million squares. But look at what happens to our math. We need the LCM of 4097 and 4098. That LCM is over 16.7 million. To calculate this using our algebraic method, our program has to allocate memory for an array containing nearly 17 million coefficients, where each coefficient is a GMP big-integer taking up kilobytes of space.

When you try to run this, your 16GB of RAM instantly fills up. The CPU starts swapping memory to the hard drive, grinding the system to a halt. We fixed the precision trap, but we created a catastrophic memory bottleneck.

To solve massive boards, we have to look at our giant polynomials and ask a radical question: What if we could do all this exact math, but cap the numbers so they never grow larger than a standard 64-bit integer?

To do that, we have to leave continuous algebra behind and enter the ultimate mathematical cheat code: Finite Fields.

The Code: Generation 3

If you want to watch a computer evaluate a massive 2D Kasteleyn trigonometric product using absolutely nothing but pure integer algebra and Cyclotomic Polynomial long division, here is the code. It works flawlessly for perfectly square grids up to about $512 \times 512$ before the GMP array memory footprint brings your RAM to its knees.

💻 View the code on GitHub: domino-poly.c

The Escape Hatch: Euler’s Formula#

Enter the Roots of Unity#

A Concrete Example: The 4x6 Board#

The Massive Multiplication#

The Floating-Point Trap, Part 2#

Enter the Cyclotomic Polynomial#

The Cost of Exactness#

The Code: Generation 3#