Chapter 39

Arithmetic Intensity

We often believe that algorithms are slow because they do "too much computation."

This is a lie. ALUs are astonishingly fast. Memory is astonishingly slow.

Most algorithms are not computing. They are Waiting.

The Formula: Ops per Byte

The fundamental metric of performance is not Big-O. It is Arithmetic Intensity (AI).

AI = FLOPs / Bytes Fetched

If you fetch 8 bytes (a double) and do 1 addition, your AI is 1/8 = 0.125. You are Memory Bound. On a typical machine, the ALUs can spend 99% of their time idle, waiting for the next operand.
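The 0.125 figure can be checked with a few lines of Python. This is a minimal sketch: the helper name `arithmetic_intensity` and the reuse count of 80 are illustrative assumptions, not measurements.

```python
def arithmetic_intensity(flops: float, bytes_fetched: float) -> float:
    """Return floating-point operations per byte fetched from memory."""
    if bytes_fetched <= 0:
        raise ValueError("bytes_fetched must be positive")
    return flops / bytes_fetched


DOUBLE_BYTES = 8  # size of one IEEE 754 double

# One addition on one freshly fetched double.
single_add = arithmetic_intensity(flops=1, bytes_fetched=DOUBLE_BYTES)

# Hypothetical case: the same double is reused for 80 operations
# while it stays in a register or cache.
reused = arithmetic_intensity(flops=80, bytes_fetched=DOUBLE_BYTES)

print(single_add)  # 0.125
print(reused)      # 10.0
```

The bytes did not change between the two cases. Only the amount of work done per byte did.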

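To saturate a modern CPU or GPU, you need an AI of roughly 10, 20, or even 50 FLOPs per byte. The exact threshold is the chip's peak compute divided by its memory bandwidth. You must reuse every byte you touch dozens of times.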
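Where thresholds like 10, 20, or 50 come from can be sketched with a roofline-style estimate: attainable throughput is the smaller of peak compute and AI times memory bandwidth. The machine numbers below (1000 GFLOP/s peak, 50 GB/s bandwidth) are hypothetical round figures chosen for illustration, not the specs of any real chip.

```python
def attainable_gflops(ai: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline-style bound: limited by compute or by memory, whichever is lower."""
    return min(peak_gflops, ai * bandwidth_gbs)


def balance_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """AI at which a kernel stops being memory bound (FLOPs per byte)."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine, assumed for illustration only.
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 50.0

ridge = balance_point(PEAK_GFLOPS, BANDWIDTH_GBS)
low = attainable_gflops(0.125, PEAK_GFLOPS, BANDWIDTH_GBS)
high = attainable_gflops(50.0, PEAK_GFLOPS, BANDWIDTH_GBS)

print(ridge)               # 20.0 FLOPs per byte needed to reach peak
print(low)                 # 6.25 GFLOP/s at AI = 0.125
print(high)                # 1000.0 GFLOP/s at AI = 50
print(low / PEAK_GFLOPS)   # 0.00625, under 1% of peak
```

On this assumed machine, a kernel at AI = 0.125 reaches well under 1% of peak, while anything at or above 20 FLOPs per byte is limited by compute instead.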

Physics Lens: Reuse Data = Computation is Free.

Experiment:
1. Vector Add (about 0.04 Ops/Byte for doubles: one add per 24 bytes moved): Memory is red-lining (100%). Compute is barely awake. You are blocked by bandwidth.
2. Matrix Mul (on the order of 16 Ops/Byte, depending on matrix size and tiling): Memory is relaxed. Compute is red-lining (100%). You are finally using the silicon you paid for.
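The gap between the two kernels can be estimated by counting FLOPs and bytes. The sketch below assumes 8-byte doubles and an idealized model in which every input is read once and every output is written once. Real numbers depend on element size, cache behavior, and how reads and writes are counted.

```python
ELEM_BYTES = 8  # assuming 8-byte doubles


def vector_add_intensity(n: int) -> float:
    """c[i] = a[i] + b[i] for n elements."""
    flops = n                          # one add per element
    bytes_moved = 3 * n * ELEM_BYTES   # read a, read b, write c
    return flops / bytes_moved


def matmul_intensity(n: int) -> float:
    """C = A @ B for n x n matrices, assuming ideal reuse."""
    flops = 2 * n ** 3                     # n^3 multiplies plus n^3 adds
    bytes_moved = 3 * n * n * ELEM_BYTES   # read A and B once, write C once
    return flops / bytes_moved


for n in (192, 1024, 4096):
    print(n, round(vector_add_intensity(n), 4), round(matmul_intensity(n), 2))
```

Under this model, vector add stays at 1/24 FLOPs per byte no matter how large the vectors get, while matrix multiply grows as n/12 and reaches 16 at n = 192. Reuse is what scales, not the operation count alone.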

Why GPUs Win

GPUs are not "Magic." They simply demand High Intensity.

Deep Learning works on GPUs because Matrix Multiplication (and Convolutions) allow for massive data reuse. You load a weight once and multiply it against 1,000 pixels.

If you tried to run a Linked List traversal on a GPU, it would likely lose to a single CPU core. Each load depends on the one before it, and nothing is ever reused.

Hypotheses

So Big-O doesn't matter?

It matters for scaling (N=1,000 vs N=1,000,000). But for Performance (Seconds), Arithmetic Intensity often matters more. For small N, an O(N^2) algorithm with high reuse might run faster than an O(N) algorithm that thrashes memory.

Why is Python so slow then?

Because Python has very low Arithmetic Intensity. Every addition requires dereferencing pointers, type-checking objects, and fetching attributes. It can fetch something like 100 bytes of interpreter overhead for every byte of actual math.

The fastest code does not do less work.

It does more work per byte.

Writing code that achieves this high intensity is incredibly hard. It requires blocking, tiling, stride-awareness, and vectorization.
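To give a feel for what blocking and tiling mean, here is a minimal sketch of a tiled matrix multiply next to the naive version. The function names and the default tile size are assumptions for illustration. In pure Python the interpreter overhead hides any cache benefit, so this shows the loop structure only, not a speedup.

```python
def matmul_naive(a, b):
    """Textbook triple loop: walks a full column of b for every output element."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            total = 0.0
            for k in range(m):
                total += a[i][k] * b[k][j]
            c[i][j] = total
    return c


def matmul_tiled(a, b, tile=2):
    """Same arithmetic, reordered so each small tile is reused before moving on."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    # Outer loops pick a tile of A, a tile of B, and the matching tile of C.
    for i0 in range(0, n, tile):
        for k0 in range(0, m, tile):
            for j0 in range(0, p, tile):
                # Inner loops stay inside the tile, where data would be cache-hot.
                for i in range(i0, min(i0 + tile, n)):
                    row_c = c[i]
                    for k in range(k0, min(k0 + tile, m)):
                        a_ik = a[i][k]   # loaded once, reused across the j loop
                        row_b = b[k]     # walked with unit stride
                        for j in range(j0, min(j0 + tile, p)):
                            row_c[j] += a_ik * row_b[j]
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(matmul_naive(a, b))   # [[58.0, 64.0], [139.0, 154.0]]
print(matmul_tiled(a, b))   # [[58.0, 64.0], [139.0, 154.0]]
```

Both versions perform the same multiplications and additions. The tiled one only changes the order, so that in a compiled language each tile can be used many times while it still sits in cache. Choosing the tile size for a real cache hierarchy, and vectorizing the inner loop, is where the hard part begins.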

Luckily, someone has already done it for you.

It is time to look at Libraries.