The Hierarchy
We have spent the last few chapters pretending that "Memory" is a single, uniform bucket. We write to it, we read from it, and it obeys.
This was a lie.
In reality, Memory is a machine defined by a single, brutal constraint: Distance.
Light travels at 30 centimeters per nanosecond. A modern CPU runs at 4 GHz, meaning it ticks every 0.25 nanoseconds. In the time between two ticks, light travels only 7.5 centimeters. And inside silicon, light moves even slower.
If your data is farther away than 7.5 cm, the CPU physically cannot reach it within a single cycle. It must wait. And that is the optimistic case: the request has to travel out before the data can travel back.
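The arithmetic behind these numbers is worth checking once by hand. A minimal sketch, using the speed of light in vacuum and the 4 GHz clock from the text:

```python
C_CM_PER_NS = 29.98   # speed of light in vacuum, in cm per nanosecond
CLOCK_GHZ = 4.0       # the 4 GHz clock from the text

cycle_ns = 1.0 / CLOCK_GHZ           # 0.25 ns between ticks
reach_cm = C_CM_PER_NS * cycle_ns    # how far light gets per tick: ~7.5 cm

print(f"{cycle_ns} ns per cycle, about {reach_cm:.1f} cm of reach")
```

Signals in copper and silicon propagate at a large fraction of this speed, so the real reach per cycle is even shorter.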
The Latency Ladder
To hide this distance, we build layers. Small, fast memory close to the CPU, and large, slow memory far away.
This is the Memory Hierarchy. It is not a feature. It is a desperate attempt to pretend that the universe allows random access.
Programming books often talk about "nanoseconds". But humans can't feel nanoseconds. Let's translate this into human time.
If 1 CPU Cycle Were 1 Second...
Scale everything up so that one 0.25-nanosecond tick lasts a full second:
- 1 CPU cycle: 1 second
- L1 cache hit: about 3 seconds
- Main RAM access: about 4 minutes
- Disk access: a journey to Mars
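The human-time scaling is a single division: each latency over the cycle time. A sketch using typical ballpark latencies (the nanosecond figures here are common textbook values I am assuming, not measurements of any specific chip):

```python
CYCLE_NS = 0.25  # one tick of a 4 GHz clock

# Typical ballpark latencies (assumptions, not measurements):
latencies_ns = {
    "L1 cache hit":    0.75,          # ~3 cycles
    "main RAM access": 60.0,          # ~240 cycles
    "disk seek":       10_000_000.0,  # ~10 ms
}

for name, ns in latencies_ns.items():
    scaled_s = ns / CYCLE_NS  # the "1 cycle -> 1 second" scaling
    print(f"{name}: {scaled_s:,.0f} scaled seconds")
```

That disk seek scales to 40,000,000 seconds: roughly 1.3 years, which is indeed the length of a trip to Mars and back.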
The Hopelessness of RAM
Look at the gap between L1 Cache (3 seconds) and Main RAM (4 minutes).
When your program has a "Cache Miss"—when it has to go to Main RAM—the CPU essentially stops working. It sits there, twiddling its thumbs, for hundreds of cycles. It is like an F1 driver stopping mid-race to get out and make a coffee.
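You can feel a cache miss from userland. A rough, machine-dependent sketch: walk a large array in order, then in a shuffled order, and compare wall-clock times. The shuffled walk defeats the prefetcher and misses cache far more often. (In CPython the interpreter overhead blunts the gap; a C version of the same experiment shows a much larger ratio.)

```python
import random
import time

N = 1_000_000
data = list(range(N))

seq_order = list(range(N))
rand_order = seq_order[:]
random.shuffle(rand_order)

def walk(order):
    """Sum data[] in the given visiting order, returning (seconds, sum)."""
    t0 = time.perf_counter()
    total = 0
    for i in order:
        total += data[i]
    return time.perf_counter() - t0, total

t_seq, s1 = walk(seq_order)
t_rand, s2 = walk(rand_order)
assert s1 == s2  # identical work; only the access pattern differs
print(f"sequential: {t_seq:.3f}s  random: {t_rand:.3f}s")
```

Same instructions, same data, same result; the only thing that changed is distance, as seen by the cache.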
And Disk? Disk is a journey to Mars. If your program touches the disk (Swap), it is effectively dead.
Isn’t this just "cache vs RAM"? Why make it dramatic?
Because most programmers still think latency is a number, not a distance. If you don’t internalize distance, you will keep designing algorithms that teleport data. The drama is the correction.
Why can’t CPUs just wait less?
Because waiting is not a software decision. When a cache miss happens, the core is starved of data: the signals carrying it are still physically in flight. No amount of clever code can change geometry.
If RAM is so slow, why don’t we just put everything in cache?
Because cache is built from SRAM, and SRAM is physically huge. If L1 cache were the size of RAM, your CPU would be the size of a city block and melt instantly.
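The transistor budget makes the point concrete. A back-of-the-envelope sketch using the classic textbook cell sizes (6 transistors per SRAM bit versus 1 transistor plus 1 capacitor per DRAM bit; the 32 KiB and 16 GiB capacities are illustrative assumptions):

```python
BITS_PER_BYTE = 8
SRAM_T_PER_BIT = 6   # classic 6T SRAM cell
DRAM_T_PER_BIT = 1   # 1 transistor + 1 capacitor per bit

l1_bytes = 32 * 1024     # a typical per-core L1 (assumed)
ram_bytes = 16 * 2**30   # 16 GiB of main memory (assumed)

l1_as_sram = l1_bytes * BITS_PER_BYTE * SRAM_T_PER_BIT
ram_as_dram = ram_bytes * BITS_PER_BYTE * DRAM_T_PER_BIT
ram_as_sram = ram_bytes * BITS_PER_BYTE * SRAM_T_PER_BIT

print(f"32 KiB L1 as SRAM: {l1_as_sram:,} transistors")
print(f"16 GiB as DRAM:    {ram_as_dram:,} transistors")
print(f"16 GiB as SRAM:    {ram_as_sram:,} transistors")
```

Six times the transistors per bit, and each SRAM cell is physically larger than a DRAM cell on top of that. Worse, a bigger array means longer wires, so a RAM-sized cache would also stop being fast.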
Is latency the only problem?
No. Latency is the first problem. Bandwidth, energy, and contention show up immediately after. Distance creates all of them.
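A toy model separates the two costs: transfer time is latency plus size over bandwidth. The 100 ns and 50 GB/s figures below are illustrative assumptions, not measurements:

```python
def transfer_time_s(size_bytes, latency_s, bandwidth_bps):
    """Time to move size_bytes: fixed latency + streaming time."""
    return latency_s + size_bytes / bandwidth_bps

LATENCY_S = 100e-9     # ~100 ns to reach DRAM (assumed)
BANDWIDTH_BPS = 50e9   # ~50 GB/s memory bandwidth (assumed)

for size in (64, 4096, 1 << 20):
    t = transfer_time_s(size, LATENCY_S, BANDWIDTH_BPS)
    print(f"{size:>8} B -> {t * 1e9:,.0f} ns")
```

For a 64-byte fetch, latency is essentially the whole cost; for a megabyte, bandwidth dominates. Which regime you are in is decided by distance and access pattern, not by your algorithm's big-O.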
What programmers usually get wrong here
They think "fast computer" means "fast memory." In reality, memory is always slow. CPUs survive by lying to you.
This works until we miss. We have built a hierarchy to hide the latency, but the hierarchy introduces a new problem: Granularity. You can no longer fetch just one byte.
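The coming problem can already be sketched. Memory moves between levels in fixed-size cache lines, and touching any byte drags in its whole line. Assuming the 64-byte line size common on current x86 and ARM chips:

```python
LINE = 64  # cache line size in bytes (common on x86/ARM; assumed here)

def line_index(addr):
    """Which cache line a byte address falls in."""
    return addr // LINE

# Touching one byte at address 130 pulls in all of bytes 128..191:
addr = 130
start = line_index(addr) * LINE
print(f"byte {addr} lives in line {line_index(addr)} "
      f"(bytes {start}..{start + LINE - 1})")
```

Ask for one byte, pay for sixty-four. That exchange rate is the subject of the next chapter.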