Translation Costs
We have talked about memory as if we interact with it directly. We assumed that valid logical pointers map directly to metal.
They don't.
Every address your program sees is a **Virtual Address**. It is a lie. Before the CPU can fetch even a single byte, it must translate this Virtual Address into a **Physical Address**.
This translation happens for every single instruction that touches memory.
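Mechanically, translation only rewrites the upper bits of an address. A minimal sketch, assuming the common 4 KiB page size (12 offset bits) and using a plain dictionary as a stand-in for the real page table:

```python
# Split a virtual address into (virtual page number, offset within page),
# assuming 4 KiB pages -- the default on x86-64.
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT          # 4096 bytes

def split(vaddr: int) -> tuple[int, int]:
    """Return (virtual page number, byte offset within the page)."""
    return vaddr >> PAGE_SHIFT, vaddr & (PAGE_SIZE - 1)

def translate(vaddr: int, page_table: dict[int, int]) -> int:
    """Swap the virtual page number for a physical frame number."""
    vpn, offset = split(vaddr)
    pfn = page_table[vpn]            # this lookup is what the TLB caches
    return (pfn << PAGE_SHIFT) | offset

print(split(0x1678))                 # (1, 0x678): page 1, byte 0x678 into it
```

Note that the offset is copied through untouched; only the page number is replaced, which is why pages are the unit of translation.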
The Hidden Tax
To perform this translation, the CPU consults a "Page Table" that maps virtual pages to physical frames. A table covering Gigabytes of RAM is far too large to keep on the chip. It lives in slow RAM.
If we had to read RAM to find out where RAM is, we would double our latency.
To survive, the CPU uses a tiny cache called the TLB (Translation Lookaside Buffer). It remembers recent translations.
1. Sequential Access: You stay on the same "Page" for a long time. The TLB Hits. Fast.
2. Random Access: You are jumping to new Pages constantly. The TLB Misses. The CPU must "Walk" the Page Table (Slow).
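The asymmetry between the two patterns can be made concrete with a toy model. The TLB below is direct-mapped with 64 entries and 4 KiB pages; both numbers are invented for illustration, not a description of any real core:

```python
import random

PAGE = 4096
TLB_ENTRIES = 64                      # toy size; real TLBs hold tens to hundreds

def count_tlb_misses(addresses):
    """Simulate a direct-mapped TLB; return how many accesses miss."""
    tlb = {}                          # set index -> cached virtual page number
    misses = 0
    for addr in addresses:
        vpn = addr // PAGE
        idx = vpn % TLB_ENTRIES
        if tlb.get(idx) != vpn:       # wrong translation cached: miss + walk
            misses += 1
            tlb[idx] = vpn            # refill the entry after the walk
    return misses

span = 1024 * PAGE                    # a 4 MiB buffer: 1024 pages
sequential = range(0, span, 64)       # scan it one cache line at a time
rng = random.Random(1)
scattered = [rng.randrange(span) for _ in range(len(sequential))]

print(count_tlb_misses(sequential))   # 1024: one miss per page, then 63 hits
print(count_tlb_misses(scattered))    # most accesses land on an evicted entry
```

Same buffer, same number of accesses: the sequential scan misses once per page, while the scattered one misses on the overwhelming majority of accesses.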
The Cost of Fragmentation
This is the final nail in the coffin for Linked Lists and pointer-heavy structures.
A Linked List doesn't just suffer from Cache Misses (the data isn't loaded). It also suffers from TLB Misses (the translation isn't cached).
If your nodes are scattered across thousands of memory pages, the TLB overflows. The CPU spends more time looking up *where* data is than actually reading it. A TLB miss is not just slow — it stalls the pipeline while the page walk completes.
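How badly can fragmentation spread a structure across pages? A rough model, with invented numbers: 10,000 nodes of 32 bytes each, either packed contiguously (as in an array) or scattered wherever a long-lived heap happened to have room, modeled here as random addresses in a 1 GiB range:

```python
import random

PAGE = 4096
NODE = 32                             # hypothetical node size in bytes
N = 10_000

def pages_touched(addresses):
    """Distinct pages the TLB must hold translations for."""
    return len({addr // PAGE for addr in addresses})

# Array: nodes packed back to back from one base address.
packed = [i * NODE for i in range(N)]

# Linked list on a fragmented heap: each node lands at an arbitrary address.
rng = random.Random(0)
scattered = [rng.randrange(1 << 30) for _ in range(N)]

print(pages_touched(packed))          # 79: 10,000 * 32 B fits in ~313 KiB
print(pages_touched(scattered))       # thousands: nearly one page per node
```

Seventy-nine pages fit comfortably in any TLB; ten thousand do not, so the scattered layout pays a page walk again and again for the same traversal.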
Why can’t virtual addresses just equal physical addresses?
Because isolation would vanish. One bug could overwrite the OS, other processes, or secrets. Virtual memory is a security boundary first, a performance cost second.
Why is the TLB so small?
Because it must be incredibly fast. A large TLB would itself become slow and defeat its purpose.
Is a TLB miss worse than a cache miss?
Often, yes. A TLB miss can trigger multiple memory accesses before the real data fetch even begins.
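Back-of-the-envelope arithmetic for that claim: on x86-64 the page table is a four-level radix tree, and each level's address comes from the previous entry, so the reads cannot overlap. Assuming ~100 ns per uncached memory read (an illustrative figure, not a measurement):

```python
DRAM_NS = 100        # assumed latency of one uncached memory read
WALK_LEVELS = 4      # x86-64 page walk: PML4 -> PDPT -> PD -> PT

tlb_hit = DRAM_NS                            # translation cached: just the data read
tlb_miss = WALK_LEVELS * DRAM_NS + DRAM_NS   # walk four levels, then read the data

print(tlb_hit, tlb_miss)                     # 100 500: a cold walk costs 5x
```

Real CPUs soften this with dedicated page-walk caches and by finding page-table entries in the data caches, so a typical walk is cheaper than the worst case sketched here, but it is never free.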
Why do huge pages help performance?
They reduce translation pressure. Fewer pages means fewer TLB entries are needed to cover the same working set, so more accesses hit the TLB.
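The arithmetic behind that, for a 1 GiB working set (2 MiB is the common x86-64 huge-page size):

```python
GIB = 1 << 30

def tlb_entries_needed(working_set: int, page_size: int) -> int:
    """TLB entries required to cover the working set without misses."""
    return -(-working_set // page_size)      # ceiling division

print(tlb_entries_needed(GIB, 4096))         # 262144 entries: no TLB is that big
print(tlb_entries_needed(GIB, 2 << 20))      # 512 entries: within reach of a real TLB
```

On Linux, huge pages can be requested explicitly (the `MAP_HUGETLB` flag to `mmap`) or hinted (`madvise` with `MADV_HUGEPAGE`, via transparent huge pages); availability depends on system configuration.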
What programmers usually get wrong here
They treat pointer-heavy code as "just memory access." In reality, it’s address lookup plus data fetch, multiplied by fragmentation.
This works — until we share. We have conquered latency, granularity, prediction, and translation. But we are still alone. What happens when two cores want the same memory?