Chapter 19

Hierarchy (Trees)

We have traded order for speed (Hashes). But probability cannot reason. It cannot say "Give me all users older than 18."

To regain meaning, we must organize data again. We build a Tree.

But for the machine, a tree is just a beautiful nightmare.

The Price of Meaning

A Binary Tree is a structure where every decision cuts the world in half. It is theoretically perfect: log(N) steps.

But physically, every step is a Dependent Load. You cannot find the child until you have loaded the parent.

The CPU cannot prefetch. It cannot guess. It must wait.

The Branch Penalty

Trees do not just cause cache misses. They cause branch mispredictions.

Every comparison is a fork: Left or Right? The CPU guesses. If it guesses wrong, it throws away its work and restarts the pipeline.

This is why a tree search can be slower than a linear scan even when log(N) is smaller than N.

The Cost of Maintenance

Trees are expensive to keep balanced. Every insertion may trigger node splits, pointer rewrites, cache invalidations, and coherence storms.

Trees give you meaning, but they demand constant repair.

Physics Lens: Latency ↑↑↑ (Chains) | Bandwidth ↓ (Wasted) | Coherence ↓ (Pointer Chasing)
Experiment: Compare Binary Tree vs B-Tree.
The Binary Tree forces many small jumps (Pointer Hell). The B-Tree loads a wide block of keys at once (Cache Heaven).
Fan-out beats Depth.

The Four Lenses

Let's audit the Tree.

Hypotheses
Why do databases use B-Trees?

Because accessing the Disk is slow (milliseconds). We want to load as much info as possible in one go. A B-Tree node can be 4KB (one page), holding hundreds of keys. A Binary Tree node holds 1.

Is log(N) always fast?

No. log2(N) involves many physical steps. log100(N) (B-Tree) involves very few updates. CPU cycles are cheap; Memory Latency is expensive. We spend cycles scanning a fat node to save on fetching a cold pointer.

Why do textbooks still teach Binary Search Trees?

Because they are mathematically clean and easy to draw on a whiteboard. Real machines do not operate on whiteboards. They operate on cache lines, pipelines, and pages. Binary Trees are a teaching tool, not a performance tool.

We have rebuilt the world. Structure exists. But structure does nothing by itself. To make it move, we need time, prediction, and illusion.