Chapter 35

Decode Is Not Free

In Chapter 34, we fetched the bytes. We have 0x48 0x89 0xE5 (the encoding of mov rbp, rsp) sitting in the Fetch Buffer.

But the Execution Units (ALUs) don't speak x86. They speak a secret, simpler internal language called Micro-ops (µops).

Before anything runs, we must translate the complex public language (the ISA) into the simple private language (µops). This happens in the Decoder.
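To make the translation concrete, here is a toy lookup table in Python. It is only a sketch: real µop formats are undocumented and differ between microarchitectures, so the µop strings, the TOY_UOPS table, and the decode function are all hypothetical. The one point it illustrates is that a register-to-register instruction can map to a single µop, while an instruction that reads and writes memory splits into several.

```python
# Illustrative only: real µops are internal to the CPU and not documented.
# The strings below are made-up descriptions, not actual encodings.
TOY_UOPS = {
    "mov rbp, rsp": ["move rbp <- rsp"],
    "add rax, rbx": ["add rax <- rax + rbx"],
    "add [rdi], rax": [
        "load tmp <- [rdi]",
        "add tmp <- tmp + rax",
        "store [rdi] <- tmp",
    ],
}


def decode(instruction):
    """Return the toy µop list for one instruction (hypothetical helper)."""
    return TOY_UOPS[instruction]


if __name__ == "__main__":
    for text in TOY_UOPS:
        print(f"{text:16} -> {len(decode(text))} µop(s)")
```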

The Funnel

Modern CPUs are "Superscalar": they can start 4, 6, or even 8 operations in the same cycle.

But the Decoder is often a narrow funnel. It might only be able to decode 4 instructions per cycle.

If your instruction is "Simple" (like add), it flows through 1:1. If it is "Complex" (like a string copy), it clogs the funnel, taking multiple cycles to explode into component µops.

It doesn't matter how many ALUs you have. If the Decoder cannot feed them, they sit idle. A starved backend looks identical to a slow CPU.

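Physics Lens: Throughput = Width / Complexity, where Width is the number of decode slots per cycle and Complexity is the average number of slots each instruction occupies. The CISC tax is real.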

Experiment:
1. Add Simple: Fill the queue with Simple instructions (Green). Click "Clock Step". Notice 4 move through at once. IPC = 4.0.
2. Add Complex: Fill the queue with Complex instructions (Red). Click "Clock Step". Only 1 moves. The bottleneck is the translation logic. IPC = 1.0.
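
The experiment above can be sketched in a few lines of Python. This is a toy model, not how a real decoder is built: the width of 4 and the rule that a complex instruction takes the whole decoder for a cycle are assumptions lifted from the experiment, and run_decoder is a hypothetical name.

```python
from collections import deque

DECODE_WIDTH = 4  # assumed: simple instructions accepted per cycle


def run_decoder(queue, width=DECODE_WIDTH):
    """Drain the queue and return (cycles, instructions_per_cycle).

    Toy rule: up to `width` simple instructions decode together in one
    cycle; anything else is treated as complex and uses a cycle alone.
    """
    pending = deque(queue)
    total = len(pending)
    cycles = 0
    while pending:
        cycles += 1
        if pending[0] != "simple":
            pending.popleft()  # one complex instruction, nothing else this cycle
            continue
        slots = width
        while pending and slots > 0 and pending[0] == "simple":
            pending.popleft()
            slots -= 1
    ipc = total / cycles if cycles else 0.0
    return cycles, ipc


if __name__ == "__main__":
    print(run_decoder(["simple"] * 8))   # all green
    print(run_decoder(["complex"] * 8))  # all red
```

With eight simple instructions the queue drains in 2 cycles (IPC 4.0). With eight complex ones it takes 8 cycles (IPC 1.0). Mixing the two lands somewhere in between, which is what most real code looks like.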

The Micro-op Cache (µop Cache)

Since decoding is expensive (energy and time), modern CPUs cheat.

Once they decode a block of instructions into µops, they save the result in a secret cache called the µop Cache (Intel calls it the DSB; AMD calls it the Op Cache).

If you run a loop that fits in the µop Cache, you skip the Decoder entirely. You get "free" frontend bandwidth. This is why small, hot loops are unreasonably fast.
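
A rough Python sketch of that effect follows. It assumes a deliberately simple cache: a dict keyed by instruction address that fills until it reaches a fixed capacity and never evicts. Real µop caches are organized in sets and ways and do evict, so run_loop and its parameters are hypothetical and only show the trend.

```python
def run_loop(loop_body, iterations, cache_capacity):
    """Return (decoded, from_cache): how many instructions took each path.

    Toy model: the first time an address is seen it goes through the
    decoder; if there is room, its µops are stored and later visits hit.
    """
    uop_cache = {}  # address -> decoded µops (placeholder strings)
    decoded = 0
    from_cache = 0
    for _ in range(iterations):
        for address, instruction in enumerate(loop_body):
            if address in uop_cache:
                from_cache += 1  # decoder skipped
            else:
                decoded += 1     # paid the decode cost
                if len(uop_cache) < cache_capacity:
                    uop_cache[address] = f"uops({instruction})"
    return decoded, from_cache


if __name__ == "__main__":
    small = ["add", "cmp", "jne"]
    print(run_loop(small, iterations=100, cache_capacity=8))
    big = ["add"] * 10
    print(run_loop(big, iterations=100, cache_capacity=8))
```

The 3-instruction loop is decoded 3 times and served from the cache 297 times. The 10-instruction loop does not fit in a capacity of 8, so two instructions pay the decode cost on every iteration.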

Hypotheses

Is ARM faster because it's RISC?

Partly. ARM instructions are fixed-width (usually 4 bytes) and simple, so they are trivial to decode in parallel. x86 instructions are variable-length (1 to 15 bytes), and finding where one ends and the next begins is a nightmare for the decoder.
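
The boundary problem can be shown with a small Python sketch. The variable-length encoding here is invented for illustration (the low 4 bits of the first byte give the length); real x86 length decoding depends on prefixes, opcode, ModRM, and more. The function names are hypothetical. What matters is the shape of the two loops: one computes every start independently, the other cannot find start N+1 until it has measured instruction N.

```python
def fixed_width_starts(code, width=4):
    """Fixed width: every start offset is known up front (0, 4, 8, ...)."""
    return list(range(0, len(code), width))


def toy_length(first_byte):
    """Made-up encoding: low 4 bits of the first byte are the length (1 to 15)."""
    return max(1, first_byte & 0x0F)


def variable_width_starts(code):
    """Variable width: a serial scan, each start depends on the previous length."""
    starts = []
    offset = 0
    while offset < len(code):
        starts.append(offset)
        offset += toy_length(code[offset])
    return starts


if __name__ == "__main__":
    fixed = bytes(12)
    print(fixed_width_starts(fixed))
    variable = bytes([0x03, 0, 0, 0x01, 0x02, 0, 0x05, 0, 0, 0, 0])
    print(variable_width_starts(variable))
```

Hardware does not literally scan one byte at a time, but it has to spend extra logic to recover the parallelism that a fixed-width encoding hands over for free.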

We have the µops.

The instructions are decoded. We know what they want to do.

But often, what they want to do depends on the future. "If X > 0, Jump here."

We can't wait for X to be calculated. We must guess.

It is time to look at the History Tables.