Contiguity Is Power
We have seen that Pointers are slow because they force the CPU to wait for memory. But the story is more subtle. The CPU hates fetching single bytes.
When you ask for 1 byte, the CPU actually fetches an entire block (usually 64 bytes) called a Cache Line. It grabs the neighbors, assuming you will need them soon.
This leads to the golden rule of performance: Put things next to each other.
The Array vs. The List
This is why Arrays are the fastest data structure in existence. They are contiguous.
When you read A[0], the hardware incidentally fetches A[1]..A[15] for free (assuming 4-byte elements): they ride along on the same cache line.
Linked Lists are the opposite. A Linked List is a collection of pointers that scatters your data across memory like confetti. To read 10 items in a list, you might trigger 10 separate cache misses. To read 10 array items, you might trigger only 1.
Data Oriented Design
This physical reality spawned a movement called Data Oriented Design (used by Unity, Unreal, etc.).
Object-Oriented Programming (OOP) encourages you to think in isolated "Objects" (often scattered across the heap). Data Oriented Design says: "Stop thinking about objects. Think about tight streams of data." If you align your data with the hardware's hunger for contiguous blocks, you can see 10x-100x performance improvements for free.
Why is a Cache Line 64 bytes?
It is a tradeoff. If it were smaller (e.g., 4 bytes), we would have to pay the "latency tax" too often. If it were larger (e.g., 1024 bytes), we would waste bandwidth transferring data we might not use. 64 bytes is the sweet spot for modern workloads.
Does this mean Java/C# objects are always slow?
Often, yes, due to "pointer chasing". However, modern Garbage Collectors try to move related objects closer together in memory (Compaction) to recover some spatial locality. But they can never beat a raw C array.
What is Prefetching?
The CPU notices patterns. If you access Address 100, then 104, then 108... the Prefetcher guesses you want 112 next and silently fetches it before you ask. This works perfectly for Arrays. It fails miserably for Linked Lists.
Can contiguity hurt performance?
Yes. If two threads modify different variables that live on the same cache line, they constantly invalidate each other’s cache. This is called False Sharing and it can destroy parallel performance.
What programmers usually get wrong here
Thinking "Big O" logic applies equally to lists and arrays. In theory, iterating both is O(N). In reality, the list stops for a cache miss at every step, making it 10x-100x slower in practice.
This works — until we scale it. Contiguity is powerful, but memory is finite. Eventually, we run out of contiguous space and must face the chaos of the Heap.