Nehalem revisited

Nehalem EP is almost identical to Nehalem, with only a couple of notable changes. The first of the two main differences is the inclusion of an additional QPI link. This allows one Xeon 5500-series CPU to communicate directly with another, which should speed up applications that run across both CPUs, as the CPUs can share data held in each other's Level 3 cache and even reach across to use each other's DDR3 memory.
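As a rough illustration of what reaching across to the other CPU's memory means, the Python sketch below decides whether an access from one CPU is served by its own memory controller or has to cross the QPI link. It assumes, purely for illustration, that each socket's memory occupies one contiguous slice of the physical address space and that each socket has 12GB; real systems may lay memory out differently, and the function names are hypothetical.

```python
# Toy model of a two-socket NUMA system linked by QPI.
# ASSUMPTION: each socket's memory is one contiguous slice of the physical
# address space; real firmware may arrange (or interleave) memory differently.
GIB = 1024 ** 3


def home_socket(address, bytes_per_socket):
    """Return the socket whose integrated memory controller owns this address."""
    return address // bytes_per_socket


def access_path(cpu, address, bytes_per_socket):
    """Report whether a CPU reaches this address locally or across QPI."""
    if home_socket(address, bytes_per_socket) == cpu:
        return "local"
    return "remote via QPI"


if __name__ == "__main__":
    per_socket = 12 * GIB  # hypothetical amount of memory on each socket
    print(access_path(0, 1 * GIB, per_socket))   # local
    print(access_path(0, 20 * GIB, per_socket))  # remote via QPI
```

Remote accesses still work, but they take the longer route over QPI, which is why NUMA-aware software tries to keep a thread's data in its own socket's memory.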

Secondly, as Xeons are intended for use in workstations and servers, the four pre-fetchers of the Xeon 5500-series are optimised for applications that are commonly used in these setups. By contrast, the pre-fetchers of a Core i7 CPU are optimised for consumer applications and games and, as a result, the W5580 performs slightly differently to the Core i7-965 Extreme Edition, even though both CPUs are clocked at 3.2GHz.

The Xeon 5500-series is still a major change from earlier Core-architecture Xeons. For example, Nehalem EP has the same improved macro-fusion as Nehalem, which allows more kinds of instruction pair (such as a compare followed by a conditional jump) to be combined into a single micro-op. Other core improvements shared with Nehalem include a larger loop stream detector, which now holds 28 micro-ops rather than 18 instructions, and a more accurate branch predictor.
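To picture what macro-fusion does, the Python sketch below counts micro-ops for a short instruction stream, fusing a CMP or TEST with an immediately following conditional jump when that pair is on a fusion list. The lists are simplified assumptions loosely following Intel's description, in which Nehalem lets CMP fuse with signed-comparison jumps such as JL and JG that earlier Core chips could not; they are not a complete statement of either core's rules, and the function name is hypothetical.

```python
# Toy model of macro-fusion: a CMP or TEST immediately followed by an eligible
# conditional jump decodes to one micro-op instead of two.
# ASSUMPTIONS: every other instruction counts as exactly one micro-op, and the
# jump lists below are simplified, not Intel's complete fusion rules.

CONDITIONAL_JUMPS = {"JA", "JAE", "JB", "JBE", "JE", "JNE", "JL", "JGE", "JLE", "JG"}

# Jumps a CMP may fuse with: the older list has unsigned/equality tests only,
# the Nehalem list adds the signed comparisons.
OLDER_CMP_JUMPS = {"JA", "JAE", "JB", "JBE", "JE", "JNE"}
NEHALEM_CMP_JUMPS = OLDER_CMP_JUMPS | {"JL", "JGE", "JLE", "JG"}


def count_uops(instructions, cmp_jumps):
    """Count micro-ops for a list of mnemonics under the given CMP fusion list."""
    uops = 0
    i = 0
    while i < len(instructions):
        op = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        fusible = nxt in CONDITIONAL_JUMPS and (
            op == "TEST" or (op == "CMP" and nxt in cmp_jumps)
        )
        uops += 1  # a fused pair and a lone instruction both cost one micro-op
        i += 2 if fusible else 1
    return uops


if __name__ == "__main__":
    loop = ["MOV", "ADD", "CMP", "JL", "CMP", "JNE"]
    print(count_uops(loop, OLDER_CMP_JUMPS))    # 5
    print(count_uops(loop, NEHALEM_CMP_JUMPS))  # 4
```

In this example the wider fusion list saves one micro-op per pass through the loop, which leaves more room in the core's buffers for other work.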

The reservation station has the same enhancement as Nehalem's, growing from 32 to 36 entries, and this goes hand in hand with an increase in the number of load buffers from 32 to 48 and store buffers from 20 to 32. Together, these improvements mean that a Xeon 5500-series CPU can have up to 128 micro-ops in flight at once, compared to 96 for earlier Core-based Xeons.
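To put the buffer figures quoted above in proportion, this small Python sketch works out the percentage growth of each resource between earlier Core-based Xeons and Nehalem EP. It only restates the article's numbers; the dictionary and function names are illustrative.

```python
# Core-based Xeon vs Nehalem EP figures quoted in the text.
RESOURCES = {
    "reservation station": (32, 36),
    "load buffers": (32, 48),
    "store buffers": (20, 32),
    "micro-ops in flight": (96, 128),
}


def percent_increase(old, new):
    """Growth from old to new, as a percentage of old."""
    return (new - old) * 100 / old


if __name__ == "__main__":
    for name, (old, new) in RESOURCES.items():
        print(f"{name}: {old} -> {new} (+{percent_increase(old, new):.1f}%)")
```

The reservation station grows by 12.5 per cent, the load and store buffers by 50 and 60 per cent respectively, and the number of micro-ops in flight by about a third.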

As you can see from the picture on the right, the new kid on the block, the Xeon W5580 (left), physically dominates the older Xeon X5482 (right), but can it dominate it in our benchmarks?

Like Nehalem, Nehalem EP also supports Hyper-Threading, a technology that uses the spare resources of each execution core to run a second thread in parallel. This means that a dual-processor system with two quad-core W5580s has eight physical cores, which appear to the operating system as 16 logical processors.
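The core and thread arithmetic above can be sketched in a few lines of Python; the function name is hypothetical, and the example assumes Hyper-Threading's two threads per core.

```python
def logical_processors(sockets, cores_per_socket, threads_per_core):
    """Return (physical cores, logical processors) for a multi-socket system."""
    physical = sockets * cores_per_socket
    return physical, physical * threads_per_core


if __name__ == "__main__":
    # Two quad-core W5580s with Hyper-Threading (two threads per core).
    print(logical_processors(2, 4, 2))  # (8, 16)
```

With Hyper-Threading disabled, the same call with `threads_per_core=1` gives eight logical processors, one per physical core.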

Unlike Core architecture quad-core CPUs, which are made from two dual-core dies, Nehalem and Nehalem EP CPUs are made from a single piece of silicon. Each execution core is equipped with 32KB of instruction cache and 32KB of data cache (collectively known as the Level 1 cache), plus a further 256KB of Level 2 cache.

Each core also has access to a large, shared 8MB Level 3 cache. Unlike the Level 3 cache of AMD Phenom CPUs, the Level 3 cache of the Nehalem and Nehalem EP architecture is ‘inclusive’. This means that everything held in each execution core's Level 1 and Level 2 caches is also duplicated in the Level 3 cache, so if one execution core needs data that another execution core is working on, that data can be found and fetched from the Level 3 cache without the need to interrogate and stall the other execution cores.
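The benefit of an inclusive Level 3 cache can be shown with a heavily simplified Python model. It assumes that each L3 line records which cores may hold a private copy, and it ignores coherence states, capacity and replacement; the class and method names are hypothetical. In this model an L3 miss needs no snooping at all, because inclusion guarantees that no core has the line, and an L3 hit only involves the cores the line is flagged for, rather than every core on the die.

```python
# Toy model of an inclusive shared L3 acting as a snoop filter.
# ASSUMPTION: each L3 line records which cores may hold a private copy;
# coherence states, capacity and replacement policy are left out.


class InclusiveL3:
    def __init__(self):
        self.lines = {}  # address -> set of core ids that may hold the line
        self.snoops = 0  # number of other cores we had to interrogate

    def core_fill(self, core, address):
        """A core loads a line privately; inclusion puts it in L3 as well."""
        self.lines.setdefault(address, set()).add(core)

    def lookup(self, requester, address):
        """Service a request from a core that missed in its own L1/L2."""
        holders = self.lines.get(address)
        if holders is None:
            # Inclusion guarantees no core has it, so no snooping is needed:
            # fetch from memory and install in L3, flagged for the requester.
            self.lines[address] = {requester}
            return "memory"
        # Only cores flagged on this line are checked, rather than every core.
        self.snoops += len(holders - {requester})
        holders.add(requester)
        return "L3"

    def evict(self, address):
        """Dropping a line from L3 must also drop it from the flagged cores."""
        return self.lines.pop(address, set())


if __name__ == "__main__":
    l3 = InclusiveL3()
    l3.core_fill(0, 0x1000)
    print(l3.lookup(1, 0x1000), l3.snoops)  # L3 1
    print(l3.lookup(2, 0x2000), l3.snoops)  # memory 1
```

Without inclusion, a miss in the shared cache would not prove that the line is absent from the other cores' private caches, so every other core would have to be checked.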

As each Nehalem EP CPU has its own integrated memory controller, each CPU has its own bank of memory, just as NUMA-capable Opterons have done for years. Interestingly, although the Xeon W5580 is compatible with some desktop Core i7 motherboards, which use standard unbuffered DDR3, you have to use ECC registered DDR3 in a dual-processor LGA1366 motherboard. Each CPU can be fed by a single DIMM, but will perform best with three DIMMs running together in triple-channel mode. This means that you’ll need at least six DIMMs for a dual-processor system to obtain the benefit of triple-channel memory.
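The DIMM count above follows from one DIMM per channel, three channels per CPU. This Python sketch lists one (socket, channel) position per DIMM; the function name is hypothetical, and it only covers the minimum of one DIMM on each channel.

```python
# Each Nehalem EP CPU has its own triple-channel DDR3 memory controller.
CHANNELS_PER_CPU = 3


def triple_channel_slots(sockets, channels_per_cpu=CHANNELS_PER_CPU):
    """List one (socket, channel) position per DIMM so every channel is filled."""
    return [(s, c) for s in range(sockets) for c in range(channels_per_cpu)]


if __name__ == "__main__":
    for sockets in (1, 2):
        slots = triple_channel_slots(sockets)
        print(f"{sockets} CPU(s): at least {len(slots)} DIMMs -> {slots}")
```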

The Xeon W5580 also has exactly the same LGA1366 packaging as Core i7 CPUs, so you can use the same coolers. It even works in some Core i7 motherboards, such as the MSI Eclipse SLI. For more details on the Nehalem architecture, see our earlier article.