G84's 3D Pipeline:
G84 is made up of around 289 million transistors – to put this into perspective, Nvidia’s previous high-end G71 graphics chip only has an estimated transistor count of 278 million.
Of course, it’s no match for Nvidia’s G80 behemoth, which has 681 million transistors packed inside, but then it’s not designed to be. When G80’s floodgates are fully open, there are 128 1D scalar stream processors enabled, 32 texture address units, 64 texture filtering units, 24 raster operators (ROPs) and a 384-bit memory interface connecting to 768MB of local memory.
By comparison, G84 features 32 1D scalar stream processors which, as in G80, can be dynamically switched between vertex, pixel, geometry and physics calculations. These are split into two clusters of 16 shaders, which share L1 and L2 cache with each other – nothing has really changed here.
In G80, each cluster of 16 stream processors has a texture processor associated with it. Each of these texture processors is capable of four texture addresses and eight texture filtering operations per clock cycle. Nvidia has improved G84’s texturing capabilities with slightly beefed-up texture processors.
Instead of four texture addresses per clock, the two texture processors in G84 are each capable of eight texture addresses, making a theoretical maximum of 16 texture addresses every clock cycle. Texture filtering capabilities on a per-shader-cluster scale haven’t changed though, meaning that G84 is still capable of a total of 16 bilinear texture filtering operations per clock.
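As a quick sanity check on those figures, the per-clock totals are simply the number of shader clusters multiplied by what each texture processor can do. The sketch below is plain arithmetic using only the numbers quoted above; the names `TextureConfig` and `per_clock_totals` are made up for illustration and aren't Nvidia terminology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextureConfig:
    """Hypothetical container for the texture figures quoted in the article."""
    clusters: int               # clusters of 16 stream processors
    addresses_per_cluster: int  # texture addresses per clock, per texture processor
    filters_per_cluster: int    # texture filtering ops per clock, per texture processor


def per_clock_totals(cfg: TextureConfig) -> tuple[int, int]:
    """Return (texture addresses, texture filtering ops) per clock for the whole chip."""
    return (
        cfg.clusters * cfg.addresses_per_cluster,
        cfg.clusters * cfg.filters_per_cluster,
    )


# G80: 128 stream processors in clusters of 16, four addresses and eight filters each.
G80 = TextureConfig(clusters=128 // 16, addresses_per_cluster=4, filters_per_cluster=8)
# G84: 32 stream processors in clusters of 16, eight addresses and eight filters each.
G84 = TextureConfig(clusters=32 // 16, addresses_per_cluster=8, filters_per_cluster=8)

for name, cfg in (("G80", G80), ("G84", G84)):
    addresses, filters = per_clock_totals(cfg)
    print(f"{name}: {addresses} texture addresses/clock, {filters} filtering ops/clock")
```

The G80 result lines up with the 32 texture address units and 64 texture filtering units mentioned earlier, while G84 comes out at 16 of each.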
The render backend hasn’t changed much between G80 and G84 either, meaning that the ROP hardware supports the same anti-aliasing modes that are supported by G80’s raster operators. These are Multi-Sampling, Super-Sampling, Transparency Adaptive AA and, of course, Coverage Sampling AA – the new anti-aliasing technique introduced with G80.
Obviously though, there isn’t room for six ROP partitions in G84, so the render backend has been shrunk down a bit. G84 features eight raster operators split into two partitions of four, which are collectively capable of writing eight pixels per clock to memory with colour and Z processing. For Z-only processing the ROP engines are capable of 32 pixels per clock if a single sample is used in each pixel.
Just like in G80, the two ROP partitions in G84 are each allocated a 64-bit memory interface (making the total memory interface 128 bits wide) and the ROP partitions are completely decoupled from the stream processors. This means that each raster operator can process pixels from any of the chip’s 32 shader processors. In order for this to work well, Nvidia uses what it terms a fragment crossbar – this essentially load-balances the throughput of data.
Many of us, including myself, were hoping that Nvidia would move its mid-range graphics cards onto a 256-bit memory interface after the introduction of 384-bit and 320-bit memory interfaces on the GeForce 8800 GTX and GeForce 8800 GTS. Unfortunately though, Nvidia hasn’t delivered on these hopes; instead, we’re still stuck with a 128-bit memory interface on all three of the new products. This is likely to mean these cards will suffer when high levels of anti-aliasing are used in conjunction with high-resolution textures.
G86 – similar, but different:
G86 is very similar to G84 in many ways, but Nvidia has removed one of the clusters of 16 stream processors, meaning that there are only 16 stream processors inside G86. This means that the texture hardware is cut in half, too. This obviously cuts down the chip’s capabilities, but with the current flagship G86 product (GeForce 8500 GT) coming in at $89 USD, you can hardly complain.
Unlike with the shading and texturing portion of the GPU, Nvidia’s engineers have left both ROP partitions in, meaning that there are still eight ROPs available. This is because of the way Nvidia’s G8x memory controller works, as each ROP partition has access to a 64-bit memory interface (six ROP partitions on G80 gives a 384-bit bus width, five ROP partitions gives 320 bits, and so on).
With only four raster operators enabled (one ROP partition) you’re left with only 64 bits of man width, which clearly isn’t going to do much for your ePeen. Thus, in order to give those self-conscious gamers the 128 bits they need to achieve the ultimate in eCredibility, there are two ROP partitions.
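The link between ROP partitions and bus width described above boils down to one multiplication, sketched below. This is only an illustration built from the figures in this article (four raster operators and a 64-bit memory interface per partition); the function and constant names are hypothetical.

```python
# Figures taken from the article: each ROP partition holds four raster
# operators and is paired with a 64-bit slice of the memory interface.
ROPS_PER_PARTITION = 4
BITS_PER_PARTITION = 64


def rop_count(partitions: int) -> int:
    """Total raster operators for a given number of ROP partitions."""
    return partitions * ROPS_PER_PARTITION


def memory_bus_width(partitions: int) -> int:
    """Total memory interface width in bits for a given number of ROP partitions."""
    return partitions * BITS_PER_PARTITION


# Partition counts as described in the article.
configs = {
    "G80, six partitions": 6,
    "G80, five partitions": 5,
    "G84 / G86, two partitions": 2,
    "Hypothetical single partition": 1,
}

for name, partitions in configs.items():
    print(f"{name}: {rop_count(partitions)} ROPs, {memory_bus_width(partitions)}-bit bus")
```

Dropping to a single partition would halve the ROP count and the bus width in one go, which is why both partitions stay in G86.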