Set-up, Rasterisation, Pixel Shader & Output Merger:
The remaining portions of the pipeline haven’t really changed a lot, although there are some refinements to improve things ever so slightly. During the Set-up and Rasterisation stage of the pipeline, Microsoft has added support for predicated rendering or drawing to reduce CPU overheads during this stage.
Many objects in 3D scenes will typically overlap each other, resulting in partially or fully occluded objects that can eat up resources if they’re processed any further than they need to be. Modern GPUs take care of most of this with hardware-based culling in the Z-buffer, and that has already been improved upon with the implementation of a per-pixel Early Z cull. Early Z essentially tests the Z values of pixels before they enter the pixel shader – this avoids most of the processing time that would otherwise be wasted on pixels that would never be visible in the final rendered scene. However, even with Early Z, there will be occasions where redundant overdraw still occurs.
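To make the Early Z idea a little more concrete, here is a minimal Python sketch of a one-dimensional "screen". It is a toy model rather than a description of how any GPU is built: the `render` function, the fragment tuples and the invocation counter are all made up for illustration.

```python
def render(fragments, width, early_z):
    """Process (x, depth, colour) fragments; return (colours, pixel shader runs)."""
    depth = [float("inf")] * width   # smaller depth = closer to the camera
    colour = [None] * width
    shaded = 0
    for x, z, c in fragments:
        if early_z and z >= depth[x]:
            continue                 # rejected before the pixel shader runs
        shaded += 1                  # stand-in for the expensive pixel shader
        if z < depth[x]:             # ordinary depth test when writing the result
            depth[x] = z
            colour[x] = c
    return colour, shaded


# Hypothetical scene: a near wall and a far statue covering the same four pixels.
wall = [(x, 1.0, "wall") for x in range(4)]
statue = [(x, 5.0, "statue") for x in range(4)]

front_to_back = render(wall + statue, 4, early_z=True)   # statue culled early
no_early_z = render(wall + statue, 4, early_z=False)     # everything is shaded
back_to_front = render(statue + wall, 4, early_z=True)   # overdraw still happens
```

With the wall drawn first, Early Z halves the shader work in this toy scene. Drawn the other way round, every fragment is still shaded, which is the kind of leftover overdraw the paragraph above refers to.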
Predicated draw expands on Early Z to further reduce redundant overdraw and improve efficiency. The technique allows the developer to draw a simple box approximation of a complex object first, which could potentially save a lot of unneeded processing time – if the box has no effect on the final image, the complex object inside the box is not drawn. In previous versions of DirectX, this required both the CPU and GPU to determine which objects should and shouldn’t be drawn – DirectX 10 eliminates all CPU intervention from this process.
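The logic of a predicated draw can be sketched in a few lines of Python. This is only a simulation under simplifying assumptions: the depth buffer is a flat list, the "box" is a span of pixels at a single conservative depth, and the names `box_is_visible` and `draw_predicated` are hypothetical. In Direct3D 10 itself the test is expressed through a predicate object and `ID3D10Device::SetPredication`, which this sketch does not attempt to model.

```python
def box_is_visible(depth_buffer, box):
    """Occlusion test: would any pixel of the box pass the depth test? (No writes.)"""
    x0, x1, nearest_z = box
    return any(nearest_z < depth_buffer[x] for x in range(x0, x1))


def draw_predicated(depth_buffer, box, complex_object, stats):
    """Draw the complex object only if its bounding box affects the image."""
    if not box_is_visible(depth_buffer, box):
        stats["skipped"] += 1        # the expensive object is never processed
        return
    for x, z in complex_object:
        stats["fragments"] += 1
        if z < depth_buffer[x]:
            depth_buffer[x] = z


# Hypothetical scene: a wall at depth 1.0 covers pixels 0-7 of a 16-pixel row.
depth_buffer = [1.0] * 8 + [float("inf")] * 8
stats = {"skipped": 0, "fragments": 0}

hidden = [(x, 3.5) for x in range(2, 6)]      # sits behind the wall
exposed = [(x, 3.5) for x in range(10, 14)]   # out in the open

draw_predicated(depth_buffer, (2, 6, 3.0), hidden, stats)
draw_predicated(depth_buffer, (10, 14, 3.0), exposed, stats)
```

The box has to be conservative (at least as near and as large as the object it stands in for), otherwise a visible object could be skipped by mistake.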
Predicated Draw in action
In the Output Merger (as Microsoft terms it), the architects have added support for more render targets, up from four to eight – this will allow for much more complex shader programmes. Multiple render targets essentially allow the developer to write more than one pixel colour value to different surfaces with a single draw operation. Alongside that, Microsoft has introduced two new HDR formats that allow for more efficient use of video memory.
The first format is known as R11G11B10 and is optimised for use as a floating point render target – if you haven’t guessed from the name, it uses 11 bits each for the red and green channels and 10 bits for the blue channel, for 32 bits per pixel in total. The second format is a shared-exponent format known as RGBE and is designed for floating point textures. It uses a 9-bit mantissa for each of the red, green and blue channels and a 5-bit exponent that is shared across all three, which again adds up to 32 bits.
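As a rough illustration of how a shared-exponent format squeezes three HDR channels into 32 bits, here is a Python sketch of a 9/9/9/5 encoder and decoder. It assumes the commonly documented layout (red in the lowest 9 bits, then green, then blue, with the exponent in the top 5 bits and an exponent bias of 15); the function names are ours, and none of these details come from the article itself.

```python
import math

MANTISSA_BITS = 9
EXP_BIAS = 15        # assumed bias for the 5-bit shared exponent
MAX_EXP = 31
MAX_VALUE = (2**MANTISSA_BITS - 1) / 2**MANTISSA_BITS * 2.0 ** (MAX_EXP - EXP_BIAS)


def encode_rgb9e5(r, g, b):
    """Pack three non-negative floats into one 32-bit shared-exponent value."""
    r, g, b = (min(max(c, 0.0), MAX_VALUE) for c in (r, g, b))
    largest = max(r, g, b)
    # The exponent is chosen from the brightest channel and shared by all three.
    log_floor = math.floor(math.log2(largest)) if largest > 0 else -EXP_BIAS - 1
    exp = max(-EXP_BIAS - 1, log_floor) + 1 + EXP_BIAS
    scale = 2.0 ** (exp - EXP_BIAS - MANTISSA_BITS)
    if math.floor(largest / scale + 0.5) == 2**MANTISSA_BITS:
        exp += 1                     # rounding overflowed the 9-bit mantissa
        scale *= 2.0
    rm, gm, bm = (int(math.floor(c / scale + 0.5)) for c in (r, g, b))
    return rm | (gm << 9) | (bm << 18) | (exp << 27)


def decode_rgb9e5(packed):
    """Unpack a 32-bit shared-exponent value back into three floats."""
    scale = 2.0 ** ((packed >> 27) - EXP_BIAS - MANTISSA_BITS)
    return tuple(((packed >> shift) & 0x1FF) * scale for shift in (0, 9, 18))


similar = decode_rgb9e5(encode_rgb9e5(1.0, 0.5, 0.25))     # (1.0, 0.5, 0.25)
lopsided = decode_rgb9e5(encode_rgb9e5(1000.0, 0.5, 0.0))  # (1000.0, 0.0, 0.0)
```

The second example shows the trade-off: because the exponent is set by the brightest channel, a very dim channel next to a very bright one loses its precision entirely.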
Shader Model 4.0:
We have covered most of the Shader Model 4.0 features already, but there are some areas that we still haven’t covered. Microsoft created a table showing the progression of Shader Models since Shader Model 1.1 in 2001, right the way up to Shader Model 4.0 – the foundations of DirectX 10.
One thing that we haven’t really touched on yet is the amount of resources available to the API. DirectX 10 has over a hundred times more resources than were available in DirectX 9 – this should help to eradicate the resource limitations that developers often bumped into when programming for DirectX 9. Register space has been massively increased by over two orders of magnitude: temporary registers have increased from 32 to 4096 (128 times as many), while constant registers have increased from 256 to 65,536 (256 times as many).
When using large numbers of constants for things like the position and colour of lights, the camera position, view and projection matrices, and other parameters, developers often bumped into the ever-problematic CPU overhead in DirectX 9. DirectX 10 makes use of large constant buffers that group constants together, so that they can be updated according to how frequently they change.
With Shader Model 4.0, there are a total of 16 constant buffers available to the developer for each shader programme, and each buffer is capable of holding up to 4096 constants. Grouping constants into a single buffer means that you can update all of them using one call to the API.
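The numbers tie together neatly, and a small Python model shows both the arithmetic and the "one call updates the lot" idea. The `ConstantBuffer` class and its `api_calls` counter are purely hypothetical stand-ins and not part of any DirectX interface.

```python
MAX_BUFFERS_PER_SHADER = 16
MAX_CONSTANTS_PER_BUFFER = 4096

# 16 buffers x 4096 constants matches the 65,536 constant registers quoted earlier.
total_constants = MAX_BUFFERS_PER_SHADER * MAX_CONSTANTS_PER_BUFFER


class ConstantBuffer:
    """Toy constant buffer: many named constants, refreshed with one call."""

    def __init__(self, names):
        if len(names) > MAX_CONSTANTS_PER_BUFFER:
            raise ValueError("too many constants for one buffer")
        self.values = dict.fromkeys(names, 0.0)
        self.api_calls = 0           # counts simulated calls into the API

    def update(self, **new_values):
        unknown = set(new_values) - set(self.values)
        if unknown:
            raise KeyError(f"unknown constants: {sorted(unknown)}")
        self.values.update(new_values)
        self.api_calls += 1          # one call, however many constants changed


light = ConstantBuffer(["light_x", "light_y", "light_z", "light_intensity"])
light.update(light_x=1.0, light_y=2.0, light_z=3.0, light_intensity=0.8)
```

Four constants change here, but the counter only goes up by one. Setting each constant individually would have cost four separate calls in this model.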
Organising constants by frequency of use also makes for much more efficient use of the buffers. Constants that are likely to change regularly include the position of the camera and the view projection matrix – both change every frame. On the other hand, constants that don’t change very often include things like material parameters for a texture, which may only change on a per-primitive basis. Because each group can be updated on its own schedule, the CPU overhead associated with updating constants is cut significantly.
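A quick simulation gives a feel for why splitting by update frequency pays off. The sketch below is a deliberately simple model with invented constant names: it assumes a buffer is always re-uploaded whole whenever any of its constants has changed, and it assumes the material never changes during the run.

```python
class Buffer:
    """Toy buffer that is re-uploaded in full whenever anything in it changes."""

    def __init__(self, **constants):
        self.constants = dict(constants)
        self.dirty = True            # a new buffer needs one initial upload

    def set(self, name, value):
        if self.constants[name] != value:
            self.constants[name] = value
            self.dirty = True


def flush(buffers):
    """Upload every dirty buffer; return how many constants were sent."""
    sent = 0
    for buf in buffers:
        if buf.dirty:
            sent += len(buf.constants)
            buf.dirty = False
    return sent


def simulate(frames, split):
    camera = {"camera_position": 0, "view_projection": 0}
    material = {"diffuse": 1, "specular": 1, "roughness": 1}
    if split:
        per_frame, rare = Buffer(**camera), Buffer(**material)
        buffers = [per_frame, rare]
    else:
        per_frame = Buffer(**camera, **material)   # everything in one buffer
        buffers = [per_frame]
    sent = 0
    for frame in range(1, frames + 1):
        per_frame.set("camera_position", frame)    # the camera moves every frame
        per_frame.set("view_projection", frame)
        sent += flush(buffers)
    return sent


split_cost = simulate(100, split=True)    # 203 constants uploaded
single_cost = simulate(100, split=False)  # 500 constants uploaded
```

In the split case the material buffer is uploaded once and then left alone, while the single combined buffer drags the unchanged material constants along on every frame.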