HSA: Unifying the CPU and GPU
Heterogeneous devices (where two or more different processing elements are embedded on the same chip) are now commonplace. Even excluding smartphones, tablets and consoles, the vast majority of PCs shipped today are powered by an APU. While this is great for saving space, money and power, the way the CPU and GPU interact with each other typically remains unchanged in such devices, despite their physical proximity. As AMD sees it, the result is wasted potential, and its solution with Kaveri is HSA: Heterogeneous System Architecture.
HSA refers to any computing design that enables the two processing elements (CPU and GPU here) to work together more directly at the hardware level, cutting much of the overhead that typically comes from sharing data between the two. It's designed to make both elements equal in the way workloads are processed and launched, and to make the resources of each directly available to the other.
[Image: Kaveri brings with it new HSA features]
Beginning with Kaveri, AMD will refer to both the CPU and GPU cores in its HSA devices under the umbrella term 'Compute Cores' (while still making clear how many of each a given product has), so the quad-core A10-7850K with its eight Compute Units has 12 Compute Cores. A Compute Core is defined as an HSA-enabled hardware block that is programmable and capable of running at least one process independently with its own virtual memory space.
Related to all this is the new Khronos OpenCL 2.0 programming specification, which has many features designed to benefit heterogeneous compute devices. Thanks to its HSA features, Kaveri is the world's first chip to fully support the standard.
One way of uniting the CPU and GPU is through hUMA (heterogeneous Uniform Memory Access). While AMD's APU northbridge has been unified since Trinity, hUMA means that in Kaveri both processors have equal access to the entire system memory (up to 32GB), rather than their address spaces being sectioned off at boot. This means it's no longer necessary to constantly copy (and convert) data back and forth between separate buffers so that each processor can work with it. In OpenCL, this is known as Shared Virtual Memory, and it's particularly beneficial when working with large, pointer-based data structures (e.g. trees and linked lists) and with workloads such as object recognition, where the data might otherwise exceed the GPU's buffer and slow things down.
[Image: Both the CPU and GPU cores have equal access to the system memory]
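To make the difference concrete, here is a minimal sketch of the two memory models in plain Python. It is a CPU-only analogy and not OpenCL or HSA code: the `Node` class and the functions `build_list`, `sum_with_copies` and `sum_shared` are hypothetical names invented for this illustration, and the "GPU" is just an ordinary function. The first path flattens a linked list into a buffer and copies it both ways, as separate address spaces would require; the second simply follows the same pointers, which is the idea behind a shared address space.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One element of a pointer-based structure (here, a linked list)."""
    value: int
    next: Optional["Node"] = None


def build_list(values):
    """Build a linked list on the 'CPU' side and return its head."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def sum_with_copies(head):
    """Separate address spaces: flatten, copy over, process, copy back."""
    staged = []                      # conversion: pointers -> flat buffer
    node = head
    while node is not None:
        staged.append(node.value)
        node = node.next
    device_buffer = list(staged)     # copy into the stand-in 'GPU' buffer
    result = [sum(device_buffer)]    # the work happens on the copy
    host_result = list(result)       # copy the result back
    elements_copied = len(device_buffer) + len(host_result)
    return host_result[0], elements_copied


def sum_shared(head):
    """Shared address space: the 'GPU' follows the same pointers directly."""
    total = 0
    node = head
    while node is not None:
        total += node.value
        node = node.next
    return total, 0                  # nothing was staged or copied


head = build_list(range(1, 1001))
copied_total, copied_elements = sum_with_copies(head)
shared_total, shared_elements = sum_shared(head)
print(copied_total, copied_elements)
print(shared_total, shared_elements)
```

Both paths arrive at the same total, but only the first has to stage and move every element. On real hardware the saving would depend on the workload and the driver, so treat the copy count here as an illustration rather than a measurement.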
A second feature is known as hQ, or heterogeneous Queuing. It's also known as Dynamic Parallelism, and essentially allows both processors to create and dispatch work directly to the other when needed. With hQ, the GPU can invoke CPU functions from within its own kernel (and vice versa), rather than relying on separate kernels, host-device interactions and software layers (OS, APIs, driver stacks, etc.), all of which consume extra resources. AMD says this will enable new algorithms to be developed that make better and more efficient use of the two processor types.
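The queuing idea can be sketched in a few lines of Python. This is a single-threaded analogy under stated assumptions, not the HSA queuing interface: `cpu_queue`, `gpu_queue`, `gpu_sum`, `cpu_record`, `run` and the `LEAF_SIZE` threshold are all hypothetical names made up for the example. The point it shows is that a task running on one processor can create more work for itself or hand a follow-up step to the other processor, without going back through a host program each time.

```python
from collections import deque

cpu_queue = deque()   # work waiting for the CPU stand-in
gpu_queue = deque()   # work waiting for the GPU stand-in
results = []          # filled in by CPU-side tasks

LEAF_SIZE = 4         # hypothetical threshold: chunks this small are summed directly


def gpu_sum(chunk):
    """'GPU' task: split a large chunk into more GPU work, or finish a small one."""
    if len(chunk) > LEAF_SIZE:
        mid = len(chunk) // 2
        # The task creates follow-up work for its own queue.
        gpu_queue.append((gpu_sum, chunk[:mid]))
        gpu_queue.append((gpu_sum, chunk[mid:]))
    else:
        # The task hands a follow-up step straight to the other processor.
        cpu_queue.append((cpu_record, sum(chunk)))


def cpu_record(partial):
    """'CPU' task: collect a partial result produced on the GPU side."""
    results.append(partial)


def run():
    """Drain both queues until neither processor has work left."""
    while cpu_queue or gpu_queue:
        if gpu_queue:
            task, arg = gpu_queue.popleft()
            task(arg)
        if cpu_queue:
            task, arg = cpu_queue.popleft()
            task(arg)


data = list(range(32))
gpu_queue.append((gpu_sum, data))   # the only dispatch made from "outside"
run()
total = sum(results)
print(total, len(results))
```

Only the first task is dispatched from outside; every later piece of work is enqueued by a task that is already running. In the older model described above, each of those hand-offs would instead pass through the host and its software layers.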
A third feature, called Platform Atomics, helps to ensure that workloads are synchronised. Each processor is “aware” of what the other is working on, and this is necessary when sharing a memory space so that one processor doesn't read data that another is still modifying.
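As a rough illustration of why atomic operations matter when memory is shared, the sketch below has two Python threads standing in for the CPU and GPU, both updating one counter. Python has no hardware atomics, so the `AtomicInt` class here is a lock-backed stand-in invented for this example, and `worker` and `INCREMENTS` are likewise hypothetical names. What carries over is the pattern: each side reads the value, then commits its update only if nobody else changed the value in the meantime, retrying otherwise.

```python
import threading


class AtomicInt:
    """Lock-backed stand-in for a hardware atomic integer (an assumption of this sketch)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_exchange(self, expected, desired):
        """Store `desired` only if the value still equals `expected`."""
        with self._lock:
            if self._value == expected:
                self._value = desired
                return True
            return False


def worker(counter, increments):
    """Add to the shared counter without overwriting the other side's update."""
    for _ in range(increments):
        while True:
            seen = counter.load()
            if counter.compare_exchange(seen, seen + 1):
                break   # our update landed; otherwise retry with a fresh value


counter = AtomicInt()
INCREMENTS = 10_000
threads = [
    threading.Thread(target=worker, args=(counter, INCREMENTS), name="cpu"),
    threading.Thread(target=worker, args=(counter, INCREMENTS), name="gpu"),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
final_count = counter.load()
print(final_count)
```

Because every update goes through the compare-and-exchange step, no increment is lost and the final count is always twice `INCREMENTS`. A plain read-modify-write on shared data gives no such guarantee.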
AMD's commitment to HSA, OpenCL and parallel processing acceleration in general is clear outside of Kaveri as well. Along with other players like Samsung, ARM and Qualcomm, it's a key member of both the HSA Foundation and the Khronos Group. It works with Java as well as open-source libraries like OpenCV and Bolt to implement OpenCL and GPU-accelerated functions. It also provides developers with its Accelerated Parallel Processing (APP) and Media SDKs, as well as its CodeXL tool suite with hardware profilers and kernel debuggers. This makes a lot of sense: AMD's hardware improvements and its HSA features will be worth very little without developers to utilise them.
[Image: Examples of existing applications which leverage heterogeneous compute functionality to some degree]
AMD is convinced that parallel processing and HSA are the way forward, listing almost endless examples of the sorts of areas it thinks could benefit from them (e.g. biometric gestures, speech and facial recognition, and augmented reality). Of course, there are numerous applications that leverage heterogeneous compute functions today, such as VLC, GIMP, WinZip and LibreOffice. However, though AMD says legacy OpenCL products will benefit from Kaveri, it will be some time before we really begin to see the benefits of its HSA features, and as ever it will be down to developers to know which tasks are best suited to which cores and to write efficient code.