Fresh from announcing a homebrew Arm processor, cloud leviathan Amazon Web Services has unveiled a second piece of in-house silicon: Inferentia, a chip focused on accelerating machine learning inference workloads.
Traditionally, Amazon's Web Services division has focused on the use of commercial off-the-shelf (COTS) hardware to power its servers, bar a custom application-specific integrated circuit (ASIC) of its own design added to the motherboard. Earlier this week, though, the company broke with tradition and, using technology it obtained through its acquisition of Annapurna Labs, launched the 64-bit Arm-based Graviton processor with the bold claim of a 45 percent reduction in costs for selected workloads.
Now the company has added another homebrew chip to its collection, following rival Google into designing hardware to accelerate the inference engines used in machine learning: Inferentia.
'Inference is where the work actually gets done,' explains AWS' James Hamilton in an announcement post. 'This is where speech is recognised, text is translated, object recognition in video occurs, manufacturing defects are found, and cars get driven. Inference is where the value of ML [Machine Learning] is delivered, for example, inference is what powers dozens of AWS ML Services like Amazon Rekognition Image and Video, Lex, Polly, Comprehend, Translate, Transcribe, and Amazon SageMaker Hosting.
'Inferentia offers scalable performance from 32 TOPS [trillions of operations per second] to 512 TOPS at INT8 [eight-bit precision] and our focus is on high scale deployments common to machine learning inference deployments where costs really matter. We support near-linear scale-out, and build on high volume technologies like DRAM rather than HBM. We offer INT8 for best performance, but also support mixed-precision FP16 [16-bit floating point] and bfloat16 [32-bit floating point truncated to 16-bit] for compatibility, so that customers are not forced to go through the effort of quantising their neural networks that have been trained in FP16, FP32 or bfloat16, and don't need to compromise on which data type to use for each specific workload.'
According to Amazon, the Inferentia chip will become available to customers of the company's Elastic Compute Cloud (EC2) some time in 2019, as an additional option to its existing GPU-accelerated machine learning instances.