Nvidia joined the $3 trillion market capitalization club in June this year, briefly surpassing the likes of Apple and Microsoft. This astronomical growth has been driven by its dominance in GPUs and AI hardware. However, Nvidia is not the only company making chips for today's growing AI workloads. Many companies, including Intel, Google, Amazon, and others, are building custom silicon for training and inferencing AI models. So, let's look at the most promising Nvidia competitors in the AI hardware space.
AMD: A Rising Contender in AI Hardware
When it comes to high-performance AI accelerators, AMD is Nvidia's closest challenger in both training and inference. While analysts estimate Nvidia holds 70% to 90% of the AI hardware market, AMD has been putting its house in order to close the gap.
AMD introduced its Instinct MI300X accelerator for AI and HPC (High-Performance Computing) workloads in December 2023. AMD claims the MI300X delivers 1.6x better inference performance than the Nvidia H100 and comparable performance in training.
Not only that, the MI300X packs up to 192GB of HBM3 (High-Bandwidth Memory), far more than the H100's 80GB, and delivers memory bandwidth of up to 5.3 TB/s versus the H100's 3.4 TB/s.
However, AMD still has a long way to go before it establishes itself as a serious rival to Nvidia, and the reason lies in software. Nvidia's moat is CUDA, the computing platform that lets developers program Nvidia GPUs directly for accelerated parallel processing.
The CUDA platform has a large number of libraries, SDKs, toolkits, compilers, and debugging tools, and it’s supported by popular deep learning frameworks such as PyTorch and TensorFlow. On top of that, CUDA has been around for nearly two decades, and developers are more familiar with Nvidia GPUs and their workings, especially in the field of machine learning.
That said, AMD is investing heavily in the ROCm (Radeon Open Compute) software platform, which supports PyTorch, TensorFlow, and other open frameworks. The company has also decided to open-source some portion of the ROCm software stack, although developers have criticized ROCm for offering a fragmented experience and a lack of comprehensive documentation.
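To make the software gap concrete, here is a minimal sketch of how a PyTorch developer targets an accelerator. On ROCm builds of PyTorch, AMD Instinct GPUs are exposed through the same torch.cuda interface, so the identical code runs on either vendor's hardware; the tiny model and tensor shapes below are placeholders.

```python
import torch

# On CUDA builds this reports an Nvidia GPU; on ROCm builds of PyTorch,
# AMD Instinct GPUs show up through the same torch.cuda namespace.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU only")

# A placeholder model and input, just to show the device-placement pattern.
model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(8, 1024, device=device)

with torch.no_grad():
    y = model(x)  # the matmul runs on whichever accelerator was found
print(y.shape)
```

This kind of API compatibility is exactly what AMD is betting on: if existing PyTorch code runs unchanged on ROCm, the cost of switching away from Nvidia drops considerably.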
Intel's Strategic Push in AI Accelerators
Many analysts have written Intel off in the AI chip space, but the company has long been a leader in inferencing with its CPU-based Xeon servers. Intel recently launched its Gaudi 3 AI accelerator, an ASIC (Application-Specific Integrated Circuit) that departs from traditional CPU and GPU designs and handles both training and inference for generative AI workloads.
Intel claims the Gaudi 3 AI accelerator is 1.5x faster than the Nvidia H100 at both training and inference. Its Tensor Processor Cores (TPCs) and Matrix Multiplication Engines (MMEs) are specialized for the matrix operations that dominate deep learning workloads.
As for software, Intel is taking the open-source route with OpenVINO and its own Gaudi software suite, which integrates frameworks, tools, drivers, and libraries and supports open frameworks like PyTorch and TensorFlow.
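On the OpenVINO side specifically, a minimal inference sketch with its Python API might look like the following; the model file name, target device, and input shape are illustrative placeholders rather than anything tied to Gaudi.

```python
import numpy as np
from openvino import Core  # pip install openvino (2023.1 or newer)

core = Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU'], depending on the machine

# "model.xml" stands in for an OpenVINO IR file exported from PyTorch,
# TensorFlow, or ONNX; the 1x3x224x224 input shape is illustrative.
model = core.read_model("model.xml")
compiled = core.compile_model(model, device_name="CPU")

dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)
result = compiled([dummy_input])  # returns a mapping of output tensors
print(list(result.values())[0].shape)
```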
Intel, along with Google, Arm, Qualcomm, and others, has formed the Unified Acceleration Foundation (UXL). The group aims to create an open-source alternative to Nvidia's proprietary CUDA platform so that developers aren't locked into Nvidia's ecosystem.
Google: Dominating with Custom TPUs
If there is an AI giant that is not reliant on Nvidia, it's Google. Google has been developing its in-house TPU (Tensor Processing Unit), based on an ASIC design, since 2015. Google says its latest TPU v5p trains large language models 2.8x faster than the previous-generation TPU v4 and is highly efficient at inference.
At the Google Cloud Next 2024 event, Google confirmed that its Gemini model was trained entirely on TPUs, a significant showcase of the hardware's capabilities. Google offers TPUs through its cloud service for a wide range of AI workloads, making it a true rival to Nvidia.
Unlike Microsoft, Google is not over-reliant on Nvidia, and it has also introduced Axion, an Arm-based CPU that Google says delivers class-leading efficiency in the data center and can handle CPU-based AI training and inferencing as well.
Google supports frameworks like JAX, Keras, PyTorch, and TensorFlow out of the box, giving it a strong edge in software support for AI researchers and developers.
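As a rough illustration of that out-of-the-box support, the JAX sketch below runs the same way on a Cloud TPU VM as it does on a CPU or GPU; the matrix sizes are arbitrary.

```python
import jax
import jax.numpy as jnp

# On a Cloud TPU VM this lists TpuDevice entries; elsewhere JAX falls back
# to whatever backend is available (GPU or CPU).
print(jax.devices())

@jax.jit  # XLA-compiles the function for the available accelerator
def forward(w, x):
    return jnp.tanh(x @ w)

key = jax.random.PRNGKey(0)
w = jax.random.normal(key, (2048, 2048))
x = jax.random.normal(key, (64, 2048))

out = forward(w, x)  # compiled and executed on the TPU when one is present
print(out.shape)
```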
Amazon's Custom Chips for AI Workloads
Amazon runs AWS (Amazon Web Services), which offers cloud-computing platforms for businesses and enterprises. To serve its customers' AI workloads, Amazon has developed two custom ASIC chips for training and inferencing: AWS Trainium and AWS Inferentia.
AWS Trainium handles deep-learning training for models with up to 100 billion parameters, while AWS Inferentia is optimized for AI inferencing. The main goal of these custom chips is to deliver high performance at low cost, allowing Amazon to stake a claim in the AI hardware space.
Amazon also integrates its chips with its own AWS Neuron SDK, which supports popular frameworks like PyTorch and TensorFlow, thus making it easier for developers to deploy AI models.
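As a hedged sketch of that workflow, the snippet below uses torch_neuronx, the Neuron SDK's PyTorch integration available on Trn1/Inf2 instances, to compile a model ahead of time; the toy model and input shape are placeholders.

```python
import torch
import torch_neuronx  # part of the AWS Neuron SDK; available on Trn1/Inf2 instances

# A placeholder model; in practice this would be your trained torch.nn.Module.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
).eval()

example_input = torch.rand(1, 512)

# Ahead-of-time compilation for Inferentia/Trainium NeuronCores.
neuron_model = torch_neuronx.trace(model, example_input)

output = neuron_model(example_input)  # runs on the Neuron device
print(output.shape)
```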
Microsoft's Custom AI Efforts
Similar to Google, Microsoft is ramping up its custom silicon efforts. In November 2023, Microsoft introduced its MAIA 100 chip for AI workloads and Cobalt 100 (Arm-based CPU) for Azure cloud infrastructure. This move reflects Microsoft's goal to avoid costly over-reliance on Nvidia.
The MAIA 100 chip is built on an ASIC design and targets both AI inferencing and training. Reportedly, it is currently being tested for GPT-3.5 Turbo inferencing.
In the meantime, Microsoft continues to rely on Nvidia and AMD for its cloud infrastructure, and those partnerships will remain crucial as the company deploys its custom silicon more widely.
Qualcomm's Cloud AI Solutions
Qualcomm released its Cloud AI 100 accelerator in 2020 for AI inferencing but saw limited success. The company refreshed it with the Cloud AI 100 Ultra in November 2023, which is custom-built for generative AI applications.
This chip can handle 100B parameter models on a single card with a TDP of just 150W, making it highly efficient. Qualcomm is mainly focused on inferencing rather than training, emphasizing power efficiency.
Hewlett Packard Enterprise (HPE) is using the Qualcomm Cloud AI 100 Ultra to power generative AI workloads on its servers, and Qualcomm has partnered with Cerebras for end-to-end model training and inferencing.
Cerebras and Groq: The New Players in AI Hardware
Apart from the established players, startups like Cerebras are making waves in the AI hardware space. Its Wafer-Scale Engine 3 (WSE-3) can train models with up to 24 trillion parameters, significantly more than Nvidia's current offerings.
Cerebras targets large enterprises that want to build efficient AI systems without the complexity of distributed computing across many GPUs. The company has picked up clients such as AstraZeneca and the Mayo Clinic, making it a significant player.
Groq has also emerged as a contender with its LPU (Language Processing Unit) accelerator. This chip is designed for generative AI applications and is noted for being cost-effective compared to Nvidia’s GPUs.
Groq's LPU posts impressive benchmark numbers, generating 300 to 400 tokens per second while running the Llama 3 70B model, making it the second-fastest AI inferencing solution available.
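Developers typically reach the LPU through GroqCloud's hosted, OpenAI-compatible API rather than by buying cards; a minimal sketch with the official Python client follows, with the model identifier and prompt purely illustrative.

```python
from groq import Groq  # official client: pip install groq

client = Groq()  # reads the GROQ_API_KEY environment variable

# "llama3-70b-8192" is an illustrative model identifier; check GroqCloud's
# current model list before using it.
completion = client.chat.completions.create(
    model="llama3-70b-8192",
    messages=[{"role": "user", "content": "Summarize what an LPU is in one sentence."}],
)
print(completion.choices[0].message.content)
```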
Looking Ahead: The Future of AI Hardware
The AI hardware landscape is rapidly evolving with numerous chipmakers entering the fray to challenge Nvidia's dominance. Companies like SambaNova offer training-as-a-service, but quantifiable benchmarks are still awaited. Meanwhile, Tenstorrent is pivoting towards RISC-V based IP licensing for its chip designs.
As the industry shifts towards custom silicon and purpose-built AI accelerators, the competition is expected to intensify. Nvidia remains the preferred choice for training due to the wide adoption of CUDA, but the future may see more specialized accelerators maturing to challenge this status quo.
Overall, the AI landscape is transforming, and the rise of diverse AI accelerators indicates a paradigm shift that could redefine how AI models are trained and deployed.