Today, at its Data Center and AI Technology event in San Francisco, Advanced Micro Devices (AMD) announced the availability of its expanded 4th Gen EPYC data center processor line-up, based on the company’s Bergamo and Genoa-X CPU architectures for cloud-native, general-purpose compute and HPC (High Performance Computing) workloads, respectively. The company also announced it will sample its new Instinct MI300X AI accelerator later this year, along with its Infinity Architecture platform that incorporates up to eight MI300X accelerators for complex AI inferencing and training workloads. In addition, the MI300A accelerator, which incorporates both CPU and GPU cores, is already sampling to partners. However, what might be most compelling about today’s announcements is that AMD has laid the foundation for a comprehensive platform strategy of accelerated computing solutions specifically optimized for the major growth markets of cloud computing, HPC and AI.
AMD 4th Gen EPYC 97X4 Bergamo CPUs Offer Huge Core Density Per Socket Advantage
In modern data centers, compute resources per unit of rack space and performance-per-watt are critical measures of performance, capacity and efficiency. It was with these metrics in mind that AMD today announced full availability of its EPYC 97X4 series processors, formerly known by the codename “Bergamo.” The company is branding this family of data center CPUs as “cloud native,” and though that terminology is being sprinkled around a bit too liberally these days, AMD’s Bergamo Zen 4c architecture is indeed optimized for dense cloud computing environments. With up to 128 cores and 256 threads per socket, Bergamo delivers what AMD claims could equate to over 2X more application containers per server at scale, versus Intel’s latest Sapphire Rapids 4th Gen Xeon CPUs. Note that Bergamo is also designed for up to 2P socket scalability, for a total of 256 cores and up to 512 threads per server.
As such, Bergamo extends AMD’s core-density-per-socket advantage over Intel’s Sapphire Rapids Xeon line-up, which currently tops out at 60-core chips, though Intel does offer up to 8-socket configurations on some models, at a considerable power and space trade-off. Intel also has its Xeon CPU Max line with integrated HBM (High Bandwidth Memory) for essentially DRAM-less server designs, so the competitive landscape is nuanced, to be sure. Generally speaking, though, AMD’s core density per socket is a huge win for cloud hyperscalers that simply need more core resources per server and rack.
Put another way, AMD claims it would require dramatically fewer 2P AMD EPYC 9754 servers to hit the same NGINX web throughput as competitive solutions from Intel or Ampere, while potentially consuming half the power or less, with a much lower total cost of ownership (TCO).
At a high level, Bergamo has the same L1 and L2 cache complement as Genoa (up to 1MB of L2 per core, 128MB total), but less shared L3 cache at up to 256MB total, versus 384MB for AMD’s Genoa CPU architecture. That smaller shared cache is also divided among more cores, at 16MB per CCX (Core Complex), versus 32MB per CCX in the traditional Zen 4 Genoa design. Clock speeds are scaled back a bit as well, with base clocks as low as 2.2GHz and boost clocks of 3.1GHz, whereas Genoa tops out at 3.7GHz. Regardless, AMD’s 35 percent smaller Zen 4c core allows it to pack up to 128 cores per chip and socket (now 16 cores per compute chiplet, with eight cores per CCX), along with the same 12-channel DDR5-4800 capable memory controller and 128 PCIe Gen 5 lanes, all in the same 360-400W max configurable TDP (Thermal Design Power) as a 96-core Genoa EPYC processor.
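For a quick sanity check, here’s the cache math those figures imply, expressed as a minimal Python sketch. All of the inputs (core counts, 8-core CCX grouping, per-CCX L3 sizes) come from AMD’s disclosures above:

```python
# Back-of-envelope L3 cache math for Bergamo vs. Genoa,
# using the core counts and per-CCX figures AMD disclosed.
CORES_PER_CCX = 8

def l3_totals(cores, l3_per_ccx_mb):
    ccx_count = cores // CORES_PER_CCX
    total_mb = ccx_count * l3_per_ccx_mb
    return ccx_count, total_mb, total_mb / cores

for name, cores, l3_per_ccx in [("Bergamo", 128, 16), ("Genoa", 96, 32)]:
    ccx, total, per_core = l3_totals(cores, l3_per_ccx)
    print(f"{name}: {ccx} CCXs x {l3_per_ccx}MB = {total}MB L3 ({per_core:.0f}MB/core)")

# Bergamo: 16 CCXs x 16MB = 256MB L3 (2MB/core)
# Genoa:   12 CCXs x 32MB = 384MB L3 (4MB/core)
```

In other words, Bergamo trades per-core L3 capacity (2MB versus Genoa’s 4MB) for raw core count, which is exactly the trade-off dense, containerized cloud workloads tend to tolerate well.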
AMD will offer several configurations of its new 4th Gen EPYC Bergamo CPUs, including one with SMT (Simultaneous Multithreading) disabled, for customers that specifically want to guarantee the feature is off in certain product offerings. Amazon AWS VP Dave Brown joined Dr. Su on stage to roll out the company’s new Amazon EC2 M7a instances in the cloud, based on 4th Gen AMD EPYC processors and promising up to 50% higher performance than the previous-gen M6a platform. For applications that require even heavier lifting, however, AMD has teed up another beast called Genoa-X, aka its 4th Gen EPYC processors with 3D V-Cache on board.
AMD 4th Gen EPYC Genoa-X With 3D V-Cache Enters The Technical Computing Fray
Comprising up to 96 AMD Zen 4 cores, 4th Gen EPYC processors with 3D V-Cache were also just unveiled, with up to a massive 1.1GB of L3 cache per chip. The Genoa-X platform is specifically optimized for what the company refers to as the technical computing requirements of design, modeling and simulation. This market segment carries far more compute-intensive demands, with massive data sets whose processing and bandwidth requirements make trips off-chip to system memory costly. With the huge additional 3D-stacked L3 cache on board these CPUs, large datasets can be kept closer to the cores, with much higher throughput as a result. Four new Genoa-X SKUs will be introduced, ranging from 16 to 96 cores, all socket-compatible with the Genoa and Bergamo SP5 platform. Microsoft Azure HPC and AI GM Nidhi Chappell joined AMD’s Server Business Unit GM and SVP Dan McNamara on stage to announce the general availability of Azure HBv4/HX series virtual machines for high performance computing applications. Azure’s Chappell claimed up to 4.5X higher HPC throughput in applications like computational fluid dynamics, and up to a 6X speed-up in silicon design and structural analysis applications, gen-on-gen.
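That 1.1GB figure falls out of the chiplet math. Assuming Genoa-X follows the same 3D V-Cache recipe AMD used in Milan-X, with 64MB of stacked SRAM atop each CCD’s native 32MB of L3 (our assumption; AMD didn’t break out the per-CCD split on stage), the numbers line up:

```python
# Hypothetical per-CCD L3 breakdown for a 96-core Genoa-X part.
# The 64MB stacked-SRAM figure is assumed from Milan-X, not confirmed above.
CORES, CORES_PER_CCD = 96, 8
BASE_L3_PER_CCD_MB, STACKED_L3_PER_CCD_MB = 32, 64

ccds = CORES // CORES_PER_CCD  # 12 chiplets
total_mb = ccds * (BASE_L3_PER_CCD_MB + STACKED_L3_PER_CCD_MB)
print(f"{ccds} CCDs x {BASE_L3_PER_CCD_MB + STACKED_L3_PER_CCD_MB}MB "
      f"= {total_mb}MB (~{total_mb / 1024:.1f}GB of L3)")

# 12 CCDs x 96MB = 1152MB (~1.1GB of L3)
```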
Data Center AI Acceleration – AMD’s Instinct Line-Up Expands To Address A Massive TAM
There’s little question that the field of artificial intelligence has experienced explosive growth in the past year especially, with large language models powering AI chatbots and generative AI rendering everything from poetry and prose to impressive visual artwork. In fact, AMD CEO Dr. Lisa Su and company estimate the total data center AI addressable market opportunity to be north of $150B by 2027, at a roughly 50% CAGR. To that end, AMD is bolstering its AI accelerator offerings with two new powerful solutions, dubbed MI300A and MI300X, based on the company’s CDNA 3 GPU architecture for accelerating AI workloads.
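For context, here’s what that projection implies about today’s baseline, as a quick sketch (assuming a 2023 starting point and four years of 50% compounding, which is our reading of the claim, not AMD’s stated math):

```python
# Implied 2023 baseline if the data center AI TAM hits $150B in 2027
# at a 50% CAGR (four compounding periods assumed).
tam_2027_b, cagr, years = 150, 0.50, 4
baseline_b = tam_2027_b / (1 + cagr) ** years
print(f"Implied 2023 TAM: ~${baseline_b:.0f}B")  # ~$30B
```

Put simply, AMD is betting this market quintuples in four years, which explains the breadth of the Instinct roadmap.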
AMD’s MI300A is a combined CPU+GPU design, pairing 24 Zen 4 CPU cores with an AMD CDNA 3 GPU and 128GB of HBM3 memory in a single package, with unified memory access across CPU and GPU resources, which can be particularly efficient in HPC workloads. AMD notes the MI300A is slated for deployment in the El Capitan supercomputer at Lawrence Livermore National Laboratory in California. The company claims it’s the most complex processor it has ever built, with over 146 billion transistors across a 13-chiplet design.
In addition, there will be a GPU-only version of the MI300, called MI300X. This GPU is designed to go toe-to-toe with NVIDIA’s H100 Hopper AI accelerator and is optimized for large language models and generative AI. MI300X will have a total of 192GB of HBM3 memory on board, delivering 5.2TB/s of bandwidth, and is comprised of 153 billion transistors across a 12-chiplet design. With MI300X, AMD is claiming a 2.4X HBM density and 1.6X HBM bandwidth advantage over NVIDIA’s H100, which allows larger models to be accommodated on a single chip for much better throughput. AMD even delivered a live demo of the 40-billion-parameter Hugging Face Falcon-40B LLM (large language model) running on a single MI300X GPU. Finally, the company noted its new MI300A APU is sampling to key customers now, with the MI300X GPU and an eight-GPU AMD Instinct server platform based on it sampling in Q3 of this year.
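The single-GPU Falcon demo is really a memory-capacity story. A rough footprint estimate, assuming 16-bit weights and ignoring activations and KV cache (so a sketch, not a rigorous sizing), shows why 192GB matters. The H100 figures below are NVIDIA’s published SXM specs, not numbers from AMD’s presentation:

```python
# Rough check: does a 40B-parameter model fit in GPU memory at FP16?
params_b = 40                            # Falcon-40B
bytes_per_param = 2                      # FP16/BF16 weights (assumed)
weights_gb = params_b * bytes_per_param  # ~80GB for the weights alone

mi300x_gb, h100_gb = 192, 80             # H100 SXM capacity per NVIDIA's specs
mi300x_bw, h100_bw = 5.2, 3.35           # TB/s; H100 SXM figure per NVIDIA

print(f"Falcon-40B weights: ~{weights_gb}GB "
      f"(fits in {mi300x_gb}GB MI300X: {weights_gb < mi300x_gb})")
print(f"HBM capacity ratio: {mi300x_gb / h100_gb:.1f}x, "
      f"bandwidth ratio: {mi300x_bw / h100_bw:.2f}x")

# ~80GB of weights fits on one 192GB MI300X; 2.4x capacity, ~1.55x bandwidth
```

Those ratios track with AMD’s claimed 2.4X density and roughly 1.6X bandwidth advantages, and they underscore why fitting an entire model on one accelerator, rather than sharding it across several, can be such a throughput and TCO win.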
To wrap things up, AMD’s Data Center and AI Technology Premiere today was a bit like drinking from a fire hose. There was a lot to unpack, and I’ll be digesting what was disclosed with respect to the company’s various new chiplet-based big iron architectures in the weeks ahead. However, it’s clear to me that, under Dr. Su’s leadership, the company has assembled a comprehensive suite of silicon solutions for the lucrative cloud data center, high performance computing and artificial intelligence markets. These are high-margin, high-growth opportunities that show no signs of slowing down anytime soon, and AMD’s execution at the silicon level appears equally relentless.
There will, however, be a lot of software and platform support required to foster development on, and adoption of, these various technologies. In fact, I think that will likely be the long pole in the tent for AMD moving forward, with respect to AI especially, as the company unifies its ROCm GPU, ZenDNN CPU and Vitis AI software tools into an optimized stack for the developer ecosystem, from cloud to edge. Regardless, AMD appears to have assembled a tool chest of powerful building blocks to take on the likes of Intel and NVIDIA in the modern intelligent data center.