Scalable infrastructure that puts you in the lead for AI
Here’s the reference architecture for the NVIDIA DGX SuperPOD with DGX A100 systems, to help you deploy your AI and deep learning infrastructure.
The NVIDIA DGX SuperPOD with NVIDIA DGX A100 systems is the next-generation artificial intelligence (AI) supercomputing infrastructure, providing the computational power necessary to train today’s state-of-the-art deep learning (DL) models and to fuel innovation well into the future.
The DGX SuperPOD delivers groundbreaking performance, deploys in weeks as a fully integrated system, and is designed to solve the world’s most challenging computational problems.
This DGX SuperPOD reference architecture is the result of codesign between DL scientists, application performance engineers, and system architects to build a system capable of supporting the widest range of DL workloads. A supercomputer built using this reference architecture earned the seventh spot on the June 2020 TOP500 list.
In July 2020, the supercomputer set world records in all eight of the at-scale benchmarks in MLPerf v0.7 Training, and the NVIDIA A100 Tensor Core GPU set 16 records overall in the commercially available systems category.
This design introduces compute building blocks called scalable units (SUs), allowing for the modular deployment of a full 140-node DGX SuperPOD, which can further scale to hundreds of nodes.
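To make the scalable-unit math concrete, here is a minimal Python sketch of SuperPOD sizing. It assumes the building block from the DGX A100 SuperPOD reference architecture of 20 DGX A100 nodes per SU and 8 A100 GPUs per node; the function and variable names are illustrative, not part of any NVIDIA tooling.

```python
# Hypothetical sizing sketch for a DGX SuperPOD built from scalable units (SUs).
# Assumptions: 20 DGX A100 systems per SU, 8 A100 GPUs per DGX A100 system.

NODES_PER_SU = 20   # DGX A100 systems in one scalable unit (assumed)
GPUS_PER_NODE = 8   # A100 GPUs per DGX A100 system

def superpod_size(num_sus: int) -> dict:
    """Return node and GPU counts for a SuperPOD of num_sus scalable units."""
    nodes = num_sus * NODES_PER_SU
    return {
        "scalable_units": num_sus,
        "nodes": nodes,
        "gpus": nodes * GPUS_PER_NODE,
    }

# A full 140-node DGX SuperPOD corresponds to 7 SUs:
print(superpod_size(7))  # {'scalable_units': 7, 'nodes': 140, 'gpus': 1120}
```

Under these assumptions, the modular design means capacity grows linearly: each additional SU adds 20 nodes and 160 GPUs.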
Here’s the full reference architecture, including data center configurations for power, cooling, and racks.