NVIDIA Tesla A100 40GB Scientific GPU Workstation Graphics Card
To unlock next-generation discoveries, scientists look to simulations to better understand the world around us.
NVIDIA Tesla A100 introduces double-precision Tensor Cores, delivering the biggest leap in HPC performance since the introduction of GPUs. Combined with 40GB of the fastest GPU memory, A100 lets researchers reduce a 10-hour, double-precision simulation to under four hours. HPC applications can also leverage TF32 to achieve up to 11X higher throughput for single-precision, dense matrix-multiply operations.
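As an illustration of how an application opts in to TF32, the minimal sketch below assumes a CUDA-enabled PyTorch build on an Ampere GPU; the framework choice and matrix sizes are illustrative, not taken from this datasheet:

```python
import torch

# Minimal sketch, assuming a CUDA-enabled PyTorch build on an Ampere GPU.
# Enabling TF32 lets ordinary FP32 matmuls run on Tensor Cores; inputs and
# outputs stay FP32, only the internal multiply precision changes.
torch.backends.cuda.matmul.allow_tf32 = True  # TF32 for matrix multiplies
torch.backends.cudnn.allow_tf32 = True        # TF32 for cuDNN convolutions

a = torch.randn(8192, 8192, device="cuda")  # plain FP32 tensors
b = torch.randn(8192, 8192, device="cuda")

c = a @ b  # dispatched to TF32 Tensor Cores on A100

print(c.dtype)  # torch.float32 -- the caller never sees TF32 directly
```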
For HPC applications with the largest datasets, the A100 80GB's additional memory delivers up to a 2X throughput increase with Quantum Espresso, a materials simulation. This massive memory and unprecedented memory bandwidth make the A100 80GB the ideal platform for next-generation workloads.
A100 introduces groundbreaking features to optimize inference workloads. It accelerates a full range of precisions, from FP32 down to INT4. Multi-Instance GPU (MIG) technology lets multiple networks operate simultaneously on a single A100 for optimal utilization of compute resources, and structural sparsity support delivers up to 2X more performance on top of A100's other inference gains.
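As a hedged sketch of reduced-precision inference, a framework such as PyTorch can run the matrix-heavy parts of a network in FP16 on A100's Tensor Cores; the toy model below is a placeholder, not a benchmark workload:

```python
import torch

# Minimal sketch: a placeholder model, not a real inference workload.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 10),
).cuda().eval()

x = torch.randn(64, 1024, device="cuda")  # a batch of dummy inputs

# autocast executes matmul-heavy ops in FP16 on Tensor Cores while
# keeping numerically sensitive ops in FP32.
with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.float16):
    logits = model(x)

print(logits.dtype)  # torch.float16 inside the autocast region
```

Lower precisions such as INT8 and INT4 are typically reached through a quantization toolchain such as TensorRT rather than a one-line switch.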
On state-of-the-art conversational AI models like BERT, A100 accelerates inference throughput up to 249X over CPUs.
On the most complex models that are batch-size constrained, like RNN-T for automatic speech recognition, A100 80GB's increased memory capacity doubles the size of each MIG instance and delivers up to 1.25X higher throughput over A100 40GB.
NVIDIA’s market-leading performance was demonstrated in MLPerf Inference. A100 brings 20X more performance to further extend that leadership.
| Specification | NVIDIA A100 for PCIe |
| --- | --- |
| GPU Architecture | NVIDIA Ampere |
| Peak FP64 | 9.7 TF |
| Peak FP64 Tensor Core | 19.5 TF |
| Peak FP32 | 19.5 TF |
| Peak TF32 Tensor Core | 156 TF (312 TF*) |
| Peak BFLOAT16 Tensor Core | 312 TF (624 TF*) |
| Peak FP16 Tensor Core | 312 TF (624 TF*) |
| Peak INT8 Tensor Core | 624 TOPS (1,248 TOPS*) |
| Peak INT4 Tensor Core | 1,248 TOPS (2,496 TOPS*) |
| GPU Memory | 40GB |
| GPU Memory Bandwidth | 1,555 GB/s |
| Interconnect | PCIe Gen4: 64 GB/s |
| Multi-Instance GPU (MIG) | Various instance sizes, up to 7 MIG instances @ 5GB each |
| Form Factor | PCIe |
| Max Thermal Design Power (TDP) | 250W |
| Delivered Performance of Top Apps | 90% (vs. A100 SXM) |

* With structural sparsity enabled.
A100 with MIG maximizes the utilization of GPU-accelerated infrastructure. With MIG, an A100 GPU can be partitioned into as many as seven independent instances, giving multiple users access to GPU acceleration. With A100 40GB, each MIG instance can be allocated up to 5GB, and with A100 80GB’s increased memory capacity, that size is doubled to 10GB.
MIG works with Kubernetes, containers, and hypervisor-based server virtualization. MIG lets infrastructure managers offer a right-sized GPU with guaranteed quality of service (QoS) for every job, extending the reach of accelerated computing resources to every user.
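As a hedged sketch of how a job targets one of those instances: once MIG instances have been created (for example with nvidia-smi), CUDA enumerates each instance as its own device, selectable through the CUDA_VISIBLE_DEVICES environment variable. The UUID below is a placeholder; real identifiers are listed by nvidia-smi -L.

```python
import os

# Minimal sketch: the MIG UUID is a placeholder -- list the real ones
# with `nvidia-smi -L` after the instances have been created.
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

# Import CUDA-aware libraries only after setting the variable, so the
# process sees just the selected 5GB (or 10GB on A100 80GB) slice.
import torch

print(torch.cuda.device_count())      # 1 -- only the chosen MIG instance
print(torch.cuda.get_device_name(0))  # reports the A100 backing the slice
```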