NVIDIA Grace Blackwell Superchip
The Grace Blackwell Superchip unifies an NVIDIA Grace CPU with two Blackwell GPUs on a single module. Connected over the 900GB/s NVLink-C2C interconnect, the architecture removes the traditional PCIe bottleneck between CPU and GPU and delivers up to 20 petaflops of FP4 AI performance (GB200) with unified memory addressing for next-generation AI inference and training.
Why Grace Blackwell Superchip?
Unified CPU+GPU architecture eliminates traditional system bottlenecks
Grace Blackwell Configurations
Choose the right superchip for your workload requirements
NVL Rack-Scale Systems
Scale to rack-level with liquid-cooled, high-density deployments
NVL36 (half rack)
- GPUs: 36 Blackwell GPUs
- CPUs: 18 Grace CPUs
- Memory: ~7TB HBM3e
- NVLink: 65TB/s aggregate

NVL72 (full rack)
- GPUs: 72 Blackwell GPUs
- CPUs: 36 Grace CPUs
- Memory: ~14-21TB HBM3e
- NVLink: 130TB/s aggregate
- FP4 Perf: 720 PFLOPS
Technical Specifications
Complete Grace Blackwell Superchip specifications
| Specification | GB200 | GB300 |
|---|---|---|
| Configuration | ||
| CPU | 1x Grace (72 Arm cores) | 1x Grace (72 Arm cores) |
| GPUs | 2x B200 | 2x B300 |
| Interconnect | NVLink-C2C 900GB/s | NVLink-C2C 900GB/s |
| Memory | ||
| GPU Memory | 384GB HBM3e (192GB x2) | 576GB HBM3e (288GB x2) |
| Memory Bandwidth | 16TB/s | ~24TB/s |
| CPU Memory | Up to 480GB LPDDR5X | Up to 480GB LPDDR5X |
| Performance | ||
| FP4 Tensor | 20 PFLOPS | ~30 PFLOPS |
| FP8 Tensor | 10 PFLOPS | ~15 PFLOPS |
| FP16 Tensor | 5 PFLOPS | ~7.5 PFLOPS |
| Connectivity | ||
| NVLink (GPU-GPU) | 1.8TB/s per GPU | 1.8TB/s per GPU |
| External Network | ConnectX-7 (400Gb) | ConnectX-7 (400Gb) |
| Power | ||
| TDP | ~2700W | ~3000W |
| Cooling | Liquid Required | Liquid Required |
Ideal Use Cases
Grace Blackwell excels at unified CPU+GPU workloads
LLM Inference
Unified memory eliminates data movement between CPU and GPU. Run trillion-parameter models with minimal latency for real-time inference at scale.
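The snippet below is a minimal CUDA sketch of what unified addressing buys in practice: with a managed allocation, the CPU and GPU share one pointer, so data the CPU prepares needs no explicit cudaMemcpy before a kernel touches it. The kernel, buffer size, and scaling step are illustrative only, not part of any NVIDIA inference stack.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: scale a block of model weights in place.
// The pointer it receives is the same one the CPU used to fill
// the buffer -- no staging copy between host and device memory.
__global__ void scale_weights(float* w, float s, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) w[i] *= s;
}

int main() {
    const size_t n = 1 << 20;          // 1M weights (toy size)
    float* weights = nullptr;

    // Managed allocation: one address valid on both CPU and GPU.
    cudaMallocManaged(&weights, n * sizeof(float));

    for (size_t i = 0; i < n; ++i) weights[i] = 1.0f;   // CPU writes directly

    scale_weights<<<(n + 255) / 256, 256>>>(weights, 0.5f, n);  // GPU uses the same pointer
    cudaDeviceSynchronize();

    printf("weights[0] = %f\n", weights[0]);             // CPU reads the result, still no memcpy
    cudaFree(weights);
    return 0;
}
```

The same code runs on a PCIe-attached GPU, but there the managed pages migrate over PCIe; on Grace Blackwell those accesses are serviced coherently over the far faster NVLink-C2C link.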
RAG Pipelines
Retrieval-augmented generation benefits from coherent memory. CPU handles retrieval while GPUs run inference with seamless data handoff.
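As a rough sketch of that handoff, the hypothetical example below lets the CPU play the retrieval role (filling passage embeddings in place) while the GPU scores them against a query through the same unified pointers. The embedding width, passage count, and dot-product kernel are illustrative assumptions, not a real RAG framework.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define DIM 128   // embedding width (illustrative)

// Score each retrieved passage embedding against the query.
// The buffers were written by the CPU retrieval step; the GPU
// reads them through the same unified addresses, no staging copies.
__global__ void score(const float* passages, const float* query,
                      float* scores, int n_passages) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= n_passages) return;
    float dot = 0.0f;
    for (int d = 0; d < DIM; ++d)
        dot += passages[p * DIM + d] * query[d];
    scores[p] = dot;
}

int main() {
    const int n = 1024;                 // retrieved passages (toy count)
    float *passages, *query, *scores;
    cudaMallocManaged(&passages, n * DIM * sizeof(float));
    cudaMallocManaged(&query,    DIM * sizeof(float));
    cudaMallocManaged(&scores,   n * sizeof(float));

    // "Retrieval" on the CPU: fill embeddings in place.
    for (int i = 0; i < n * DIM; ++i) passages[i] = 0.01f * (i % 7);
    for (int d = 0; d < DIM; ++d)     query[d]    = 0.5f;

    score<<<(n + 127) / 128, 128>>>(passages, query, scores, n);
    cudaDeviceSynchronize();

    printf("score[0] = %f\n", scores[0]);   // CPU consumes the scores directly
    cudaFree(passages); cudaFree(query); cudaFree(scores);
    return 0;
}
```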
Real-Time AI
Sub-millisecond CPU-GPU latency enables real-time autonomous systems, robotics control, and interactive AI applications.
AI Factories
NVL72 rack-scale systems consolidate infrastructure for massive AI training and inference clusters with optimized power efficiency.
Scientific Computing
Unified addressing simplifies HPC codes. Run molecular dynamics, climate modeling, and physics simulations without memory management complexity.
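The toy example below illustrates that simplification, assuming a cache-coherent Grace-class platform where GPU kernels can dereference ordinary malloc'd host memory over NVLink-C2C; on a conventional discrete GPU you would fall back to cudaMallocManaged or explicit copies. The integration kernel is a stand-in for a real simulation step.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Toy molecular-dynamics-style update: advance each position by its velocity.
__global__ void integrate(float* pos, const float* vel, float dt, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) pos[i] += vel[i] * dt;
}

int main() {
    const size_t n = 1 << 20;
    // Plain system allocations -- assumes a platform with CPU-GPU
    // cache-coherent unified addressing (e.g. a Grace-based superchip).
    float* pos = (float*)malloc(n * sizeof(float));
    float* vel = (float*)malloc(n * sizeof(float));
    for (size_t i = 0; i < n; ++i) { pos[i] = 0.0f; vel[i] = 1.0f; }

    integrate<<<(n + 255) / 256, 256>>>(pos, vel, 0.001f, n);  // GPU reads malloc'd arrays directly
    cudaDeviceSynchronize();

    printf("pos[0] = %f\n", pos[0]);   // CPU reads the GPU-updated array in place
    free(pos); free(vel);
    return 0;
}
```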
Edge AI at Scale
High-density form factor enables AI inference at the network edge with full datacenter-class performance in constrained environments.
Architecture Comparison
Grace Blackwell vs traditional discrete GPU systems
GB200 vs HGX B200
- CPU-GPU Link: 900GB/s NVLink-C2C vs ~128GB/s PCIe Gen5
- Memory: Unified CPU-GPU addressing vs separate pools
- CPU-GPU Latency: ~7X lower
- Density: NVL72 rack (72 GPUs) vs 8-GPU HGX node
- Power: Lower power per FLOP
GB200 vs GB300
- GPU Memory: 384GB vs 576GB (+50%)
- Bandwidth: 16TB/s vs ~24TB/s
- FP4 Perf: 20 PFLOPS vs ~30 PFLOPS
- Architecture: Same Grace Blackwell
- Availability: Now vs H2 2025
Frequently Asked Questions
What is the NVIDIA Grace Blackwell Superchip?
The Grace Blackwell Superchip (GB200/GB300) integrates an NVIDIA Grace CPU with two Blackwell GPUs on a unified platform. The components are connected via 900GB/s NVLink-C2C, providing 7X lower latency than PCIe and enabling unified memory addressing between CPU and GPU.
What is the difference between GB200 and GB300?
GB200 pairs the Grace CPU with two B200 GPUs (384GB HBM3e total, 20 PFLOPS FP4). GB300 "Ultra" pairs Grace with two B300 GPUs (576GB HBM3e total, ~30 PFLOPS FP4). GB300 provides 50% more memory for the largest AI models while maintaining the same integrated architecture.
What is the NVL72 configuration?
NVL72 is a rack-scale system containing 36 Grace Blackwell Superchips (72 Blackwell GPUs + 36 Grace CPUs). It provides 130TB/s aggregate NVLink bandwidth, 14-21TB of HBM3e memory, and 720 PFLOPS of FP4 performance. NVL36 offers a half-rack configuration with 36 GPUs.
Should I choose GB200 or discrete B200 GPUs?
Choose GB200 for inference workloads, RAG pipelines, and applications that benefit from unified CPU-GPU memory. Choose discrete B200 (HGX systems) for pure GPU training workloads or when you need maximum flexibility in server configuration. GB200 offers 7X lower CPU-GPU latency.
When is GB200/GB300 available?
GB200 systems began shipping in Q4 2024 with broader availability in 2025. GB300 Ultra is expected in H2 2025. Contact SLYD for current availability through our OEM partnerships with Dell, HPE, Supermicro, Lenovo, and Gigabyte.
What cooling infrastructure is required?
Grace Blackwell Superchips require liquid cooling infrastructure due to the ~2700-3000W TDP per superchip. NVL72 racks include integrated liquid cooling. SLYD provides infrastructure design and deployment support for liquid-cooled AI systems.
Deploy NVIDIA Grace Blackwell
Get enterprise GB200/GB300 systems through our OEM partnerships with expert configuration and deployment support.