
NVIDIA Grace Blackwell Superchip

The Grace Blackwell Superchip unifies NVIDIA's Grace CPU with two Blackwell GPUs on a single platform. Connected by 900GB/s NVLink-C2C, the architecture removes the traditional CPU-GPU bottleneck and delivers 20 PFLOPS of FP4 AI performance, with coherent unified memory addressing, for next-generation AI inference and training.

GB200: Grace + 2x Blackwell, 20 PFLOPS
  • 1 Grace CPU + 2 Blackwell GPUs
  • 900GB/s NVLink-C2C interconnect
  • 384-576GB HBM3e memory
  • 20 PFLOPS FP4 AI performance

Why Grace Blackwell Superchip?

Unified CPU+GPU architecture eliminates traditional system bottlenecks

  • 900GB/s CPU-GPU bandwidth (vs ~128GB/s PCIe 5.0)
  • 7X lower latency (NVLink-C2C vs PCIe)
  • Unified memory addressing (CPU + GPU coherent)
  • 2X rack density (NVL72 vs HGX)
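
To make the figures above concrete, here is a back-of-envelope sketch of how long a GB200-sized 384GB working set takes to cross each link at its nominal rate. This is a sketch, not a benchmark; sustained throughput is lower on both links.

```python
# Transfer time for a 384GB working set at the nominal link rates quoted
# above. Nominal bandwidths only; real-world throughput is lower.

WORKING_SET_GB = 384          # full GB200 HBM3e capacity
LINKS_GBS = {
    "NVLink-C2C": 900,        # Grace-to-Blackwell on-package link
    "PCIe 5.0 x16": 128,      # typical discrete-GPU attach
}

for name, bw in LINKS_GBS.items():
    print(f"{name:12s}: {WORKING_SET_GB / bw:4.2f} s for {WORKING_SET_GB}GB")

# NVLink-C2C  : 0.43 s
# PCIe 5.0 x16: 3.00 s  -- a ~7x bandwidth gap; the 7X latency figure
# above is a separate, additional advantage of the coherent link.
```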

Grace Blackwell Configurations

Choose the right superchip for your workload requirements

GB200 Grace Blackwell
  • Configuration: Grace + 2x B200
  • GPU Memory: 384GB HBM3e
  • Bandwidth: 16TB/s
  • FP4 Tensor: 20 PFLOPS
  • NVLink-C2C: 900GB/s
  • TDP: ~2700W
  • Enterprise Price: Contact for Pricing
Available in NVL36 and NVL72 configurations

NVL Rack-Scale Systems

Scale to rack-level with liquid-cooled, high-density deployments

NVL36 Half-Rack Configuration
  • GPUs: 36 Blackwell GPUs
  • CPUs: 18 Grace CPUs
  • Memory: ~7TB HBM3e
  • NVLink: 65TB/s aggregate
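
These NVL36 figures, and the NVL72 totals cited later on this page, follow directly from the per-superchip specs; a small sketch of the arithmetic, using this page's GB200 numbers:

```python
# Rack aggregates derived from per-superchip GB200 specs quoted on this
# page: 2 GPUs and 192GB HBM3e per GPU, 20 PFLOPS FP4 per superchip.

CONFIGS = {"NVL36": 18, "NVL72": 36}   # superchips per system
GPUS_PER_SUPERCHIP = 2
HBM_PER_GPU_GB = 192
FP4_PER_SUPERCHIP_PFLOPS = 20

for name, chips in CONFIGS.items():
    gpus = chips * GPUS_PER_SUPERCHIP
    hbm_tb = gpus * HBM_PER_GPU_GB / 1000
    fp4 = chips * FP4_PER_SUPERCHIP_PFLOPS
    print(f"{name}: {gpus} GPUs, {chips} Grace CPUs, "
          f"~{hbm_tb:.1f}TB HBM3e, {fp4} PFLOPS FP4")

# NVL36: 36 GPUs, 18 Grace CPUs, ~6.9TB HBM3e, 360 PFLOPS FP4
# NVL72: 72 GPUs, 36 Grace CPUs, ~13.8TB HBM3e, 720 PFLOPS FP4
```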

Technical Specifications

Complete Grace Blackwell Superchip specifications

Specification | GB200 | GB300

Configuration
CPU | 1x Grace (72 Arm cores) | 1x Grace (72 Arm cores)
GPUs | 2x B200 | 2x B300
Interconnect | NVLink-C2C 900GB/s | NVLink-C2C 900GB/s

Memory
GPU Memory | 384GB HBM3e (192GB x2) | 576GB HBM3e (288GB x2)
Memory Bandwidth | 16TB/s | ~24TB/s
CPU Memory | Up to 480GB LPDDR5X | Up to 480GB LPDDR5X

Performance
FP4 Tensor | 20 PFLOPS | ~30 PFLOPS
FP8 Tensor | 10 PFLOPS | ~15 PFLOPS
FP16 Tensor | 5 PFLOPS | ~7.5 PFLOPS

Connectivity
NVLink (GPU-GPU) | 1.8TB/s per GPU | 1.8TB/s per GPU
External Network | ConnectX-7 (400Gb) | ConnectX-7 (400Gb)

Power
TDP | ~2700W | ~3000W
Cooling | Liquid required | Liquid required

Ideal Use Cases

Grace Blackwell excels at unified CPU+GPU workloads

LLM Inference

Unified memory eliminates data movement between CPU and GPU. Run trillion-parameter models with minimal latency for real-time inference at scale.
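
A rough sizing sketch of that claim: FP4 weights cost 0.5 bytes per parameter, and the GB200 capacities below come from this page's spec table. KV cache and activations are ignored, so treat this as a floor, not a deployment plan.

```python
# Can a trillion-parameter model's FP4 weights fit on one GB200 superchip?

PARAMS = 1_000_000_000_000            # 1T parameters
BYTES_PER_PARAM_FP4 = 0.5
weights_gb = PARAMS * BYTES_PER_PARAM_FP4 / 1e9   # 500 GB of weights

HBM3E_GB = 384                        # 2x 192GB GPU memory
LPDDR5X_GB = 480                      # Grace CPU memory, same address space
unified_gb = HBM3E_GB + LPDDR5X_GB    # 864 GB of coherent capacity

print(f"FP4 weights: {weights_gb:.0f}GB vs unified capacity: {unified_gb}GB")
print("fits on one superchip" if weights_gb <= unified_gb
      else "needs multiple superchips")
```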

RAG Pipelines

Retrieval-augmented generation benefits from coherent memory. CPU handles retrieval while GPUs run inference with seamless data handoff.
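
A minimal sketch of that split, with the embedding index held in CPU memory and only the retrieved passages reaching the GPU-side model. Here `embed` and `llm_generate` are hypothetical stand-ins for whatever embedding model and LLM you actually deploy, not a real API:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical stand-in for a CPU-side embedding model.
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.random(768).astype(np.float32)
    return v / np.linalg.norm(v)

def llm_generate(prompt: str) -> str:
    # Hypothetical stand-in for the Blackwell-resident LLM.
    return f"[answer conditioned on {len(prompt)} prompt chars]"

corpus = [
    "Grace pairs with two Blackwell GPUs over 900GB/s NVLink-C2C.",
    "NVL72 racks hold 36 superchips and require liquid cooling.",
]
index = np.stack([embed(doc) for doc in corpus])  # lives in Grace LPDDR5X

def retrieve(question: str, k: int = 1) -> list[str]:
    scores = index @ embed(question)              # cosine (unit-norm vectors)
    return [corpus[i] for i in np.argsort(scores)[::-1][:k]]

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    # On a coherent platform the assembled prompt reaches the GPU without
    # an explicit host-to-device copy in user code.
    return llm_generate(f"Context:\n{context}\n\nQ: {question}\nA:")

print(answer("How fast is the CPU-GPU link?"))
```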

Real-Time AI

Sub-millisecond CPU-GPU latency enables real-time autonomous systems, robotics control, and interactive AI applications.

AI Factories

NVL72 rack-scale systems consolidate infrastructure for massive AI training and inference clusters with optimized power efficiency.

Scientific Computing

Unified addressing simplifies HPC codes. Run molecular dynamics, climate modeling, and physics simulations without memory management complexity.
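
A sketch of what unified addressing looks like in code, using Numba's CUDA managed memory as a software approximation of Grace Blackwell's hardware-coherent memory; assumes a Linux system with a CUDA GPU plus numba and numpy installed. One allocation is written by the CPU, updated by a GPU kernel, and read back by the CPU with no explicit copies:

```python
import numpy as np
from numba import cuda

@cuda.jit
def scale(x, factor):
    # Each GPU thread scales one element in place.
    i = cuda.grid(1)
    if i < x.size:
        x[i] *= factor

buf = cuda.managed_array(1_000_000, dtype=np.float32)  # unified allocation
buf[:] = np.arange(buf.size, dtype=np.float32)         # CPU writes it

threads = 256
blocks = (buf.size + threads - 1) // threads
scale[blocks, threads](buf, 2.0)                       # GPU updates in place
cuda.synchronize()

print(buf[:4])   # CPU reads the GPU's results directly: [0. 2. 4. 6.]
```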

Edge AI at Scale

High-density form factor enables AI inference at network edge with full datacenter-class performance in constrained environments.

Architecture Comparison

Grace Blackwell vs traditional discrete GPU systems

GB200 vs HGX B200

  • CPU-GPU Link: 900GB/s vs ~128GB/s PCIe
  • Memory: Unified addressing vs separate
  • Latency: 7X lower CPU-GPU
  • Density: NVL72 vs HGX 8-GPU
  • Power: Optimized per FLOP
Verdict: GB200 ideal for inference and unified memory workloads

GB200 vs GB300

  • GPU Memory: 384GB vs 576GB (+50%)
  • Bandwidth: 16TB/s vs ~24TB/s
  • FP4 Perf: 20 PFLOPS vs ~30 PFLOPS
  • Architecture: Same Grace Blackwell
  • Availability: Now vs H2 2025
Verdict: GB300 for largest models; GB200 for immediate deployment

Frequently Asked Questions

What is the NVIDIA Grace Blackwell Superchip?

The Grace Blackwell Superchip (GB200/GB300) integrates an NVIDIA Grace CPU with two Blackwell GPUs on a unified platform. The components are connected via 900GB/s NVLink-C2C, providing 7X lower latency than PCIe and enabling unified memory addressing between CPU and GPU.

What is the difference between GB200 and GB300?

GB200 pairs the Grace CPU with two B200 GPUs (384GB HBM3e total, 20 PFLOPS FP4). GB300 "Ultra" pairs Grace with two B300 GPUs (576GB HBM3e total, ~30 PFLOPS FP4). GB300 provides 50% more memory for the largest AI models while maintaining the same integrated architecture.

What is the NVL72 configuration?

NVL72 is a rack-scale system containing 36 Grace Blackwell Superchips (72 Blackwell GPUs + 36 Grace CPUs). It provides 130TB/s of aggregate NVLink bandwidth, roughly 14TB of HBM3e with GB200 (about 21TB with GB300), and 720 PFLOPS of FP4 performance. NVL36 is the half-rack configuration with 36 GPUs.

Should I choose GB200 or discrete B200 GPUs?

Choose GB200 for inference workloads, RAG pipelines, and applications that benefit from unified CPU-GPU memory. Choose discrete B200 (HGX systems) for pure GPU training workloads or when you need maximum flexibility in server configuration. GB200 offers 7X lower CPU-GPU latency.

When is GB200/GB300 available?

GB200 systems began shipping in Q4 2024 with broader availability in 2025. GB300 Ultra is expected in H2 2025. Contact SLYD for current availability through our OEM partnerships with Dell, HPE, Supermicro, Lenovo, and Gigabyte.

What cooling infrastructure is required?

Grace Blackwell Superchips require liquid cooling infrastructure due to the ~2700-3000W TDP per superchip. NVL72 racks include integrated liquid cooling. SLYD provides infrastructure design and deployment support for liquid-cooled AI systems.
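
A rough estimate from this page's own TDP figures shows why; it excludes NVSwitch trays, networking, and power-conversion losses, so actual rack draw is higher still:

```python
# Compute power per NVL72 rack, from the ~2700W GB200 superchip TDP above.

SUPERCHIP_TDP_W = 2700        # GB200 (~3000W for GB300)
SUPERCHIPS_PER_NVL72 = 36

compute_kw = SUPERCHIP_TDP_W * SUPERCHIPS_PER_NVL72 / 1000
print(f"NVL72 superchip power alone: ~{compute_kw:.0f}kW per rack")
# ~97kW of compute in a single rack -- well past the practical limit
# for air cooling, hence the liquid-cooling requirement.
```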

Deploy NVIDIA Grace Blackwell

Get enterprise GB200/GB300 systems through our OEM partnerships with expert configuration and deployment support.

Available through:
Dell, HPE, Supermicro, Lenovo, Gigabyte