AI Data Center Cooling Requirements
Modern AI accelerators generate unprecedented heat loads that fundamentally challenge traditional cooling. From 700W H200 GPUs to 1,400W MI355X processors, proper thermal management is critical for performance, reliability, and efficiency.
GPU Thermal Design Power (TDP)
Modern AI accelerators have rapidly increasing power requirements, with TDPs rising from 300W (V100) to 1,400W (MI355X)
NVIDIA GPU TDP Evolution
| GPU Model | Architecture | TDP | Memory | Release |
|---|---|---|---|---|
| NVIDIA V100 | Volta | 300W | 32GB HBM2 | 2017 |
| NVIDIA A100 SXM | Ampere | 400W | 80GB HBM2e | 2020 |
| NVIDIA H100 SXM | Hopper | 700W | 80GB HBM3 | 2022 |
| NVIDIA H100 PCIe | Hopper | 350W | 80GB HBM3 | 2022 |
| NVIDIA H200 SXM | Hopper | 700W | 141GB HBM3e | 2023 |
| NVIDIA B200 | Blackwell | 1,000W | 192GB HBM3e | 2024 |
| NVIDIA B300 | Blackwell Ultra | 1,100W | 288GB HBM3e | 2025 |
AMD Instinct GPU TDP Specifications
| GPU Model | Architecture | TDP | Memory | Bandwidth |
|---|---|---|---|---|
| AMD MI250X | CDNA 2 | 560W | 128GB HBM2e | 3.2 TB/s |
| AMD MI300X | CDNA 3 | 750W | 192GB HBM3 | 5.3 TB/s |
| AMD MI325X | CDNA 3 | 1,000W | 256GB HBM3e | 6 TB/s |
| AMD MI355X | CDNA 4 | 1,400W | 288GB HBM3e | 8 TB/s |
Heat Output Calculations
Single GPU (700W H200)
Heat Output = 700W × 0.995 = 696.5W thermal
BTU/hr = 696.5W × 3.412 = 2,376 BTU/hr per GPU
8-GPU Server (HGX H200)
GPU heat alone = 8 × 2,376 = 19,008 BTU/hr
Total server heat (with CPU, memory, networking):
≈ 25,000-30,000 BTU/hr
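The arithmetic above can be sketched in a few lines of Python. The function names and the default 8-GPU count are illustrative assumptions; the 0.995 heat fraction and the 3.412 W-to-BTU/hr factor are taken from the calculation above.

```python
WATTS_TO_BTU_PER_HR = 3.412  # conversion factor used in this guide


def gpu_heat_btu_per_hr(tdp_watts: float, heat_fraction: float = 0.995) -> float:
    """Heat output of one GPU in BTU/hr.

    heat_fraction is the share of electrical power that ends up as heat
    (0.995 in the worked example above).
    """
    return tdp_watts * heat_fraction * WATTS_TO_BTU_PER_HR


def server_gpu_heat_btu_per_hr(tdp_watts: float, gpu_count: int = 8) -> float:
    """GPU-only heat for a multi-GPU server (excludes CPU, memory, networking)."""
    return gpu_count * gpu_heat_btu_per_hr(tdp_watts)


if __name__ == "__main__":
    # TDP values from the tables above
    for name, tdp in [("H200 SXM", 700), ("B200", 1000), ("MI355X", 1400)]:
        per_gpu = gpu_heat_btu_per_hr(tdp)
        per_server = server_gpu_heat_btu_per_hr(tdp)
        print(f"{name}: {per_gpu:,.0f} BTU/hr per GPU, {per_server:,.0f} BTU/hr for 8 GPUs")
```

This covers GPU heat only. The total server figure still needs an allowance for CPUs, memory, and networking, as noted above.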
ASHRAE Thermal Guidelines
ASHRAE TC 9.9 publishes the industry-standard thermal guidelines for data processing environments
Temperature Classes
| Class | Allowable Temperature Range | Use Case |
|---|---|---|
| A1 | 15-32°C (59-90°F) | Enterprise servers |
| A2 | 10-35°C (50-95°F) | Some servers, storage |
| A3 | 5-40°C (41-104°F) | Extended envelope |
| A4 | 5-45°C (41-113°F) | Maximum allowable |
| H1 | 5-25°C (41-77°F) | High-density GPUs |
Recommended Ranges
For optimal reliability and component lifespan
- Inlet Temperature: 18-27°C (64-80°F)
- Relative Humidity: 40-60% RH
- Dew Point: 5.5-15°C (42-59°F)
- Max Rate of Change: 20°C/hour
- Altitude Derating: -1°C per 300m above 900m
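A monitoring script could check sensor readings against the recommended ranges listed above. The following is a minimal sketch, not a reference implementation: the function names are hypothetical, and applying the altitude derating to the 27°C recommended upper limit is an assumption made here for illustration.

```python
# Recommended envelope, taken from the list above
INLET_RANGE_C = (18.0, 27.0)
RH_RANGE_PCT = (40.0, 60.0)
DEW_POINT_RANGE_C = (5.5, 15.0)


def derated_max_inlet_c(altitude_m: float, base_max_c: float = INLET_RANGE_C[1]) -> float:
    """Lower the maximum inlet temperature by 1°C per 300 m above 900 m.

    Assumption: the derating is applied to the recommended upper limit.
    """
    excess_m = max(0.0, altitude_m - 900.0)
    return base_max_c - excess_m / 300.0


def check_inlet(temp_c: float, rh_pct: float, dew_point_c: float,
                altitude_m: float = 0.0) -> list[str]:
    """Return a list of human-readable issues; an empty list means in range."""
    issues = []
    max_c = derated_max_inlet_c(altitude_m)
    if not INLET_RANGE_C[0] <= temp_c <= max_c:
        issues.append(f"inlet {temp_c}°C outside {INLET_RANGE_C[0]}-{max_c:.1f}°C")
    if not RH_RANGE_PCT[0] <= rh_pct <= RH_RANGE_PCT[1]:
        issues.append(f"humidity {rh_pct}% outside {RH_RANGE_PCT[0]}-{RH_RANGE_PCT[1]}%")
    if not DEW_POINT_RANGE_C[0] <= dew_point_c <= DEW_POINT_RANGE_C[1]:
        issues.append(f"dew point {dew_point_c}°C outside "
                      f"{DEW_POINT_RANGE_C[0]}-{DEW_POINT_RANGE_C[1]}°C")
    return issues


if __name__ == "__main__":
    # A site at 1,500 m: the derated limit is 25°C, so 26°C is flagged
    print(check_inlet(temp_c=26.0, rh_pct=50.0, dew_point_c=10.0, altitude_m=1500.0))
```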
Air Cooling Technologies
Foundation cooling systems for data centers up to 35 kW per rack
CRAC Units
Refrigerant-Based
A Computer Room Air Conditioner (CRAC) uses refrigerant-based direct expansion (DX) cooling with compressor-driven refrigerant circulation.
- Capacity: Up to 100 kW per unit
- Best for: <200 kW electrical load
- PUE Impact: +0.3-0.5 to base
CRAH Units
Chilled Water
A Computer Room Air Handler (CRAH) uses chilled water from a central plant, blowing air over chilled water coils for heat transfer.
- Capacity: 200 kW+ per unit
- Best for: >200 kW load
- Advantages: Higher efficiency at scale
Aisle Containment
Efficiency Boost
Hot or cold aisle containment prevents mixing of supply and exhaust air, dramatically improving cooling efficiency.
- Cold Aisle: Up to 30% savings
- Hot Aisle: Up to 40% savings
- Density Boost: 5 kW → 10+ kW/rack
Air Cooling Capacity by Rack Density
| Rack Power | Cooling Approach | Containment |
|---|---|---|
| 1-5 kW | Room-based CRAC/CRAH | Optional |
| 5-10 kW | Row-based cooling | Recommended |
| 10-20 kW | In-row cooling + containment | Required |
| 20-35 kW | Hot aisle containment + supplemental | Required |
| 35+ kW | Liquid cooling required | — |
Direct-to-Chip Liquid Cooling
Cold plates placed directly on processors for superior heat transfer—essential for modern AI GPUs
How Direct-to-Chip Cooling Works
- Cold plates attach directly to processor packages
- Coolant (water/glycol or dielectric) flows through micro-channels
- Heat transfers from chip to coolant via conduction
- CDU (Coolant Distribution Unit) exchanges heat to facility water
- Facility system rejects heat to outside environment
Heat Capture Efficiency
| Cooling Method | Heat Capture |
|---|---|
| Traditional air cooling | 100% to room air |
| Direct-to-chip (single phase) | 50-75% to liquid |
| Direct-to-chip (two phase) | 70-95% to liquid |
| Immersion cooling | 95-100% to liquid |
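The capture percentages above determine how much heat the room air system still has to remove. A minimal sketch follows, with a hypothetical function name and an 80 kW rack chosen purely as an example.

```python
def split_heat_load(rack_kw: float, liquid_capture: float) -> tuple[float, float]:
    """Split a rack's heat load into (kW to liquid, kW left for room air).

    liquid_capture is the fraction of heat captured by the liquid loop,
    e.g. 0.50-0.75 for single-phase direct-to-chip per the table above.
    """
    if not 0.0 <= liquid_capture <= 1.0:
        raise ValueError("liquid_capture must be between 0 and 1")
    to_liquid_kw = rack_kw * liquid_capture
    return to_liquid_kw, rack_kw - to_liquid_kw


if __name__ == "__main__":
    # Example rack size is an assumption for illustration
    for capture in (0.50, 0.75, 0.95):
        liquid_kw, air_kw = split_heat_load(80.0, capture)
        print(f"capture {capture:.0%}: {liquid_kw:.0f} kW to liquid, {air_kw:.0f} kW to air")
```

Even at 75% capture, an 80 kW rack leaves 20 kW for air cooling, which is why direct-to-chip deployments usually keep some supplemental air capacity.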
Single-Phase Cooling
Most Common
- Coolant: Treated water, water/glycol, dielectric
- Flow Rate: 1-2 L/min per chip (roughly 1.5 L/min per kW)
- PUE: 1.15-1.25
- Complexity: Lower cost, simpler design
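The 1.5 L/min per kW flow figure implies a coolant temperature rise that follows from the heat balance ΔT = P / (ṁ × cp). The sketch below assumes plain water (density about 1 kg/L, specific heat about 4,186 J/kg·K). Water/glycol mixes and dielectric fluids have lower specific heat and would show a larger rise. Function names are illustrative.

```python
WATER_DENSITY_KG_PER_L = 1.0   # assumption: plain water near room temperature
WATER_CP_J_PER_KG_K = 4186.0   # assumption: specific heat of plain water


def required_flow_l_per_min(heat_kw: float, l_per_min_per_kw: float = 1.5) -> float:
    """Coolant flow for a given heat load, using the 1.5 L/min per kW figure above."""
    return heat_kw * l_per_min_per_kw


def coolant_temp_rise_c(heat_kw: float, flow_l_per_min: float) -> float:
    """Temperature rise across the cold plate(s): dT = P / (mass_flow * cp)."""
    if flow_l_per_min <= 0:
        raise ValueError("flow must be positive")
    mass_flow_kg_per_s = flow_l_per_min * WATER_DENSITY_KG_PER_L / 60.0
    return heat_kw * 1000.0 / (mass_flow_kg_per_s * WATER_CP_J_PER_KG_K)


if __name__ == "__main__":
    heat_kw = 0.7  # one 700W GPU
    flow = required_flow_l_per_min(heat_kw)
    rise = coolant_temp_rise_c(heat_kw, flow)
    print(f"{heat_kw} kW -> {flow:.2f} L/min, coolant rise {rise:.1f} °C")
```

Because flow scales with load in this rule of thumb, the temperature rise stays the same (a little under 10°C for water) regardless of chip power.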
Pumped Two-Phase (P2P) Cooling
Premium Efficiency
- Coolant: Refrigerants (R-515B, R-134a), engineered fluids
- Capacity: Up to 170 kW IT load
- PUE: 1.02-1.10
- Savings: Up to 82% cooling energy reduction
Coolant Distribution Units (CDU)
CDUs are the central control systems managing temperature, pressure, and flow for liquid cooling.
| CDU Type | Capacity | Heat Rejection | Best For |
|---|---|---|---|
| Liquid-to-Air | Up to 70 kW | Room air | Retrofits, no chilled water |
| Liquid-to-Liquid | 100-1,350 kW | Facility water | New builds, high density |
| In-Rack | Up to 100 kW | Varies | Distributed deployment |
| In-Row | 200-600 kW | Facility water | Row-based cooling |
Rear Door Heat Exchangers (RDHx)
Replaces standard rack rear door with liquid-cooled heat exchanger—ideal for retrofits
How RDHx Works
- Hot exhaust air exits servers through rear of rack
- Air passes through heat exchanger coils in rear door
- Chilled water in coils absorbs heat from air
- Cooled air exits into room at or below ambient
- Warmed water returns to CDU or chiller
RDHx Types
Passive RDHx
- Relies on server fan airflow
- No additional power required
- Best for: 15-30 kW per rack
Active RDHx
- Built-in fans for higher capacity
- Consumes additional power
- Best for: 30-75+ kW per rack
RDHx Specifications
Key Benefits
- Retrofit-friendly: Mounts on existing racks with minimal changes
- Eliminates containment need: Creates neutral room temperature
- Hybrid compatible: Works with existing CRAC/CRAH systems
- Incremental: Add to high-density racks as needed
Immersion Cooling
Submerges entire servers in dielectric fluid for maximum thermal transfer—supporting 100-250 kW per rack
Single-Phase Immersion
Servers submerged in dielectric fluid bath. Heat transfers via convection, warm fluid circulates to CDU.
- Capacity: 100-200 kW per rack
- Chip TDP Support: Up to 1,000W per chip
- PUE: 1.02-1.03
- Fluid: Synthetic hydrocarbon, white/mineral oil
- Fluid Cost: $10-50/gallon
Advantages
- 95%+ cooling power reduction vs. air
- No fans required (eliminates vibration)
- Dust and contaminant protection
- Reduced noise (<50 dB)
Two-Phase Immersion
Premium
Low-boiling-point fluid boils at the chip surface (34-50°C). Vapor rises to a condenser and returns as liquid, so circulation is passive.
- Capacity: Up to 252 kW per tank
- Chip TDP Support: Up to 1,000W+ per chip
- PUE: 1.01-1.02
- Fluid: Engineered fluorocarbons (3M Novec)
- Fluid Cost: $100s/gallon
Advantages
- Highest heat transfer efficiency
- No pumps for coolant circulation
- Zero water consumption possible
- 2x heat rejection of single-phase
Immersion Cooling Comparison
| Attribute | Single-Phase | Two-Phase |
|---|---|---|
| PUE | 1.02-1.03 | 1.01-1.02 |
| Coolant Cost | Lower ($10-50/gal) | Higher ($100s/gal) |
| System Complexity | Simpler | More complex |
| Maintenance Access | Open tank | Sealed system |
| Heat Density | Up to 200 kW/rack | Up to 250+ kW/rack |
| Water Usage | Can be zero | Zero |
Water Quality Requirements
Water quality is critical for liquid cooling system reliability and longevity
ASHRAE Water Quality Recommendations
| Parameter | FWS (Facility) | TCS (Technology) |
|---|---|---|
| pH | 7.0-8.5 | 7.0-8.5 |
| Conductivity | <1,500 μS/cm | <5-10 μS/cm |
| Chloride | <250 ppm | <5 ppm |
| Sulfate | <250 ppm | <10 ppm |
| Total Suspended Solids | <100 ppm | <3 ppm |
| Hardness | <200 ppm CaCO₃ | <50 ppm |
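The TCS column above lends itself to an automated check of lab or sensor results. This is a minimal sketch: the dictionary keys and function name are hypothetical, and the upper end (10 μS/cm) of the conductivity range is used as the limit.

```python
# TCS (technology loop) limits from the table above; all are "less than" limits
TCS_LIMITS = {
    "conductivity_us_cm": 10.0,  # assumption: upper end of the <5-10 range
    "chloride_ppm": 5.0,
    "sulfate_ppm": 10.0,
    "tss_ppm": 3.0,
    "hardness_ppm": 50.0,
}
PH_RANGE = (7.0, 8.5)


def check_tcs_sample(sample: dict[str, float]) -> list[str]:
    """Return the list of parameters that violate the TCS limits.

    Parameters missing from the sample are skipped rather than flagged.
    """
    violations = []
    ph = sample.get("ph")
    if ph is not None and not PH_RANGE[0] <= ph <= PH_RANGE[1]:
        violations.append(f"ph={ph} outside {PH_RANGE[0]}-{PH_RANGE[1]}")
    for key, limit in TCS_LIMITS.items():
        value = sample.get(key)
        if value is not None and value >= limit:
            violations.append(f"{key}={value} not below {limit}")
    return violations


if __name__ == "__main__":
    sample = {"ph": 7.8, "conductivity_us_cm": 6.0, "chloride_ppm": 7.5, "tss_ppm": 1.0}
    print(check_tcs_sample(sample))  # flags chloride only
```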
Filtration Requirements
| Loop | Filter Size | Purpose |
|---|---|---|
| FWS (primary) | 150-500 micron | Prevent large debris |
| TCS (secondary) | 25-50 micron | Protect cold plates |
| Micro-channel plates | Down to 10 micron | Prevent clogging |
Chemical Treatment
- Corrosion inhibitors: Azole compounds, 100 ppm typical
- Biocides: Isothiazolone (aerobic), glutaraldehyde (anaerobic)
- pH stabilizers: Maintain 7.0-8.5 range
- Scale inhibitors: Prevent mineral deposits
Power Usage Effectiveness (PUE)
PUE = Total Facility Energy / IT Equipment Energy — the primary metric for data center efficiency
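As a quick illustration of the formula, the sketch below computes PUE from metered energy and the non-IT overhead implied by a given PUE. The function names and the 1 MW example load are assumptions for illustration.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """PUE = total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def overhead_kw(it_load_kw: float, pue_value: float) -> float:
    """Non-IT power (cooling, power distribution, lighting) implied by a PUE."""
    if pue_value < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_load_kw * (pue_value - 1.0)


if __name__ == "__main__":
    it_load_kw = 1000.0  # example: 1 MW of IT load
    for label, value in [("PUE 1.56", 1.56), ("PUE 1.20", 1.20), ("PUE 1.03", 1.03)]:
        print(f"{label}: {overhead_kw(it_load_kw, value):.0f} kW of overhead")
```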
Industry PUE Benchmarks (2024)
| Data Center Type | Average PUE |
|---|---|
| Industry average (Uptime Institute 2024) | 1.56 |
| Colocation facilities | 1.30-1.60 |
| Enterprise data centers | 1.50-1.80 |
| Edge computing sites | 1.50-2.00 |
| Hyperscale (Google, Meta) | 1.08-1.10 |
| Best-in-class hyperscale | 1.05-1.06 |
PUE by Cooling Technology
| Cooling Technology | Typical PUE | Best |
|---|---|---|
| Traditional air (no economizer) | 1.60-2.00 | 1.50 |
| Air with economizers | 1.30-1.50 | 1.20 |
| Air with containment | 1.30-1.45 | 1.20 |
| RDHx with free cooling | 1.25-1.35 | 1.03 |
| Direct-to-chip liquid | 1.15-1.25 | 1.10 |
| Single-phase immersion | 1.03-1.10 | 1.02 |
| Two-phase immersion | 1.02-1.05 | 1.01 |
Cooling System Selection Guide
Match your rack density and requirements to the optimal cooling technology
| Rack Density | Primary Cooling | Secondary | Notes |
|---|---|---|---|
| <5 kW | Room-based CRAC/CRAH | None | Traditional enterprise |
| 5-15 kW | Row-based CRAH | Containment | Most common today |
| 15-30 kW | In-row + containment | RDHx | Transitional density |
| 30-50 kW | RDHx or direct-to-chip | Air supplemental | AI/HPC starting point |
| 50-100 kW | Direct-to-chip liquid | Minimal air | AI training clusters |
| 100+ kW | Immersion cooling | None required | Maximum density |
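The selection table can be expressed as a simple lookup. This is a sketch only: the function name is hypothetical, and because adjacent rows share their boundary values (for example 15 kW), the code assumes each row's upper bound is exclusive.

```python
# (upper bound in kW, primary cooling, secondary cooling), from the table above
SELECTION_GUIDE = [
    (5.0, "Room-based CRAC/CRAH", "None"),
    (15.0, "Row-based CRAH", "Containment"),
    (30.0, "In-row + containment", "RDHx"),
    (50.0, "RDHx or direct-to-chip", "Air supplemental"),
    (100.0, "Direct-to-chip liquid", "Minimal air"),
]
ABOVE_100_KW = ("Immersion cooling", "None required")


def recommend_cooling(rack_kw: float) -> tuple[str, str]:
    """Return (primary, secondary) cooling for a rack density in kW.

    Assumption: upper bounds are exclusive, so exactly 15 kW falls in the
    15-30 kW row.
    """
    if rack_kw < 0:
        raise ValueError("rack density cannot be negative")
    for upper_kw, primary, secondary in SELECTION_GUIDE:
        if rack_kw < upper_kw:
            return primary, secondary
    return ABOVE_100_KW


if __name__ == "__main__":
    for density in (3, 12, 40, 80, 130):
        primary, secondary = recommend_cooling(density)
        print(f"{density} kW/rack: {primary} (secondary: {secondary})")
```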
Technology Comparison Matrix
| Criteria | Air Cooling | RDHx | Direct-to-Chip | Immersion |
|---|---|---|---|---|
| Max rack density | 20-35 kW | 50-75 kW | 100+ kW | 200+ kW |
| Capital cost | Lowest | Medium | Medium-High | Highest |
| Operating cost | Highest | Medium | Low | Lowest |
| Retrofit difficulty | N/A | Low | Medium | High |
| Water usage | High (evap) | Medium | Low | Zero possible |
| PUE potential | 1.30-1.60 | 1.20-1.35 | 1.10-1.25 | 1.01-1.10 |
Frequently Asked Questions
Common questions about AI data center cooling
What is the TDP of modern AI GPUs like H200, B200, and B300?
Modern AI GPU TDPs have increased dramatically: NVIDIA H100/H200 SXM at 700W, NVIDIA B200 at 1,000W, NVIDIA B300 at 1,100W, AMD MI300X at 750W, and AMD MI355X at 1,400W. An 8-GPU HGX H200 server generates approximately 25,000-30,000 BTU/hr of heat including CPUs, memory, and networking components.
When should I use liquid cooling instead of air cooling for AI GPUs?
Air cooling becomes inadequate above 50 W/cm² heat flux or approximately 35 kW per rack. Modern AI GPUs like H100 at 86 W/cm² exceed this threshold. For rack densities above 35 kW, direct-to-chip liquid cooling is required. For densities above 100 kW per rack, immersion cooling provides the best solution with potential PUE as low as 1.02.
What PUE can I achieve with different cooling technologies?
PUE varies significantly by technology: traditional air typically achieves 1.60-2.00, air with economizers 1.30-1.50, RDHx with free cooling 1.25-1.35, direct-to-chip liquid 1.15-1.25, single-phase immersion 1.03-1.10, and two-phase immersion 1.02-1.05. Hyperscalers like Google and Meta achieve 1.08-1.10, and best-in-class hyperscale facilities reach 1.05-1.06.
What are the ASHRAE temperature recommendations for GPU data centers?
ASHRAE recommends inlet temperatures of 18-27°C (64-80°F) for optimal reliability. Class A1 allows 15-32°C, while the H1 envelope (for high-density GPU systems) requires 5-25°C with narrower tolerances. Relative humidity should be maintained at 40-60%, with dew point between 5.5 and 15°C.
What is the difference between single-phase and two-phase immersion cooling?
Single-phase immersion uses dielectric fluid that remains liquid, capturing 95%+ heat with PUE of 1.02-1.03 and lower fluid costs ($10-50/gallon). Two-phase immersion uses fluids that boil at chip surfaces (typically 34-50°C), leveraging latent heat for superior efficiency with PUE as low as 1.01, but requires more expensive engineered fluids ($100s/gallon) and sealed systems.
What water quality is required for direct-to-chip liquid cooling?
For Technology Cooling Systems (TCS), ASHRAE recommends pH 7.0-8.5, conductivity below 10 μS/cm for high-density AI, chloride below 5 ppm, total suspended solids below 3 ppm, and 25-50 micron filtration. Facility Water Systems (FWS) have less stringent requirements with 150-500 micron filtration. Corrosion inhibitors and biocides are essential.
Conversion & Calculation Reference
Essential formulas and conversions for cooling system design
Heat Conversion
- Watts to BTU/hr: W × 3.412
- BTU/hr to Watts: BTU/hr ÷ 3.412
- Tons to kW: Tons × 3.517
- kW to Tons: kW ÷ 3.517
Temperature
- °C to °F: (°C × 9/5) + 32
- °F to °C: (°F - 32) × 5/9
Flow Rate
- L/min to GPM: L/min × 0.264
- GPM to L/min: GPM × 3.785
Cooling Capacity
Required Cooling (kW) = Total Heat Load (kW) × Safety Factor
Safety Factor: 1.2-1.3 (20-30% margin)
Example: 1 MW IT load × 1.25 = 1,250 kW cooling
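The sizing rule above is easy to script, including a conversion to refrigeration tons using the 3.517 kW-per-ton factor from the Heat Conversion list. Function names are illustrative, and the 1.25 default safety factor is simply the midpoint used in the example.

```python
KW_PER_TON = 3.517  # 1 ton of refrigeration in kW, from the conversion list above


def required_cooling_kw(heat_load_kw: float, safety_factor: float = 1.25) -> float:
    """Required cooling capacity = total heat load x safety factor (typically 1.2-1.3)."""
    if safety_factor < 1.0:
        raise ValueError("safety factor below 1.0 would undersize the cooling plant")
    return heat_load_kw * safety_factor


def kw_to_tons(kw: float) -> float:
    """Convert kW of cooling to refrigeration tons."""
    return kw / KW_PER_TON


if __name__ == "__main__":
    cooling_kw = required_cooling_kw(1000.0)  # 1 MW IT load
    print(f"{cooling_kw:.0f} kW of cooling, about {kw_to_tons(cooling_kw):.0f} tons")
```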
Airflow Requirements
CFM = (kW × 3,412) / (1.08 × ΔT)
ΔT = Temperature rise in °F (typically 15-25°F)
Example: 10 kW rack, 20°F rise ≈ 1,580 CFM
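The airflow calculation can be sketched as follows. The heat load is first converted to BTU/hr (1 kW is about 3,412 BTU/hr), then divided by 1.08 × ΔT, where 1.08 is the standard sensible-heat constant for air at sea-level density. The function name is hypothetical.

```python
BTU_PER_HR_PER_KW = 3412.0    # 1 kW is about 3,412 BTU/hr
SENSIBLE_HEAT_CONSTANT = 1.08  # BTU/hr per CFM per °F, standard air at sea level


def required_cfm(rack_kw: float, delta_t_f: float) -> float:
    """Airflow in CFM needed to remove rack_kw with an air temperature rise of delta_t_f (°F)."""
    if delta_t_f <= 0:
        raise ValueError("temperature rise must be positive")
    return rack_kw * BTU_PER_HR_PER_KW / (SENSIBLE_HEAT_CONSTANT * delta_t_f)


if __name__ == "__main__":
    for rise_f in (15, 20, 25):
        print(f"10 kW rack, {rise_f}°F rise: {required_cfm(10.0, rise_f):,.0f} CFM")
```

A smaller temperature rise requires proportionally more airflow, which is one reason air cooling becomes impractical at high rack densities.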