Technical Guide | Thermal Management | January 2026

AI Data Center Cooling Requirements

Modern AI accelerators generate unprecedented heat loads that fundamentally challenge traditional cooling. From 700W H200 GPUs to 1,400W MI355X processors, proper thermal management is critical for performance, reliability, and efficiency.

Key figures:
  • 99.5% of GPU electrical power is converted to heat
  • 50 W/cm²: heat-flux threshold where liquid cooling becomes necessary
  • 1.56: industry average PUE
  • 1.08: best-in-class PUE

Cooling capacity by approach:
  • Air cooling: ≤35 kW/rack
  • Liquid cooling: 50-100 kW/rack
  • Immersion: 100-250 kW/rack

GPU Thermal Design Power (TDP)

Modern AI accelerators have dramatically increasing power requirements, with TDPs rising from 300W (V100) to over 1,400W (MI355X)

NVIDIA GPU TDP Evolution

NVIDIA GPU Thermal Design Power Specifications
GPU Model | Architecture | TDP | Memory | Release
NVIDIA V100 | Volta | 300W | 32GB HBM2 | 2017
NVIDIA A100 SXM | Ampere | 400W | 80GB HBM2e | 2020
NVIDIA H100 SXM | Hopper | 700W | 80GB HBM3 | 2022
NVIDIA H100 PCIe | Hopper | 350W | 80GB HBM2e | 2022
NVIDIA H200 SXM | Hopper | 700W | 141GB HBM3e | 2023
NVIDIA B200 | Blackwell | 1,000W | 192GB HBM3e | 2024
NVIDIA B300 | Blackwell Ultra | 1,400W | 288GB HBM3e | 2025

AMD Instinct GPU TDP Specifications

AMD Instinct GPU Thermal Design Power Specifications
GPU Model | Architecture | TDP | Memory | Bandwidth
AMD MI250X | CDNA 2 | 560W | 128GB HBM2e | 3.2 TB/s
AMD MI300X | CDNA 3 | 750W | 192GB HBM3 | 5.3 TB/s
AMD MI325X | CDNA 3 | 1,000W | 256GB HBM3e | 6 TB/s
AMD MI355X | CDNA 4 | 1,400W | 288GB HBM3e | 8 TB/s

Heat Output Calculations

Single GPU (700W H200)

Heat Output = 700W × 0.995 = 696.5W thermal

BTU/hr = 696.5W × 3.412 = 2,377 BTU/hr per GPU

8-GPU Server (HGX H200)

GPU heat alone = 8 × 2,377 = 19,016 BTU/hr

Total server heat (with CPU, memory, networking):

≈ 25,000-30,000 BTU/hr
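
The same arithmetic as a small Python sketch (the 99.5% heat fraction and the non-GPU overhead range are the figures used in this guide; the function names are illustrative):

```python
# Sketch: the server heat-load arithmetic above in Python.
# Assumes ~99.5% of electrical power becomes heat, as used in
# this guide; function names are illustrative.

W_TO_BTU_HR = 3.412

def gpu_heat_btu_hr(tdp_watts: float, heat_fraction: float = 0.995) -> float:
    """Thermal output of a single GPU in BTU/hr."""
    return tdp_watts * heat_fraction * W_TO_BTU_HR

per_gpu = gpu_heat_btu_hr(700)   # ~2,377 BTU/hr for an H200
gpus_only = 8 * per_gpu          # ~19,016 BTU/hr for an HGX H200
print(f"Per GPU: {per_gpu:,.0f} BTU/hr; 8 GPUs: {gpus_only:,.0f} BTU/hr")
# CPUs, memory, and networking add roughly 30-60% on top,
# giving the ~25,000-30,000 BTU/hr server total above.
```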

ASHRAE Thermal Guidelines

ASHRAE TC 9.9 publishes the industry-standard thermal guidelines for data processing environments

Temperature Classes

Class | Temperature Range | Use Case
A1 | 15-32°C (59-90°F) | Enterprise servers
A2 | 10-35°C (50-95°F) | Some servers, storage
A3 | 5-40°C (41-104°F) | Extended envelope
A4 | 5-45°C (41-113°F) | Maximum allowable
H1 | 5-25°C (41-77°F) | High-density GPUs

Recommended Ranges

For optimal reliability and component lifespan

  • Inlet Temperature: 18-27°C (64-80°F)
  • Relative Humidity: 40-60% RH
  • Dew Point: 5.5-15°C (42-59°F)
  • Max Rate of Change: 20°C/hour
  • Altitude Derating: -1°C per 300m above 900m
H1 Envelope: Introduced 2021 specifically for high-density AI systems requiring narrower temperature bands
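
As an illustration of how these limits might be enforced in monitoring, a minimal sketch against the recommended ranges above (the function and its alert format are hypothetical, not part of any ASHRAE tooling):

```python
# Sketch: flag sensor readings outside the ASHRAE-recommended
# envelope listed above. Limits come from this guide; the function
# and return format are illustrative, not a standard API.

def check_ashrae_recommended(inlet_c: float, rh_pct: float, dew_point_c: float) -> list[str]:
    violations = []
    if not 18.0 <= inlet_c <= 27.0:
        violations.append(f"inlet {inlet_c}°C outside 18-27°C")
    if not 40.0 <= rh_pct <= 60.0:
        violations.append(f"RH {rh_pct}% outside 40-60%")
    if not 5.5 <= dew_point_c <= 15.0:
        violations.append(f"dew point {dew_point_c}°C outside 5.5-15°C")
    return violations

print(check_ashrae_recommended(29.0, 45.0, 12.0))  # ['inlet 29.0°C outside 18-27°C']
```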

Air Cooling Technologies

Foundational cooling systems for data centers up to 35 kW per rack

CRAC Units

Refrigerant-Based

Computer Room Air Conditioner uses refrigerant-based direct expansion (DX) cooling with compressor-driven refrigerant circulation.

  • Capacity: Up to 100 kW per unit
  • Best for: <200 kW electrical load
  • PUE Impact: +0.3-0.5 to base
Limitations: Lower efficiency than CRAH, compressor requires significant energy, limited scalability for high-density.

CRAH Units

Chilled Water

Computer Room Air Handler uses chilled water from central plant, blowing air over chilled water coils for heat transfer.

  • Capacity: 200 kW+ per unit
  • Best for: >200 kW load
  • Advantages: Higher efficiency at scale
Benefits: Better humidity control, supports waterside economizers, variable fan speed for demand-based cooling.

Aisle Containment

Efficiency Boost

Hot or cold aisle containment prevents mixing of supply and exhaust air, dramatically improving cooling efficiency.

  • Cold Aisle: Up to 30% savings
  • Hot Aisle: Up to 40% savings
  • Density Boost: 5 kW → 10+ kW/rack
Hot Aisle Preferred: Room stays comfortable, better for free cooling, higher return temps enable efficient chiller operation.

Air Cooling Capacity by Rack Density

Rack Power | Cooling Approach | Containment
1-5 kW | Room-based CRAC/CRAH | Optional
5-10 kW | Row-based cooling | Recommended
10-20 kW | In-row cooling + containment | Required
20-35 kW | Hot aisle containment + supplemental | Required
35+ kW | Liquid cooling required | N/A

Direct-to-Chip Liquid Cooling

Cold plates placed directly on processors for superior heat transfer—essential for modern AI GPUs

How Direct-to-Chip Cooling Works

  1. Cold plates attach directly to processor packages
  2. Coolant (water/glycol or dielectric) flows through micro-channels
  3. Heat transfers from chip to coolant via conduction
  4. CDU (Coolant Distribution Unit) exchanges heat to facility water
  5. Facility system rejects heat to outside environment

Heat Capture Efficiency

Cooling Method | Heat Capture
Traditional air cooling | 100% to room air
Direct-to-chip (single-phase) | 50-75% to liquid
Direct-to-chip (two-phase) | 70-95% to liquid
Immersion cooling | 95-100% to liquid
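
The capture fraction determines how much heat remains for the room's air system to absorb. A minimal sketch of that split, using the fractions from the table above (the 80 kW rack and 70% capture are illustrative values):

```python
# Sketch: split a rack's heat between the liquid loop and room air
# using the capture fractions above. The 80 kW rack and 70% capture
# are illustrative values.

def heat_split(rack_kw: float, liquid_capture: float) -> tuple[float, float]:
    """Return (kW absorbed by liquid, kW left for room air handling)."""
    to_liquid = rack_kw * liquid_capture
    return to_liquid, rack_kw - to_liquid

liquid_kw, air_kw = heat_split(80.0, 0.70)  # single-phase direct-to-chip
print(f"{liquid_kw:.0f} kW to liquid, {air_kw:.0f} kW still needs air cooling")
```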

Single-Phase Cooling

Most Common
  • Coolant: Treated water, water/glycol, dielectric
  • Flow Rate: 1-2 L/min per chip, 1.5 L/min/kW
  • PUE: 1.15-1.25
  • Complexity: Lower cost, simpler design
Coolant remains liquid throughout the cycle.
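
The 1.5 L/min/kW figure follows from the sensible-heat relation Q = ṁ × cp × ΔT. A sketch assuming plain water and a 10°C loop temperature rise (water/glycol mixtures would need adjusted properties):

```python
# Sketch: derive single-phase coolant flow from Q = m_dot * cp * dT.
# Assumes plain water (cp ≈ 4186 J/kg·K, ≈1 kg/L) and a 10°C rise
# across the cold plate; water/glycol mixes need adjusted properties.

CP_WATER_J_PER_KG_K = 4186.0
DENSITY_KG_PER_L = 1.0

def flow_l_per_min(heat_kw: float, delta_t_c: float = 10.0) -> float:
    """Coolant flow needed to absorb heat_kw at the given temperature rise."""
    kg_per_s = (heat_kw * 1000.0) / (CP_WATER_J_PER_KG_K * delta_t_c)
    return kg_per_s / DENSITY_KG_PER_L * 60.0

print(f"{flow_l_per_min(1.0):.2f} L/min per kW")            # ~1.43: the ~1.5 L/min/kW rule
print(f"{flow_l_per_min(0.7):.2f} L/min for one 700W GPU")  # ~1.0, within 1-2 L/min per chip
```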

Two-Phase (Pumped Two-Phase, P2P) Cooling

Premium Efficiency
  • Coolant: Refrigerants (R-515B, R-134a), engineered fluids
  • Capacity: Up to 170 kW IT load
  • PUE: 1.02-1.10
  • Savings: Up to 82% cooling energy reduction
Leverages latent heat of vaporization for superior heat transfer

Coolant Distribution Units (CDU)

CDUs are the central control systems managing temperature, pressure, and flow for liquid cooling.

CDU Type | Capacity | Heat Rejection | Best For
Liquid-to-Air | Up to 70 kW | Room air | Retrofits, no chilled water
Liquid-to-Liquid | 100-1,350 kW | Facility water | New builds, high density
In-Rack | Up to 100 kW | Varies | Distributed deployment
In-Row | 200-600 kW | Facility water | Row-based cooling
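
For rough capacity planning, the CDU count follows from the liquid heat load, unit capacity, and redundancy policy. A sketch (the N+1 policy and the example row are illustrative assumptions):

```python
# Sketch: rough CDU count for a liquid-cooled row. Capacity values
# follow the table above; the N+1 redundancy policy and the example
# row are assumptions for illustration.

import math

def cdus_needed(liquid_load_kw: float, cdu_capacity_kw: float, redundancy: int = 1) -> int:
    """Units to carry the load, plus spares for redundancy."""
    return math.ceil(liquid_load_kw / cdu_capacity_kw) + redundancy

# 16 racks at 80 kW with ~75% heat capture to liquid, 600 kW in-row CDUs:
load_kw = 16 * 80 * 0.75            # 960 kW on the liquid loop
print(cdus_needed(load_kw, 600))    # 3 (2 duty + 1 standby)
```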

Rear Door Heat Exchangers (RDHx)

Replaces standard rack rear door with liquid-cooled heat exchanger—ideal for retrofits

How RDHx Works

  1. Hot exhaust air exits servers through rear of rack
  2. Air passes through heat exchanger coils in rear door
  3. Chilled water in coils absorbs heat from air
  4. Cooled air exits into room at or below ambient
  5. Warmed water returns to CDU or chiller
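
One way to reason about step 4 is the heat-exchanger effectiveness method, where the door pulls exhaust air toward the supply-water temperature. A sketch with an assumed effectiveness of 0.8 (actual values depend on coil geometry and air/water flow rates):

```python
# Sketch: estimate RDHx exit-air temperature with the heat-exchanger
# effectiveness method. The 0.8 effectiveness is an illustrative
# assumption; real values depend on coil geometry and flow rates.

def rdhx_exit_air_c(exhaust_c: float, water_supply_c: float,
                    effectiveness: float = 0.8) -> float:
    """Exit-air temperature: exhaust pulled toward supply-water temperature."""
    return exhaust_c - effectiveness * (exhaust_c - water_supply_c)

# 45°C server exhaust with 15°C supply water (within the 7-20°C range below):
print(f"{rdhx_exit_air_c(45.0, 15.0):.1f}°C exit air")  # 21.0°C, near room ambient
```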

RDHx Types

Passive RDHx

  • Relies on server fan airflow
  • No additional power required
  • Best for: 15-30 kW per rack

Active RDHx

  • Built-in fans for higher capacity
  • Consumes additional power
  • Best for: 30-75+ kW per rack

RDHx Specifications

  • Cooling Capacity: 30-75 kW per door (up to 100 kW active)
  • Heat Removal: 70-100% of rack heat load
  • PUE Achievable: 1.25-1.35 (with free cooling: 1.03-1.10)
  • Water Temperature: 7-20°C supply

Key Benefits

  • Retrofit-friendly: Mounts on existing racks with minimal changes
  • Eliminates containment need: Creates neutral room temperature
  • Hybrid compatible: Works with existing CRAC/CRAH systems
  • Incremental: Add to high-density racks as needed

Immersion Cooling

Submerges entire servers in dielectric fluid for maximum thermal transfer—supporting 100-250 kW per rack

Single-Phase Immersion

Servers submerged in dielectric fluid bath. Heat transfers via convection, warm fluid circulates to CDU.

  • Capacity: 100-200 kW per rack
  • Chip TDP Support: Up to 1,000W per chip
  • PUE: 1.02-1.03
  • Fluid: Synthetic hydrocarbon, white/mineral oil
  • Fluid Cost: $10-50/gallon

Advantages

  • 95%+ cooling power reduction vs. air
  • No fans required (eliminates vibration)
  • Dust and contaminant protection
  • Reduced noise (<50 dB)

Two-Phase Immersion

Premium

Low-boiling-point fluid boils at chip surface (34-50°C). Vapor rises to condenser, returns as liquid—passive circulation.

  • Capacity: Up to 252 kW per tank
  • Chip TDP Support: Up to 1,000W+ per chip
  • PUE: 1.01-1.02
  • Fluid: Engineered fluorocarbons (3M Novec)
  • Fluid Cost: $100s/gallon

Advantages

  • Highest heat transfer efficiency
  • No pumps for coolant circulation
  • Zero water consumption possible
  • 2x heat rejection of single-phase

Immersion Cooling Comparison

Attribute | Single-Phase | Two-Phase
PUE | 1.02-1.03 | 1.01-1.02
Coolant Cost | Lower ($10-50/gal) | Higher ($100s/gal)
System Complexity | Simpler | More complex
Maintenance Access | Open tank | Sealed system
Heat Density | Up to 200 kW/rack | Up to 250+ kW/rack
Water Usage | Can be zero | Zero

Water Quality Requirements

Water quality is critical for liquid cooling system reliability and longevity

ASHRAE Water Quality Recommendations

Parameter | FWS (Facility) | TCS (Technology)
pH | 7.0-8.5 | 7.0-8.5
Conductivity | <1,500 μS/cm | <5-10 μS/cm
Chloride | <250 ppm | <5 ppm
Sulfate | <250 ppm | <10 ppm
Total Suspended Solids | <100 ppm | <3 ppm
Hardness | <200 ppm CaCO₃ | <50 ppm
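
As an illustration, a coolant-sample check against the TCS column above (limits copied from the table; the dictionary layout and sample values are hypothetical):

```python
# Sketch: validate a TCS coolant sample against the ASHRAE-based
# limits tabulated above. The dict layout and the sample values
# are illustrative, not a standard schema.

TCS_LIMITS = {
    "ph": (7.0, 8.5),            # allowed range
    "conductivity_us_cm": 10.0,  # upper bound for high-density AI
    "chloride_ppm": 5.0,
    "sulfate_ppm": 10.0,
    "tss_ppm": 3.0,
    "hardness_ppm_caco3": 50.0,
}

def check_tcs(sample: dict) -> list[str]:
    issues = []
    lo, hi = TCS_LIMITS["ph"]
    if not lo <= sample["ph"] <= hi:
        issues.append(f"pH {sample['ph']} outside {lo}-{hi}")
    for key, limit in TCS_LIMITS.items():
        if key != "ph" and sample[key] > limit:
            issues.append(f"{key} = {sample[key]} exceeds {limit}")
    return issues

sample = {"ph": 7.8, "conductivity_us_cm": 22.0, "chloride_ppm": 1.0,
          "sulfate_ppm": 2.0, "tss_ppm": 0.5, "hardness_ppm_caco3": 10.0}
print(check_tcs(sample))  # ['conductivity_us_cm = 22.0 exceeds 10.0']
```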

Filtration Requirements

Loop Filter Size Purpose
FWS (primary) 150-500 micron Prevent large debris
TCS (secondary) 25-50 micron Protect cold plates
Micro-channel plates Down to 10 micron Prevent clogging

Chemical Treatment

  • Corrosion inhibitors: Azole compounds, 100 ppm typical
  • Biocides: Isothiazolone (aerobic), glutaraldehyde (anaerobic)
  • pH stabilizers: Maintain 7.0-8.5 range
  • Scale inhibitors: Prevent mineral deposits

Power Usage Effectiveness (PUE)

PUE = Total Facility Energy / IT Equipment Energy — the primary metric for data center efficiency
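
Worked directly from that definition, a brief sketch (the load and overhead figures are illustrative):

```python
# Sketch: PUE from the definition above. The load and overhead
# figures are illustrative, not measured data.

def pue(total_facility_kw: float, it_kw: float) -> float:
    return total_facility_kw / it_kw

it_load_kw = 1000.0   # 1 MW of IT equipment
overhead_kw = 560.0   # cooling, power distribution, lighting, etc.
print(f"PUE = {pue(it_load_kw + overhead_kw, it_load_kw):.2f}")  # 1.56, the industry average
# Each 0.1 of PUE on a 1 MW IT load is ~100 kW of continuous overhead.
```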

Industry PUE Benchmarks (2024)

Data Center Type | Average PUE
Industry average (Uptime Institute 2024) | 1.56
Colocation facilities | 1.30-1.60
Enterprise data centers | 1.50-1.80
Edge computing sites | 1.50-2.00
Hyperscale (Google, Meta) | 1.08-1.10
Best-in-class hyperscale | 1.05-1.06

PUE by Cooling Technology

Cooling Technology | Typical PUE | Best Case
Traditional air (no economizer) | 1.60-2.00 | 1.50
Air with economizers | 1.30-1.50 | 1.20
Air with containment | 1.30-1.45 | 1.20
RDHx with free cooling | 1.25-1.35 | 1.03
Direct-to-chip liquid | 1.15-1.25 | 1.10
Single-phase immersion | 1.03-1.10 | 1.02
Two-phase immersion | 1.02-1.05 | 1.01

Cooling System Selection Guide

Match your rack density and requirements to the optimal cooling technology

Cooling Selection by Rack Density
Rack Density | Primary Cooling | Secondary | Notes
<5 kW | Room-based CRAC/CRAH | None | Traditional enterprise
5-15 kW | Row-based CRAH | Containment | Most common today
15-30 kW | In-row + containment | RDHx | Transitional density
30-50 kW | RDHx or direct-to-chip | Air supplemental | AI/HPC starting point
50-100 kW | Direct-to-chip liquid | Minimal air | AI training clusters
100+ kW | Immersion cooling | None required | Maximum density
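
The same selection logic as a lookup, sketched with band edges taken from the table above (boundary densities are judgment calls in practice):

```python
# Sketch: map rack density to the primary cooling approach from the
# selection table above. Band edges follow the table; treat boundary
# densities as engineering judgment calls.

def primary_cooling(rack_kw: float) -> str:
    if rack_kw < 5:
        return "Room-based CRAC/CRAH"
    if rack_kw < 15:
        return "Row-based CRAH with containment"
    if rack_kw < 30:
        return "In-row cooling + containment (consider RDHx)"
    if rack_kw < 50:
        return "RDHx or direct-to-chip liquid"
    if rack_kw < 100:
        return "Direct-to-chip liquid cooling"
    return "Immersion cooling"

for kw in (4, 12, 25, 40, 80, 150):
    print(f"{kw:>4} kW/rack -> {primary_cooling(kw)}")
```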

Technology Comparison Matrix

Criteria | Air Cooling | RDHx | Direct-to-Chip | Immersion
Max rack density | 20-35 kW | 50-75 kW | 100+ kW | 200+ kW
Capital cost | Lowest | Medium | Medium-High | Highest
Operating cost | Highest | Medium | Low | Lowest
Retrofit difficulty | N/A | Low | Medium | High
Water usage | High (evap) | Medium | Low | Zero possible
PUE potential | 1.30-1.60 | 1.20-1.35 | 1.10-1.25 | 1.01-1.10

Frequently Asked Questions

Common questions about AI data center cooling

What is the TDP of modern AI GPUs like H200, B200, and B300?

Modern AI GPU TDPs have increased dramatically: NVIDIA H100/H200 SXM at 700W, NVIDIA B200 at 1,000W, NVIDIA B300 at 1,400W, AMD MI300X at 750W, and AMD MI355X at 1,400W. An 8-GPU HGX H200 server generates approximately 25,000-30,000 BTU/hr of heat including CPUs, memory, and networking components.

When should I use liquid cooling instead of air cooling for AI GPUs?

Air cooling becomes inadequate above 50 W/cm² heat flux or approximately 35 kW per rack. Modern AI GPUs like H100 at 86 W/cm² exceed this threshold. For rack densities above 35 kW, direct-to-chip liquid cooling is required. For densities above 100 kW per rack, immersion cooling provides the best solution with potential PUE as low as 1.02.

What PUE can I achieve with different cooling technologies?

PUE varies significantly by technology: traditional air achieves 1.50-1.80, air with economizers 1.30-1.50, RDHx 1.25-1.35 (as low as 1.03 with free cooling), direct-to-chip liquid 1.10-1.25, single-phase immersion 1.02-1.10, and two-phase immersion can achieve 1.01-1.05. Best-in-class hyperscalers like Google achieve 1.08-1.10.

What are the ASHRAE temperature recommendations for GPU data centers?

ASHRAE recommends inlet temperatures of 18-27°C (64-80°F) for optimal reliability. Class A1 allows 15-32°C, while the H1 envelope (for high-density GPU systems) requires 5-25°C with narrower tolerances. Relative humidity should be maintained at 40-60% with dew point between 5.5-15°C.

What is the difference between single-phase and two-phase immersion cooling?

Single-phase immersion uses dielectric fluid that remains liquid, capturing 95%+ heat with PUE of 1.02-1.03 and lower fluid costs ($10-50/gallon). Two-phase immersion uses fluids that boil at chip surfaces (typically 34-50°C), leveraging latent heat for superior efficiency with PUE as low as 1.01, but requires more expensive engineered fluids ($100s/gallon) and sealed systems.

What water quality is required for direct-to-chip liquid cooling?

For Technology Cooling Systems (TCS), ASHRAE recommends pH 7.0-8.5, conductivity below 10 μS/cm for high-density AI, chloride below 5 ppm, total suspended solids below 3 ppm, and 25-50 micron filtration. Facility Water Systems (FWS) have less stringent requirements with 150-500 micron filtration. Corrosion inhibitors and biocides are essential.

Conversion & Calculation Reference

Essential formulas and conversions for cooling system design

Heat Conversion

  • Watts to BTU/hr: W × 3.412
  • BTU/hr to Watts: BTU/hr ÷ 3.412
  • Tons to kW: Tons × 3.517
  • kW to Tons: kW ÷ 3.517

Temperature & Flow

  • °C to °F: (°C × 9/5) + 32
  • °F to °C: (°F - 32) × 5/9
  • L/min to GPM: L/min × 0.264
  • GPM to L/min: GPM × 3.785

Cooling Capacity

Required Cooling (kW) = Total Heat Load (kW) × Safety Factor

Safety Factor: 1.2-1.3 (20-30% margin)

Example: 1 MW IT load × 1.25 = 1,250 kW cooling

Airflow Requirements

CFM = (kW × 3,412) / (1.08 × ΔT)

ΔT = Temperature rise in °F (typically 15-25°F)

Example: 10 kW rack, 20°F rise = 1,580 CFM
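
The formulas in this section, collected into one runnable sketch (note the airflow constant is 3,412 BTU/hr per kW, i.e., 3.412 per watt):

```python
# Sketch: the conversion and sizing formulas from this reference
# section, collected in one place. Constants are as given above.

def watts_to_btu_hr(watts: float) -> float:
    return watts * 3.412

def tons_to_kw(tons: float) -> float:
    return tons * 3.517

def required_cooling_kw(it_load_kw: float, safety_factor: float = 1.25) -> float:
    """Heat load plus a 20-30% design margin (1.25 matches the example above)."""
    return it_load_kw * safety_factor

def airflow_cfm(heat_kw: float, delta_t_f: float = 20.0) -> float:
    """CFM = (kW x 3,412) / (1.08 x dT in °F), dT typically 15-25°F."""
    return (heat_kw * 3412.0) / (1.08 * delta_t_f)

print(required_cooling_kw(1000.0))  # 1250.0 kW for a 1 MW IT load
print(round(airflow_cfm(10.0)))     # 1580 CFM for a 10 kW rack at 20°F rise
```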

Get Expert Cooling Assessment

Our mechanical engineers will analyze your AI workload requirements and design the optimal cooling solution with detailed specifications, vendor recommendations, and cost estimates.

  • Free Thermal Analysis
  • Vendor Introductions
  • PUE Optimization
  • Turnkey Solutions