Performance Optimization Guide
Advanced techniques to maximize performance, increase efficiency, and deliver superior service quality on your infrastructure.
Maximum Performance
System-level optimizations to squeeze every bit of performance from your hardware.
Resource Efficiency
CPU, memory, storage, and network tuning for optimal resource utilization.
Revenue Impact
Better performance means happier customers and increased earnings.
System-Level Optimization
Fundamental kernel and CPU optimizations
Start with fundamental system optimizations that benefit all workloads.
Kernel Optimization (/etc/sysctl.conf)
# Network performance
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.core.netdev_max_backlog = 5000
net.ipv4.tcp_congestion_control = bbr
net.core.default_qdisc = fq
# Memory management
vm.swappiness = 10
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
vm.overcommit_memory = 1
kernel.numa_balancing = 1
# File system
fs.file-max = 2097152
fs.aio-max-nr = 1048576
# Security and performance
kernel.pid_max = 4194304
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
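Apply the file without a reboot and spot-check a key value (a sketch, assuming the settings above live in /etc/sysctl.conf):
# BBR must be loaded before the congestion-control line will apply
modprobe tcp_bbr
sysctl -p /etc/sysctl.conf
sysctl net.ipv4.tcp_congestion_control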
CPU Governor Settings
# Set performance governor for all CPUs
for cpu in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
echo performance > "$cpu"
done
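# Verify the active governor on every core
grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor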
# Disable CPU frequency scaling
systemctl disable ondemand
# Intel-specific: limit C-states for lower latency
# (max_cstate is read-only at runtime; add intel_idle.max_cstate=1 to the kernel cmdline instead)
# AMD-specific: prefer performance via the energy/performance hint (amd_pstate)
for pref in /sys/devices/system/cpu/cpufreq/policy*/energy_performance_preference; do
echo performance > "$pref"
done
CPU Performance Optimization
NUMA configuration and CPU isolation
Maximize CPU performance for compute-intensive workloads.
NUMA Optimization
For multi-socket systems, proper NUMA configuration is critical:
# Check NUMA topology
numactl --hardware
# Set NUMA memory policy
numactl --cpunodebind=0 --membind=0 your-application
# Monitor NUMA statistics
numastat -c
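To confirm a bound process actually keeps its memory on the assigned node, check its per-node breakdown ("your-application" is a placeholder):
# Per-node memory usage for a running process
numastat -p $(pidof your-application)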
CPU Isolation
Dedicate CPU cores to instances for predictable performance:
# Reserve CPUs for instances (edit /etc/default/grub)
GRUB_CMDLINE_LINUX="isolcpus=4-31 nohz_full=4-31 rcu_nocbs=4-31"
# Update grub
sudo update-grub
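# After rebooting, pin latency-sensitive processes to the isolated cores
# ("your-application" is a placeholder; the range must match isolcpus above)
taskset -c 4-31 your-application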
# Reboot required for changes to take effect
Memory Optimization
Huge pages and memory management
Optimize memory allocation and performance for better instance density.
Transparent Huge Pages (THP)
# Enable THP for better performance (general workloads)
echo always > /sys/kernel/mm/transparent_hugepage/enabled
echo always > /sys/kernel/mm/transparent_hugepage/defrag
# For KVM/virtualization - disable THP
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
# Configure static hugepages for VMs
echo 1024 > /proc/sys/vm/nr_hugepages
mount -t hugetlbfs hugetlbfs /dev/hugepages
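To keep hugepages across reboots, a minimal sketch (assumes a standard /etc/sysctl.conf and /etc/fstab layout):
echo 'vm.nr_hugepages = 1024' >> /etc/sysctl.conf
echo 'hugetlbfs /dev/hugepages hugetlbfs defaults 0 0' >> /etc/fstab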
Memory Deduplication (KSM)
# Enable Kernel Samepage Merging
echo 1 > /sys/kernel/mm/ksm/run
echo 1000 > /sys/kernel/mm/ksm/sleep_millisecs
# Monitor KSM effectiveness
cat /sys/kernel/mm/ksm/pages_shared
cat /sys/kernel/mm/ksm/pages_sharing
Storage Performance Tuning
I/O schedulers and filesystem optimization
Optimize storage subsystem for maximum IOPS and throughput.
I/O Scheduler by Drive Type
NVMe SSD
echo none > /sys/block/nvme0n1/queue/scheduler
No scheduler needed - NVMe handles queuing internally
SATA SSD
echo none > /sys/block/sda/queue/scheduler
Minimal overhead for solid-state drives (use noop on older kernels without blk-mq)
HDD
echo mq-deadline > /sys/block/sdb/queue/scheduler
Better for rotational media with seek time (deadline on older kernels)
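Scheduler choices reset at boot; a udev rule makes them persistent (a sketch, the file name is arbitrary):
# /etc/udev/rules.d/60-ioschedulers.rules
ACTION=="add|change", KERNEL=="nvme[0-9]n[0-9]", ATTR{queue/scheduler}="none"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="mq-deadline"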
Filesystem Mount Options
# EXT4 performance mount options (noatime already implies nodiratime; data=writeback trades crash safety for speed)
/dev/sda1 /storage ext4 noatime,discard,data=writeback 0 0
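# Optional: benchmark the new mount to verify gains (fio must be installed; job parameters are examples)
fio --name=randrw --filename=/storage/fio.test --size=1G --rw=randrw --bs=4k --iodepth=32 --ioengine=libaio --direct=1 --runtime=30 --time_based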
# XFS optimization for large files
mkfs.xfs -f -d agcount=32 -l size=512m /dev/sdb1
Network Performance Tuning
NIC settings and TCP/IP stack optimization
Optimize network stack for low latency and high throughput.
Network Interface Tuning
#!/bin/bash
INTERFACE="eth0"
# Increase ring buffers
ethtool -G $INTERFACE rx 4096 tx 4096
# Enable offloading features
ethtool -K $INTERFACE rx on tx on sg on tso on gso on gro on
# LRO can break bridged/forwarded traffic; enable it only on pure endpoint hosts
ethtool -K $INTERFACE lro on
# Set interrupt coalescing
ethtool -C $INTERFACE adaptive-rx on adaptive-tx on rx-usecs 10
# Enable jumbo frames (if supported by network)
ip link set dev $INTERFACE mtu 9000
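# Verify ring buffers and offload state after applying
ethtool -g $INTERFACE
ethtool -k $INTERFACE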
TCP/IP Stack Tuning
# TCP optimization
sysctl -w net.ipv4.tcp_fastopen=3
sysctl -w net.ipv4.tcp_mtu_probing=1
sysctl -w net.ipv4.tcp_low_latency=1 # no-op on kernels 4.14+; harmless on older ones
# Enable BBR congestion control
modprobe tcp_bbr
sysctl -w net.ipv4.tcp_congestion_control=bbr
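# Confirm BBR is now the active algorithm
sysctl net.ipv4.tcp_congestion_control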
# Increase socket buffers
sysctl -w net.core.rmem_max=134217728
sysctl -w net.core.wmem_max=134217728
# Connection tracking for high-traffic servers
sysctl -w net.netfilter.nf_conntrack_max=1048576
GPU Optimization
NVIDIA and multi-GPU configuration
Maximize GPU performance for compute and AI workloads.
NVIDIA GPU Performance Settings
# Set GPU to persistence mode (reduces startup latency)
nvidia-smi -pm 1
# Lock application clocks (example values for an A100; query supported pairs for your model first)
nvidia-smi -ac 1215,1410 # Memory,Graphics clocks (MHz)
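# List the valid memory/graphics clock pairs for your GPU
nvidia-smi -q -d SUPPORTED_CLOCKS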
# Set power limit to maximum (adjust per GPU model)
nvidia-smi -pl 300 # Watts
# Configure GPU for compute mode
nvidia-smi -c EXCLUSIVE_PROCESS
# Monitor GPU performance
nvidia-smi dmon -s pucvmet
Multi-GPU Configuration
# Check GPU topology and NVLink status
nvidia-smi topo -m
# Check NVLink connectivity (if available)
nvidia-smi nvlink -s
nvidia-smi nvlink -s -i 0 # status for a specific GPU
# Enable peer-to-peer access for multi-GPU workloads
# (Usually automatic, verify with cuda-samples p2pBandwidthLatencyTest)
GPU optimization settings may vary by model. Always test changes with your specific workloads before deploying to production.
Container Performance
LXD and container runtime optimization
Optimize container runtime for better instance performance.
LXD Performance Tuning
# Storage pool optimization with ZFS
lxc storage create fast zfs source=/dev/nvme0n1
lxc storage set fast volume.zfs.use_refquota true
lxc storage set fast volume.zfs.remove_snapshots true
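# Launch a test instance on the new pool to confirm it works ("ubuntu:22.04" is an example image)
lxc launch ubuntu:22.04 perf-test -s fast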
# CPU limits configuration
lxc profile set default limits.cpu.priority 5
lxc profile set default limits.cpu.allowance 100%
# Memory limits
lxc profile set default limits.memory.enforce soft
lxc profile set default limits.memory.swap false
# Network optimization
lxc network set lxdbr0 bridge.mtu 9000
Performance Monitoring
Essential monitoring commands and tools
Continuous monitoring is essential for maintaining optimal performance.
Performance Monitoring Toolkit
# CPU performance
mpstat -P ALL 1 # Per-CPU statistics
perf top # Real-time function profiling
turbostat --interval 1 # CPU frequency and power
# Memory performance
vmstat 1 # Virtual memory statistics
slabtop # Kernel slab cache usage
# Storage performance
iostat -x 1 # Disk I/O statistics
iotop -o # Per-process I/O usage
# Network performance
sar -n DEV 1 # Network device statistics
ss -s # Socket statistics summary
Performance Troubleshooting
Common issues and solutions
Common performance issues and their solutions.
High CPU Wait Time
High %wa in top, slow response times
- Check I/O with iostat -x 1
- Identify heavy I/O processes with iotop
- Move high I/O workloads to faster storage
- Enable writeback caching if data loss is acceptable
Memory Pressure
High swap usage, OOM kills
- Enable KSM for memory deduplication
- Adjust overcommit ratios
- Implement memory limits per instance
- Add more physical RAM
Network Latency
High ping times, packet loss
- Check for IRQ conflicts
- Enable interrupt coalescing
- Distribute interrupts across CPUs (see the sketch after this list)
- Upgrade network drivers
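Distributing NIC interrupts is often the quickest latency win. A sketch (the IRQ number is hypothetical; check /proc/interrupts on your system):
# Let irqbalance spread interrupts automatically
systemctl enable --now irqbalance
# Or pin manually: find the NIC's IRQs, then write a CPU affinity mask
grep eth0 /proc/interrupts
echo 2 > /proc/irq/24/smp_affinity # IRQ 24 -> CPU1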
Optimization Best Practices
Guidelines for sustained high performance
Follow these guidelines for sustained high performance.
Baseline First
Always measure performance before and after changes to quantify improvements.
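A minimal baseline capture, assuming the sysstat tools from the monitoring toolkit above are installed (the output file name is an example):
#!/bin/bash
OUT="baseline-$(date +%F-%H%M).txt"
{ uptime; mpstat -P ALL 1 5; vmstat 1 5; iostat -x 1 5; sar -n DEV 1 5; } > "$OUT" 2>&1
echo "Baseline saved to $OUT"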
Incremental Changes
Make one optimization at a time for clear results and easy rollback.
Document Settings
Keep detailed records of all optimizations applied to each server.
Regular Reviews
Revisit optimizations as workloads and software change over time.
Important: Some optimizations trade stability or data safety for performance. Always understand the implications before applying changes to production systems.
