Performance Optimization Guide

Advanced techniques to maximize performance, increase efficiency, and deliver superior service quality on your infrastructure.

Maximum Performance

System-level optimizations to squeeze every bit of performance from your hardware.

Resource Efficiency

CPU, memory, storage, and network tuning for optimal resource utilization.

Revenue Impact

Better performance means happier customers and increased earnings.

1

System-Level Optimization

Fundamental kernel and CPU optimizations

Start with fundamental system optimizations that benefit all workloads.

Kernel Optimization (/etc/sysctl.conf)

# Network performance
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.core.netdev_max_backlog = 5000
net.ipv4.tcp_congestion_control = bbr
net.core.default_qdisc = fq

# Memory management
vm.swappiness = 10
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
# 1 = always overcommit; raises the risk of OOM kills under memory pressure
vm.overcommit_memory = 1
kernel.numa_balancing = 1

# File system
fs.file-max = 2097152
fs.aio-max-nr = 1048576

# Security and performance
kernel.pid_max = 4194304
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
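
After loading the file (for example with sudo sysctl -p), it can help to confirm that the kernel actually accepted each value. The following is a minimal sketch, not part of the original guide: the function names normalize and sysctl_drift are illustrative, and the optional second argument exists only so the check can be pointed at a test directory instead of /proc/sys.

```shell
#!/bin/sh
# Sketch: list sysctl keys whose live value differs from a config file.
# Function names and the optional proc_root argument are illustrative.

# Collapse whitespace so "4096<TAB>87380" and "4096 87380" compare equal.
normalize() {
    printf '%s\n' "$1" | tr -s '[:space:]' ' ' | sed 's/^ //; s/ $//'
}

sysctl_drift() {
    conf="$1"
    proc_root="${2:-/proc/sys}"
    status=0
    while IFS= read -r line || [ -n "$line" ]; do
        line=$(normalize "$line")
        case "$line" in
            ''|'#'*|';'*) continue ;;   # blank lines and comments
            *=*) ;;                     # key = value
            *) continue ;;
        esac
        key=$(normalize "${line%%=*}")
        want=$(normalize "${line#*=}")
        # net.core.rmem_max -> <proc_root>/net/core/rmem_max
        path="$proc_root/$(printf '%s' "$key" | tr . /)"
        if [ ! -r "$path" ]; then
            echo "MISSING $key"
            status=1
            continue
        fi
        have=$(normalize "$(cat "$path")")
        if [ "$have" != "$want" ]; then
            echo "DRIFT $key: want '$want', have '$have'"
            status=1
        fi
    done < "$conf"
    return "$status"
}
```

Running sysctl_drift /etc/sysctl.conf prints nothing and returns 0 when every key matches. A MISSING line usually means the relevant kernel module (for example tcp_bbr or nf_conntrack) is not loaded.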

CPU Governor Settings

# Set performance governor for all CPUs
for cpu in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$cpu"
done

# Stop the legacy Ubuntu "ondemand" service from resetting the governor at boot
# (the unit exists only on older releases)
systemctl disable ondemand

# Intel-specific: limit C-states for lower latency
# max_cstate is read-only at runtime, so set it on the kernel command line instead:
#   intel_idle.max_cstate=1

# AMD-specific: switch the amd-pstate driver to active (CPPC/EPP) mode, kernel 6.3 or later
echo active > /sys/devices/system/cpu/amd_pstate/status
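
To confirm that the governor change took effect on every core, the per-CPU sysfs files can be summarized. This is a small sketch under the assumption that cpufreq is exposed in the usual sysfs location; governor_summary is an illustrative name, and its optional argument only exists to allow testing against a copy of the directory tree.

```shell
#!/bin/sh
# Sketch: count how many CPUs use each cpufreq governor.
governor_summary() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    cat "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

On a correctly configured host the output is a single line naming the performance governor with the total CPU count. Empty output suggests that frequency scaling is not exposed at all, which is common inside virtual machines.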
2

CPU Performance Optimization

NUMA configuration and CPU isolation

Maximize CPU performance for compute-intensive workloads.

NUMA Optimization

For multi-socket systems, proper NUMA configuration is critical:

# Check NUMA topology
numactl --hardware

# Set NUMA memory policy
numactl --cpunodebind=0 --membind=0 your-application

# Monitor NUMA statistics
numastat -c

CPU Isolation

Dedicate CPU cores to instances for predictable performance:

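# Reserve CPUs for instances (edit /etc/default/grub)
# The range 4-31 assumes a 32-CPU host; adjust it to your topology
GRUB_CMDLINE_LINUX="isolcpus=4-31 nohz_full=4-31 rcu_nocbs=4-31"

# Update grub (Debian/Ubuntu; RHEL-family systems use grub2-mkconfig instead)
sudo update-grub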

# Reboot required for changes to take effect
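
After the reboot, the kernel reports the isolated set in /sys/devices/system/cpu/isolated. The sketch below checks that file against the expected range and counts how many CPUs a list such as 4-31 covers. Both function names are illustrative, and the optional file argument is there only for testing.

```shell
#!/bin/sh
# Sketch: verify CPU isolation after reboot.

# Count the CPUs in a kernel-style list such as "0-3,8,10-11".
cpulist_count() {
    printf '%s\n' "$1" | tr ',' '\n' | awk -F- '
        NF == 1 && $1 != "" { n += 1 }
        NF == 2             { n += $2 - $1 + 1 }
        END                 { print n + 0 }'
}

# Succeed only if the kernel reports exactly the expected isolated list.
isolation_active() {
    expected="$1"
    file="${2:-/sys/devices/system/cpu/isolated}"
    [ "$(cat "$file" 2>/dev/null)" = "$expected" ]
}
```

For the GRUB line above, isolation_active 4-31 should succeed and cpulist_count 4-31 reports how many cores are set aside for instances.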
3

Memory Optimization

Huge pages and memory management

Optimize memory allocation and performance for better instance density.

Transparent Huge Pages (THP)

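# Option A: enable THP for better performance (general workloads)
echo always > /sys/kernel/mm/transparent_hugepage/enabled
# "always" defrag can stall allocations; "madvise" is a gentler alternative
echo always > /sys/kernel/mm/transparent_hugepage/defrag

# Option B: for KVM/virtualization hosts that use static hugepages, disable THP instead
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

# Configure static hugepages for VMs (1024 pages of 2 MiB = 2 GiB with the x86_64 default size)
echo 1024 > /proc/sys/vm/nr_hugepages
# /dev/hugepages is usually mounted by systemd already; mount it only if it is missing
mount -t hugetlbfs hugetlbfs /dev/hugepages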
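
The value written to nr_hugepages depends on how much guest memory should be backed and on the huge page size of the host. A minimal sketch of that arithmetic follows; hugepages_needed is an illustrative name, and it assumes /proc/meminfo contains a Hugepagesize line reported in kB.

```shell
#!/bin/sh
# Sketch: how many huge pages are needed to back a given amount of memory.
hugepages_needed() {
    want_mib="$1"
    meminfo="${2:-/proc/meminfo}"
    page_kib=$(awk '/^Hugepagesize:/ { print $2 }' "$meminfo")
    # Round up so the reservation never falls short of the request.
    echo $(( (want_mib * 1024 + page_kib - 1) / page_kib ))
}
```

With the common 2048 kB page size, hugepages_needed 2048 yields the 1024 pages used above. Leave enough regular memory for the host itself, because reserved huge pages are not available to ordinary allocations.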

Memory Deduplication (KSM)

# Enable Kernel Samepage Merging
echo 1 > /sys/kernel/mm/ksm/run
echo 1000 > /sys/kernel/mm/ksm/sleep_millisecs

# Monitor KSM effectiveness
cat /sys/kernel/mm/ksm/pages_shared
cat /sys/kernel/mm/ksm/pages_sharing
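
The raw counters are in pages, which makes them hard to judge at a glance. As a rough sketch, pages_sharing multiplied by the page size approximates the memory KSM is currently saving. The name ksm_saved_mib and its optional arguments are illustrative.

```shell
#!/bin/sh
# Sketch: approximate memory saved by KSM, in MiB.
ksm_saved_mib() {
    ksm_dir="${1:-/sys/kernel/mm/ksm}"
    page_bytes="${2:-$(getconf PAGESIZE)}"
    sharing=$(cat "$ksm_dir/pages_sharing")
    echo $(( sharing * page_bytes / 1048576 ))
}
```

If the result stays near zero after the scanner has run for a while, KSM is costing CPU time without much benefit for that workload mix.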
4

Storage Performance Tuning

I/O schedulers and filesystem optimization

Optimize storage subsystem for maximum IOPS and throughput.

I/O Scheduler by Drive Type

NVMe SSD

echo none > /sys/block/nvme0n1/queue/scheduler

No scheduler needed - NVMe handles queuing internally

SATA SSD

echo none > /sys/block/sda/queue/scheduler

Minimal overhead for solid-state drives (kernels before 5.0 call this scheduler noop)

HDD

echo mq-deadline > /sys/block/sdb/queue/scheduler

Better for rotational media with seek time (kernels before 5.0 call this scheduler deadline)
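
Rather than hard-coding device names, the choice can be derived from the rotational flag each block device exposes in sysfs. This is a sketch that assumes a multi-queue kernel (5.0 or later), where the scheduler names are none and mq-deadline. recommend_scheduler is an illustrative name, and the second argument is only for testing.

```shell
#!/bin/sh
# Sketch: suggest an I/O scheduler from the device's rotational flag.
recommend_scheduler() {
    dev="$1"
    block_root="${2:-/sys/block}"
    rotational=$(cat "$block_root/$dev/queue/rotational" 2>/dev/null) || return 1
    if [ "$rotational" = "1" ]; then
        echo mq-deadline   # spinning disk: ordering requests reduces seeks
    else
        echo none          # SSD or NVMe: let the device handle queuing
    fi
}
```

The result can be written to the matching queue/scheduler file as root. Settings made this way are lost on reboot, so a udev rule or boot script is needed to persist them.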

Filesystem Mount Options

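# EXT4 performance mount options (/etc/fstab entry)
# noatime already implies nodiratime; data=writeback can expose stale data after a crash
# discard issues TRIM on every delete; a periodic fstrim run is often cheaper
/dev/sda1 /storage ext4 noatime,discard,data=writeback 0 0

# XFS optimization for large files (mkfs destroys any existing data on /dev/sdb1)
mkfs.xfs -f -d agcount=32 -l size=512m /dev/sdb1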
5

Network Performance Tuning

NIC settings and TCP/IP stack optimization

Optimize network stack for low latency and high throughput.

Network Interface Tuning

#!/bin/bash
INTERFACE="eth0"

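# Increase ring buffers (values must not exceed the maximums shown by ethtool -g)
ethtool -G "$INTERFACE" rx 4096 tx 4096

# Enable offloading features
# LRO stays off: it is incompatible with bridged and forwarded traffic on container and VM hosts
ethtool -K "$INTERFACE" rx on tx on sg on tso on gso on gro on lro off

# Set interrupt coalescing
ethtool -C "$INTERFACE" adaptive-rx on adaptive-tx on rx-usecs 10

# Enable jumbo frames (only if every switch and host on the segment supports MTU 9000)
ip link set dev "$INTERFACE" mtu 9000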
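
Ring sizes above the hardware maximum are rejected, so it is worth reading the limit before setting it. The sketch below extracts the preset maximum from the output of ethtool -g. ring_max is an illustrative name, and the parsing assumes the usual layout with a "Pre-set maximums" section followed by "Current hardware settings".

```shell
#!/bin/sh
# Sketch: read the hardware maximum ring size from `ethtool -g` output on stdin.
# $1 is RX or TX.
ring_max() {
    awk -v want="$1:" '
        /^Pre-set maximums/          { in_max = 1; next }
        /^Current hardware settings/ { in_max = 0 }
        in_max && $1 == want         { print $2; exit }'
}
```

A possible use is ethtool -g "$INTERFACE" | ring_max RX, feeding the result to ethtool -G instead of a fixed 4096.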

TCP/IP Stack Tuning

# TCP optimization
sysctl -w net.ipv4.tcp_fastopen=3
sysctl -w net.ipv4.tcp_mtu_probing=1
# tcp_low_latency has had no effect since kernel 4.14; it only matters on older kernels
sysctl -w net.ipv4.tcp_low_latency=1

# Enable BBR congestion control
modprobe tcp_bbr
sysctl -w net.ipv4.tcp_congestion_control=bbr

# Increase socket buffers
sysctl -w net.core.rmem_max=134217728
sysctl -w net.core.wmem_max=134217728

# Connection tracking for high-traffic servers (the key exists only once nf_conntrack is loaded)
modprobe nf_conntrack
sysctl -w net.netfilter.nf_conntrack_max=1048576
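
Setting tcp_congestion_control to bbr fails if the kernel does not offer the algorithm. A small sketch for checking availability first is shown below. cc_available is an illustrative name, and the optional file argument is only for testing.

```shell
#!/bin/sh
# Sketch: succeed if the named congestion control algorithm is available.
cc_available() {
    algo="$1"
    file="${2:-/proc/sys/net/ipv4/tcp_available_congestion_control}"
    tr ' ' '\n' < "$file" | grep -qx "$algo"
}
```

For example, cc_available bbr || modprobe tcp_bbr loads the module only when it is needed.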
6

GPU Optimization

NVIDIA and multi-GPU configuration

Maximize GPU performance for compute and AI workloads.

NVIDIA GPU Performance Settings

# Set GPU to persistence mode (reduces startup latency)
nvidia-smi -pm 1

# Set application clocks (example values for A100; list valid pairs with nvidia-smi -q -d SUPPORTED_CLOCKS)
nvidia-smi -ac 1215,1410  # Memory,Graphics clocks in MHz

# Set power limit to maximum (adjust per GPU model)
nvidia-smi -pl 300  # Watts

# Configure GPU for compute mode
nvidia-smi -c EXCLUSIVE_PROCESS

# Monitor GPU performance
nvidia-smi dmon -s pucvmet
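
Because power limits differ per model, it can be useful to list GPUs that are running below their maximum before raising anything. The sketch below filters CSV rows produced by nvidia-smi --query-gpu=index,power.limit,power.max_limit --format=csv,noheader,nounits. gpus_below_max_power is an illustrative name, and the three-column input layout is an assumption tied to that query.

```shell
#!/bin/sh
# Sketch: report GPUs whose current power limit is below the board maximum.
# Expects "index, power.limit, power.max_limit" rows on stdin.
gpus_below_max_power() {
    awk -F', *' '$2 + 0 < $3 + 0 {
        printf "GPU %s: %s W of %s W\n", $1, $2, $3
    }'
}
```

No output means every GPU is already at its maximum limit, so nvidia-smi -pl would change nothing.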

Multi-GPU Configuration

# Check GPU topology and NVLink status
nvidia-smi topo -m

# Check NVLink connectivity (if available)
nvidia-smi nvlink -s
nvidia-smi nvlink -g 0

# Enable peer-to-peer access for multi-GPU workloads
# (Usually automatic, verify with cuda-samples p2pBandwidthLatencyTest)

GPU optimization settings may vary by model. Always test changes with your specific workloads before deploying to production.

7

Container Performance

LXD and container runtime optimization

Optimize container runtime for better instance performance.

LXD Performance Tuning

# Storage pool optimization with ZFS
lxc storage create fast zfs source=/dev/nvme0n1
lxc storage set fast volume.zfs.use_refquota true
lxc storage set fast volume.zfs.remove_snapshots true

# CPU limits configuration
lxc profile set default limits.cpu.priority 5
lxc profile set default limits.cpu.allowance 100%

# Memory limits
lxc profile set default limits.memory.enforce soft
lxc profile set default limits.memory.swap false

# Network optimization
lxc network set lxdbr0 bridge.mtu 9000
8

Performance Monitoring

Essential monitoring commands and tools

Continuous monitoring is essential for maintaining optimal performance.

Performance Monitoring Toolkit

# CPU performance
mpstat -P ALL 1        # Per-CPU statistics
perf top               # Real-time function profiling
turbostat --interval 1 # CPU frequency and power

# Memory performance
vmstat 1               # Virtual memory statistics
slabtop                # Kernel slab cache usage

# Storage performance
iostat -x 1            # Disk I/O statistics
iotop -o               # Per-process I/O usage

# Network performance
sar -n DEV 1           # Network device statistics
ss -s                  # Socket statistics summary

Real-time Monitoring

Use htop, glances, or btop for a live system overview.

Historical Analysis

Use sar and atop for trend analysis over time.

9

Performance Troubleshooting

Common issues and solutions

Each entry lists the symptoms first, followed by the steps to try.

High CPU Wait Time

High %wa in top, slow response times

  • Check I/O with iostat -x 1
  • Identify heavy I/O processes with iotop
  • Move high I/O workloads to faster storage
  • Enable writeback caching if data loss is acceptable
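
The %wa figure shown by top can also be computed directly from two samples of the aggregate cpu line in /proc/stat, which is handy in scripts and alerts. This is a minimal sketch: iowait_pct is an illustrative name, and it relies on the documented field order user, nice, system, idle, iowait, irq, softirq, steal.

```shell
#!/bin/sh
# Sketch: iowait percentage between two "cpu ..." lines from /proc/stat.
# Usage idea:
#   a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat)
#   iowait_pct "$a" "$b"
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Sum user..steal (fields 2-9); guest time is already part of user.
            total = 0
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
        }
        NR == 1 { t0 = total; w0 = $6 }
        NR == 2 {
            if (total > t0) printf "%.1f\n", 100 * ($6 - w0) / (total - t0)
            else print "0.0"
        }'
}
```

A value that stays high while iostat shows a saturated device points to storage as the bottleneck rather than CPU.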

Memory Pressure

High swap usage, OOM kills

  • Enable KSM for memory deduplication
  • Adjust overcommit settings (vm.overcommit_memory and vm.overcommit_ratio)
  • Implement memory limits per instance
  • Add more physical RAM

Network Latency

High ping times, packet loss

  • Check for IRQ conflicts
  • Tune interrupt coalescing (lower rx-usecs values cut latency at the cost of more interrupts)
  • Distribute interrupts across CPUs
  • Upgrade network drivers

Optimization Best Practices

Guidelines for sustained high performance

Follow these guidelines for sustained high performance.

Baseline First

Always measure performance before and after changes to quantify improvements.
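
Quantifying an improvement comes down to a relative change between the baseline and the new measurement. A tiny sketch of that calculation follows. pct_change is an illustrative name, and the inputs can be any metric where the two numbers share a unit (requests per second, IOPS, milliseconds).

```shell
#!/bin/sh
# Sketch: signed percentage change from a baseline to a new measurement.
pct_change() {
    awk -v before="$1" -v after="$2" 'BEGIN {
        if (before == 0) { print "n/a"; exit }
        printf "%+.1f\n", 100 * (after - before) / before
    }'
}
```

For throughput metrics a positive result is an improvement; for latency metrics the sign reads the other way around.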

Incremental Changes

Make one optimization at a time for clear results and easy rollback.

Document Settings

Keep detailed records of all optimizations applied to each server.

Regular Reviews

Revisit optimizations as workloads and software change over time.

Important: Some optimizations trade stability or data safety for performance. Always understand the implications before applying changes to production systems.