As enterprises and service providers increasingly deploy VPNs to secure traffic between remote endpoints and cloud infrastructure, understanding the performance implications of protocol choices is critical. IKEv2 is a modern and widely adopted VPN control protocol that pairs with IPsec to provide strong security, resiliency, and mobility support. However, the cryptographic workload and packet processing patterns associated with IKEv2/IPsec impose measurable CPU overhead and can limit achievable bandwidth if not tuned properly. This article presents a thorough technical exploration of IKEv2 VPN server benchmarking focused on CPU overhead and bandwidth throughput, with practical measurement methodologies, key metrics, and tuning considerations for production deployments.

Why benchmark IKEv2? Key motivations

Benchmarking an IKEv2 VPN server is not just about reporting Mbps numbers. For system architects, developers, and site operators, benchmarking helps answer concrete operational questions:

  • How many simultaneous tunnels can a server sustain at a target throughput?
  • Which cryptographic suites deliver the best performance on given hardware?
  • What is the per-packet CPU cost and how does it scale with packet size and concurrency?
  • Which datapath provides the best trade-offs (e.g., in-kernel IPsec via the Linux XFRM framework, programmed by an IKE daemon such as strongSwan’s charon, versus user-space ESP processing such as strongSwan’s kernel-libipsec plugin)?

These answers influence capacity planning, instance sizing in cloud environments, and cost-performance trade-offs for Dedicated-IP-VPN services.

Bench setup: hardware, software, and traffic model

Reproducible benchmarking requires a clear, well-documented test harness. A typical setup includes:

  • Servers: One or more dedicated test servers (bare metal or cloud instances) running the VPN gateway software (e.g., strongSwan, Libreswan). CPU details such as core count, frequency, and microarchitecture features (e.g., an Intel Xeon with AES-NI) must be recorded.
  • Clients: Traffic generators or other servers that initiate IKEv2 SAs and send/receive encrypted IPsec traffic. Tools like iperf3, pktgen, and Scapy are common.
  • OS and kernel: Linux distribution and kernel version matter. ESP processing is usually handled in-kernel by the XFRM framework (optionally via VTI/XFRM interfaces for route-based setups), but it can also run in user space over a TUN device (e.g., strongSwan’s kernel-libipsec plugin). Record the relevant kernel options and modules (e.g., CONFIG_XFRM_USER and the ESP/AES crypto modules); a small environment-capture sketch follows this list.
  • Crypto configuration: Record IKEv2 proposals, DH groups, encryption (AES-GCM, AES-CBC, ChaCha20-Poly1305), and integrity algorithms. Also include lifetime and rekey intervals to ensure rekeying does not skew throughput tests.
  • Network topology: Ensure minimal background traffic, proper MTU settings, and avoid asymmetric routing. If NAT traversal is enabled (UDP encapsulation), account for the additional UDP header overhead.
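
The following is a minimal sketch (Python, assuming a Linux x86 host with the usual /proc layout) of how the environment data above can be captured automatically and stored alongside each result set:

```python
#!/usr/bin/env python3
"""Capture the benchmark environment (CPU, kernel, crypto capabilities)."""
import json
import platform
from pathlib import Path

def capture_environment() -> dict:
    cpuinfo = Path("/proc/cpuinfo").read_text()
    flags = next((l.split(":", 1)[1].split() for l in cpuinfo.splitlines()
                  if l.startswith("flags")), [])
    model = next((l.split(":", 1)[1].strip() for l in cpuinfo.splitlines()
                  if l.startswith("model name")), "unknown")
    # /proc/crypto lists the kernel crypto drivers actually registered;
    # "aesni" drivers indicate hardware-accelerated AES is available in-kernel.
    crypto = Path("/proc/crypto").read_text()
    return {
        "kernel": platform.release(),
        "cpu_model": model,
        "aes_ni_flag": "aes" in flags,
        "aesni_kernel_driver": "aesni" in crypto,
        "logical_cpus": sum(1 for l in cpuinfo.splitlines() if l.startswith("processor")),
    }

if __name__ == "__main__":
    print(json.dumps(capture_environment(), indent=2))
```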

Traffic patterns and metrics

Design traffic models that reflect real usage:

  • Bulk TCP throughput (iperf3) for maximum bandwidth testing with large TCP windows.
  • UDP streams at different packet sizes to measure per-packet processing overhead (a sweep sketch follows this list).
  • Concurrent flows to evaluate CPU scaling and context-switching impacts.
  • Short-lived connections to measure IKEv2 SA establishment cost and latency.
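
The packet-size sweep mentioned above can be driven with a short script. This is only a sketch: it assumes iperf3 is installed, an iperf3 server is already listening on the far side of the tunnel, and the address, rate, payload sizes, and duration are placeholders to adjust for your own harness.

```python
#!/usr/bin/env python3
"""Sweep UDP payload sizes through an IPsec tunnel with iperf3."""
import json
import subprocess

SERVER = "10.10.0.1"                         # placeholder: iperf3 server behind the tunnel
RATE = "1G"                                  # offered load per test
PAYLOAD_SIZES = [64, 256, 512, 1024, 1400]   # bytes; keep below the tunnel MTU

def run_udp_test(size: int) -> dict:
    # -u: UDP, -b: target bitrate, -l: datagram payload size, -J: JSON output
    out = subprocess.run(
        ["iperf3", "-c", SERVER, "-u", "-b", RATE, "-l", str(size), "-t", "30", "-J"],
        capture_output=True, text=True, check=True)
    summary = json.loads(out.stdout)["end"]["sum"]
    return {"payload": size,
            "mbps": summary["bits_per_second"] / 1e6,
            "lost_percent": summary["lost_percent"]}

if __name__ == "__main__":
    for size in PAYLOAD_SIZES:
        print(run_udp_test(size))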

Essential metrics to capture:

  • Throughput (Mbps/Gbps) at various packet sizes.
  • CPU utilization per core and overall (user vs. system time).
  • Packets per second (PPS) for both encrypted and decrypted directions (see the interface-counter sketch after this list).
  • Latency under load (RTT and jitter).
  • Memory usage and per-SA state size.
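
For the throughput and PPS metrics, the per-interface counters in /proc/net/dev are often enough for coarse numbers. A minimal sampling sketch (the interface name is a placeholder):

```python
#!/usr/bin/env python3
"""Sample /proc/net/dev to derive PPS and Mbps for one interface."""
import time

def read_counters(iface: str) -> dict:
    with open("/proc/net/dev") as f:
        for line in f:
            if line.strip().startswith(iface + ":"):
                fields = line.split(":", 1)[1].split()
                # RX: bytes packets ...  TX: bytes packets ... (fields 8 and 9)
                return {"rx_bytes": int(fields[0]), "rx_pkts": int(fields[1]),
                        "tx_bytes": int(fields[8]), "tx_pkts": int(fields[9])}
    raise ValueError(f"interface {iface!r} not found")

def sample(iface: str = "eth0", interval: float = 5.0) -> dict:
    before, t0 = read_counters(iface), time.time()
    time.sleep(interval)
    after, dt = read_counters(iface), time.time() - t0
    return {
        "rx_pps": (after["rx_pkts"] - before["rx_pkts"]) / dt,
        "tx_pps": (after["tx_pkts"] - before["tx_pkts"]) / dt,
        "rx_mbps": (after["rx_bytes"] - before["rx_bytes"]) * 8 / dt / 1e6,
        "tx_mbps": (after["tx_bytes"] - before["tx_bytes"]) * 8 / dt / 1e6,
    }

if __name__ == "__main__":
    print(sample())
```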

Understanding CPU overhead sources in IKEv2/IPsec

CPU overhead arises from several key areas in an IKEv2/IPsec server:

  • IKEv2 control-plane operations: The Diffie-Hellman key exchange, signature verification, and certificate processing during SA establishment are CPU intensive but occur less frequently in long-lived deployments.
  • Cryptographic payload encryption/decryption (ESP): Symmetric cipher operations (AES, ChaCha20) constitute the bulk of per-packet work. Hardware acceleration (AES-NI) or optimized libraries can massively reduce cost.
  • Integrity and AEAD overhead: Algorithms like AES-GCM combine encryption and authentication, while AES-CBC requires separate HMAC processing—affecting CPU differently.
  • Packet processing overhead: Header parsing, routing, NAT traversal encapsulation (ESP-in-UDP), and XFRM transformations add work beyond cryptography.
  • Context switching and locking: User-space daemon interactions with the kernel (via PF_KEY or XFRM Netlink) and shared data structures incur locking and context switches that affect throughput at scale; the /proc/stat sampling sketch after this list shows one way to see where that time goes.
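
As noted in the last item, a simple way to see where CPU time goes during a test is to sample /proc/stat and break out user, system, softirq, and idle time: ESP crypto and XFRM processing largely show up as system and softirq time rather than user time. A minimal sketch:

```python
#!/usr/bin/env python3
"""Break down aggregate CPU time (user/system/softirq/idle) over an interval."""
import time

FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]

def read_cpu_times() -> dict:
    with open("/proc/stat") as f:
        # First line: "cpu  user nice system idle iowait irq softirq steal ..."
        values = f.readline().split()[1:1 + len(FIELDS)]
    return dict(zip(FIELDS, map(int, values)))

def cpu_breakdown(interval: float = 10.0) -> dict:
    before = read_cpu_times()
    time.sleep(interval)
    after = read_cpu_times()
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values()) or 1
    return {k: round(100.0 * v / total, 1) for k, v in delta.items()}

if __name__ == "__main__":
    # Run this while traffic is flowing through the tunnel.
    print(cpu_breakdown())
```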

Per-packet cost vs. per-byte cost

It’s important to distinguish per-packet versus per-byte costs. Small UDP packets generate high PPS rates, stressing the packet processing pipeline, whereas large TCP segments emphasize raw cryptographic throughput (bytes/sec). A server might achieve high Gbps for large packets but be limited by PPS when handling many small packets (VoIP, DNS, gaming).
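
A back-of-envelope model makes the distinction concrete. In the sketch below every cycle count is an assumption chosen only to illustrate the shape of the curve, not a measured value for any particular CPU:

```python
#!/usr/bin/env python3
"""Illustrative per-packet vs. per-byte cost model (all numbers are assumptions)."""

CPU_HZ = 3.0e9            # assumed clock of one core
CYCLES_PER_PACKET = 2500  # assumed fixed cost: parsing, XFRM lookup, encap, softirq
CYCLES_PER_BYTE = 1.2     # assumed AES-GCM cost with AES-NI

def max_throughput_per_core(packet_bytes: int):
    cycles = CYCLES_PER_PACKET + CYCLES_PER_BYTE * packet_bytes
    pps = CPU_HZ / cycles
    mbps = pps * packet_bytes * 8 / 1e6
    return pps, mbps

for size in (64, 256, 1400):
    pps, mbps = max_throughput_per_core(size)
    print(f"{size:>5} B packets: ~{pps / 1e6:.2f} Mpps, ~{mbps:,.0f} Mbit/s per core")
```

Even with efficient per-byte crypto, the fixed per-packet cost dominates at small sizes, which is why PPS rather than Gbps is usually the limit for small-packet workloads.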

Crypto choices and their performance implications

Choosing cryptographic algorithms has a major impact on throughput:

  • AES-GCM (AEAD): Provides excellent throughput on processors with AES-NI and GHASH acceleration. It reduces the need for separate HMAC and often yields the best throughput for large flows.
  • ChaCha20-Poly1305: Strong performance on CPUs lacking AES hardware acceleration, especially for mobile and lower-power instances. It can outperform AES on older Intel/AMD CPUs without AES-NI.
  • AES-CBC + HMAC: Generally slower and more CPU intensive because of separate hashing steps; avoid unless compatibility requires it.
  • Diffie-Hellman groups: Elliptic-curve groups (e.g., Curve25519/group 31, NIST P-256/group 19) reduce the CPU cost of key exchange compared to classic modular-exponentiation (MODP) groups such as 2048-bit DH (group 14).

Rule of thumb: Use AES-GCM on hardware with AES-NI, or ChaCha20-Poly1305 where AES acceleration is absent, and prefer elliptic-curve DH groups for IKEv2 exchanges to minimize control-plane CPU cost.
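
A quick way to sanity-check a host’s crypto profile before full tunnel tests is a user-space AEAD microbenchmark. The sketch below uses the third-party cryptography package (pip install cryptography); it measures library throughput, not the kernel ESP path, so treat it only as a rough indicator of whether AES-GCM or ChaCha20-Poly1305 suits the hardware.

```python
#!/usr/bin/env python3
"""Rough user-space AEAD throughput check for AES-GCM vs. ChaCha20-Poly1305."""
import os
import time
from cryptography.hazmat.primitives.ciphers.aead import AESGCM, ChaCha20Poly1305

def bench(aead, label: str, payload: bytes, seconds: float = 3.0) -> None:
    nonce = os.urandom(12)
    processed, start = 0, time.perf_counter()
    while time.perf_counter() - start < seconds:
        aead.encrypt(nonce, payload, None)   # nonce reuse is acceptable only in a benchmark
        processed += len(payload)
    elapsed = time.perf_counter() - start
    print(f"{label}: {processed * 8 / elapsed / 1e9:.2f} Gbit/s")

if __name__ == "__main__":
    payload = os.urandom(1400)               # roughly one ESP payload
    bench(AESGCM(AESGCM.generate_key(bit_length=256)), "AES-256-GCM", payload)
    bench(ChaCha20Poly1305(ChaCha20Poly1305.generate_key()), "ChaCha20-Poly1305", payload)
```

Comparing the two numbers on the target instance type gives a first-order answer to the AES-GCM versus ChaCha20 question before any tunnels are configured.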

Benchmark results: expected patterns and sample observations

While results depend heavily on hardware and software versions, the following patterns are commonly observed in benchmarking:

  • With AES-NI enabled and in-kernel ESP processing (XFRM), a modern 8-core server can sustain multiple Gbps of IPsec traffic with per-core utilization of 30–60% at full throughput for large packets.
  • On the same hardware without AES-NI, AES-CBC throughput drops significantly, and ChaCha20-Poly1305 often performs better, particularly at small to medium packet sizes.
  • PPS-bound scenarios: When packet sizes are small (64–256 bytes), the bottleneck shifts to packet processing—the kernel’s networking stack, interrupt handling, and NIC limitations (RSS queues). CPU utilization per Gbps increases sharply.
  • IKEv2 rekey frequency matters: IKE SA rekeys (and Child SA rekeys when PFS is enabled) repeat the DH exchange, and periodic re-authentication adds signature verification, so short lifetimes spike control-plane CPU usage. Avoid aggressively short lifetimes in high-load environments (a short worked estimate follows this list).
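
To put the rekey concern in perspective, the worked estimate below computes the steady-state control-plane CPU share from tunnel count, lifetime, and an assumed per-rekey cost; every input is a placeholder to replace with measured values:

```python
#!/usr/bin/env python3
"""Rough steady-state CPU budget for IKEv2 rekeys (all inputs are assumptions)."""

TUNNELS = 5000              # concurrent IKE SAs
REKEY_INTERVAL_S = 3600     # SA lifetime driving the rekey rate
CPU_MS_PER_REKEY = 2.0      # assumed DH + IKE processing cost on one core
CORES = 8

rekeys_per_second = TUNNELS / REKEY_INTERVAL_S
cpu_fraction = rekeys_per_second * (CPU_MS_PER_REKEY / 1000.0) / CORES
print(f"{rekeys_per_second:.1f} rekeys/s -> {cpu_fraction * 100:.2f}% of total CPU")
# Halving the lifetime doubles this figure, which is why aggressive lifetimes hurt at scale.
```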

Tuning recommendations for maximizing throughput

To achieve optimal performance, combine crypto selection with OS and network tuning:

  • Enable hardware crypto: Ensure AES-NI is available to and used by the kernel and user-space crypto libraries (OpenSSL, Libgcrypt). Check for the aes flag in /proc/cpuinfo and for aesni-backed drivers in /proc/crypto to verify.
  • Move the datapath into the kernel where appropriate: Linux XFRM with kernel crypto avoids per-packet transitions into user space. However, user-space implementations built on AF_XDP, or eBPF-based fast paths, may outperform the standard stack in certain scenarios.
  • Optimize NIC settings: Use multiple RSS queues, set appropriate interrupt affinities, enable GRO/LRO cautiously (may interact with IPsec processing).
  • Adjust MTU and MSS: Avoid fragmentation by tuning MTU; encapsulating ESP in UDP (NAT-T) reduces MTU and can lower throughput unless PMTU is correctly set.
  • Tune Linux network buffers: Increase net.core.rmem_max and net.core.wmem_max, and ensure TCP window scaling is enabled for high-latency links (see the tuning sketch after this list).
  • Parallelize tunnels and flows: Distribute SAs across cores using policy-based steering or multiple SPIs to avoid hotspotting a single core.
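
The tuning sketch referenced above strings a few of these knobs together. The interface name, queue count, and buffer sizes are placeholders, and the script must run as root; treat it as a starting point rather than a recommended configuration:

```python
#!/usr/bin/env python3
"""Apply a few illustrative NIC/sysctl tunings (all values are placeholders)."""
import subprocess

IFACE = "eth0"          # placeholder interface name
RSS_QUEUES = "8"        # typically one queue per physical core
RMEM_MAX = "16777216"   # 16 MiB socket buffer ceiling

def run(cmd: list) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Spread receive processing across multiple RSS queues.
    run(["ethtool", "-L", IFACE, "combined", RSS_QUEUES])
    # Enable GRO; leave LRO off unless testing shows it is safe with IPsec.
    run(["ethtool", "-K", IFACE, "gro", "on", "lro", "off"])
    # Raise socket buffer ceilings for high-bandwidth, high-latency paths.
    run(["sysctl", "-w", f"net.core.rmem_max={RMEM_MAX}"])
    run(["sysctl", "-w", f"net.core.wmem_max={RMEM_MAX}"])
```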

Measuring and interpreting the results

Collecting accurate measurements requires correlating multiple data sources:

  • System-level metrics (top, mpstat, vmstat) for CPU and memory.
  • Network metrics (ifstat, ethtool -S) and NIC ring statistics.
  • VPN software counters (strongSwan logs, ip xfrm state counters) to confirm SA activity (a polling sketch follows this list).
  • Packet captures for protocol-level validation and to measure overhead per packet.
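
The polling sketch mentioned above reads the per-SA byte and packet counters exposed by ip -s xfrm state (root privileges are typically required, and iproute2 output formatting can vary between versions, so the parsing is deliberately loose):

```python
#!/usr/bin/env python3
"""Poll aggregate IPsec SA byte/packet counters via 'ip -s xfrm state'."""
import re
import subprocess
import time

COUNTER_RE = re.compile(r"(\d+)\(bytes\),\s*(\d+)\(packets\)")

def total_counters():
    out = subprocess.run(["ip", "-s", "xfrm", "state"],
                         capture_output=True, text=True, check=True).stdout
    pairs = COUNTER_RE.findall(out)
    return (sum(int(b) for b, _ in pairs), sum(int(p) for _, p in pairs))

if __name__ == "__main__":
    interval = 5.0
    b0, p0 = total_counters()
    time.sleep(interval)
    b1, p1 = total_counters()
    print(f"ESP throughput: {(b1 - b0) * 8 / interval / 1e6:.1f} Mbit/s, "
          f"{(p1 - p0) / interval:.0f} pkt/s across all SAs")
```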

Interpret results in context: a lower raw throughput with lower CPU usage can be preferable if it leads to better latency or lower operational cost. Consider the service-level needs: bursty interactive traffic demands low latency and high PPS capability, while bulk transfers require high symmetric bandwidth.

Conclusion and operational checklist

Benchmarking IKEv2 VPN servers for CPU overhead and bandwidth throughput is essential for delivering predictable, cost-effective VPN services. Key takeaways:

  • Measure both per-packet and per-byte performance across realistic traffic models.
  • Prefer AES-GCM with AES-NI or ChaCha20-Poly1305 based on hardware characteristics.
  • Minimize control-plane churn by using elliptic-curve DH groups and reasonable SA lifetimes.
  • Tune kernel and NIC for PPS-heavy workloads and ensure PMTU/MTU are configured correctly to avoid fragmentation.
  • Validate results with end-to-end captures and correlate CPU, PPS, and throughput metrics to pinpoint bottlenecks.

For further operational guidance and deployment tips tailored to Dedicated-IP-VPN style services, including examples of strongSwan configurations and kernel tuning recipes, visit Dedicated-IP-VPN at https://dedicated-ip-vpn.com/.