Virtual Private Networks using the Layer 2 Tunneling Protocol (L2TP) remain widely deployed for remote access and site-to-site connectivity. However, many administrators and developers find that L2TP-based setups — commonly paired with IPsec for security — exhibit inconsistent throughput compared with newer protocols. This article provides a detailed, reproducible benchmark methodology and analyzes real-world performance factors affecting L2TP VPN speed. It also offers targeted optimization steps for webmasters, enterprise IT teams, and developers responsible for VPN deployment.
Why benchmark L2TP performance?
Benchmarks separate perception from reality. Vendors and documentation often report idealized throughput figures, but real-world performance depends on network conditions, packet sizes, CPU capabilities, crypto implementations, and OS network stacks. For teams relying on L2TP for critical services, knowing expected throughput, latency, and CPU utilization under realistic loads is essential for capacity planning, SLA definition, and troubleshooting.
High-level test plan
A meaningful benchmark requires a controlled, repeatable environment. The plan below was designed to reveal the principal bottlenecks and to be adaptable across different host hardware and cloud providers.
- Test types: TCP and UDP throughput (bulk transfer and many concurrent flows), latency (ICMP and TCP RTT), and packet loss resilience.
- Encryption modes: IPsec ESP using AES-CBC+HMAC-SHA1 (common legacy setup) and modern AEAD ciphers like AES-GCM (if supported).
- Client implementations: Windows built-in L2TP/IPsec, xl2tpd+strongSwan on Linux, and common mobile clients.
- Measurement tools: iperf3 (for TCP/UDP throughput), ping (latency and jitter), scp/rsync for real file transfers, and tcpdump/Wireshark for packet-level validation.
- Environment variables: baseline tests without VPN, then tests through L2TP/IPsec. Measure CPU usage on both endpoints and check MTU/MSS effects.
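The commands below are a minimal sketch of the baseline measurements in this plan. They assume iperf3 is installed on both hosts and use the placeholder address 203.0.113.10 for Server A; substitute your own endpoint and repeat each run with the tunnel down and then up.

```bash
# On Server A: start the iperf3 listener (default port 5201)
iperf3 -s

# On Client B: 60-second single-stream TCP test (placeholder server address)
iperf3 -c 203.0.113.10 -t 60

# UDP test at a fixed offered rate; raise -b toward the expected line rate
iperf3 -c 203.0.113.10 -u -b 1G -t 60

# Latency and jitter baseline: 100 probes at 200 ms intervals
ping -c 100 -i 0.2 203.0.113.10
```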
Testbed configuration
Use two dedicated hosts or VMs in the same datacenter region to minimize Internet variability when measuring the encryption overhead. Example baseline:
- Server A (VPN server): 4 vCPU (Intel Xeon with AES-NI), 8 GB RAM, Linux kernel 5.x, strongSwan 5.x + xl2tpd.
- Client B: 4 vCPU, same CPU generation if possible, Linux or Windows client.
- Network: 10 Gbps virtual NICs with direct routing and as few external hops as possible.
Record system load (top or sar), CPU frequency scaling state, and kernel offload settings (TSO, GSO, GRO). These can dramatically change observed throughput.
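A quick way to capture that state before each run is sketched below; eth0 is a placeholder interface name, and the sysfs path assumes a cpufreq-capable host.

```bash
# Confirm AES-NI is exposed to the host or guest
grep -o -m1 aes /proc/cpuinfo

# Record NIC offload state (TSO/GSO/GRO, checksum offload); eth0 is a placeholder
ethtool -k eth0 | grep -E 'segmentation|receive-offload|checksumming'

# CPU frequency scaling governor (performance vs. powersave can skew results)
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c

# Sample CPU utilization once per second while tests run (sysstat package)
sar -u 1
```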
Key measurement metrics
Focus on these metrics to form a complete performance picture:
- Throughput (Mbps/Gbps): Measured using iperf3 for sustained TCP and UDP streams.
- Latency (ms): ICMP ping and TCP RTT for small and large packets.
- CPU utilization (%): At both endpoints, to identify crypto-bound vs. I/O-bound limits.
- Packet loss and jitter: Especially for UDP and interactive workloads.
- Fragmentation count: Number of fragmented packets due to MTU mismatches.
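One way to collect the latency, jitter, and fragmentation metrics is sketched below; 10.8.0.1 is a placeholder tunnel-side peer address and eth0 a placeholder outer interface.

```bash
# RTT with small and near-MTU payloads through the tunnel (placeholder peer address)
ping -c 100 -s 56 10.8.0.1
ping -c 100 -s 1400 10.8.0.1

# UDP jitter and loss as reported by iperf3
iperf3 -c 10.8.0.1 -u -b 500M -t 30

# Count fragmented IPv4 packets on the outer interface during a run
sudo tcpdump -ni eth0 'ip[6:2] & 0x3fff != 0'
```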
Why L2TP can be slower than expected
L2TP itself provides tunneling but no encryption; in practice it’s paired with IPsec ESP. The combined encapsulation adds per-packet byte overhead, and the cryptographic processing consumes CPU. Below are common reasons for reduced throughput:
- Encapsulation overhead: L2TP adds headers and IPsec (ESP) adds additional bytes (IVs, padding, ESP headers). This reduces the effective MTU for inner payloads and can trigger fragmentation if not tuned.
- CPU-bound crypto: Without AES-NI or hardware crypto offload, CPU can become the primary bottleneck, particularly for small-packet, high-packet-rate workloads.
- Implementation inefficiencies: Userland data handling (e.g., xl2tpd without the kernel L2TP data path) incurs extra copies and context switches; kernel data paths (Linux’s l2tp modules for L2TP, XFRM for ESP) are typically faster.
- MTU/MSS mismatches: Default MTU values often cause fragmentation, which significantly reduces throughput due to reassembly overhead and additional packet processing.
- NAT traversal (NAT-T): Using UDP encapsulation for IPsec to traverse NAT typically adds another UDP header, increasing overhead and sometimes forcing suboptimal paths.
Quantifying encapsulation overhead
Approximate extra bytes per packet when using L2TP over IPsec (ESP with AES-CBC+HMAC):
- Outer IPv4 header: 20 bytes (the 14-byte Ethernet frame header is present with or without the tunnel, so it is not counted as tunnel overhead)
- ESP header + IV + padding + ICV: ~32–56 bytes depending on cipher/auth
- UDP header carrying L2TP (port 1701): 8 bytes
- L2TP data header plus PPP framing: ~8–12 bytes
- UDP encapsulation for NAT-T: +8 bytes
Combined, expect roughly 70–105 bytes of overhead per packet, toward the lower end without NAT-T. For small payloads this is proportionally large and can reduce effective throughput and increase CPU cost per byte.
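As a worked example, assuming a 1500-byte path MTU and a worst-case 100 bytes of tunnel overhead, the inner MTU and a safe TCP MSS can be derived as follows:

```bash
# Assumed worst-case tunnel overhead (outer IP + ESP + NAT-T UDP + L2TP/PPP)
OVERHEAD=100
PATH_MTU=1500

# Largest inner packet that avoids fragmenting the outer packet
INNER_MTU=$((PATH_MTU - OVERHEAD))   # 1400

# Corresponding TCP MSS: inner MTU minus 20-byte IP and 20-byte TCP headers
INNER_MSS=$((INNER_MTU - 40))        # 1360

echo "tunnel MTU <= ${INNER_MTU}, clamp MSS to <= ${INNER_MSS}"
```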
Representative benchmark results (illustrative)
Below are generic results observed under controlled conditions (4 vCPU hosts with AES-NI enabled). These are illustrative — actual numbers will vary with hardware and configuration.
- Baseline (no VPN): iperf3 TCP single stream ~9.2 Gbps (NIC-limited).
- L2TP/IPsec (AES-GCM, kernel offload): TCP single stream ~3.0–4.5 Gbps, CPU 30–60% (per host).
- L2TP/IPsec (AES-CBC+HMAC, no AES-NI): TCP single stream ~400–800 Mbps, CPU 70–100% (crypto-bound).
- Multiple parallel TCP streams (16 flows): aggregate throughput closer to line rate when AES-NI + offload are present; without AES-NI the aggregate is limited by CPU.
- UDP tests (iperf3 -u) expose packet loss directly at the configured send rate; sustained loss above roughly 1% also collapses TCP throughput over the same path, with the severity depending on the congestion-control algorithm in use.
These synthetic results highlight a common pattern: modern AEAD ciphers and hardware acceleration significantly improve throughput, while legacy cipher suites and userland processing impose steep limits.
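A sweep like the one sketched below reproduces this pattern and ties each throughput figure to a CPU sample; the server address is a placeholder and mpstat comes from the sysstat package.

```bash
# Sweep iperf3 parallelism while sampling CPU on the client
SERVER=203.0.113.10   # placeholder address of the VPN-side iperf3 server
for P in 1 2 4 8 16; do
    mpstat 1 30 > "cpu_P${P}.log" &
    iperf3 -c "$SERVER" -P "$P" -t 30 --json > "iperf_P${P}.json"
    wait
done
```

Comparing the per-parallelism JSON results against the matching CPU logs quickly shows whether added flows buy more throughput or simply saturate the cores.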
Optimization recommendations
To maximize L2TP performance, evaluate the following optimizations in prioritized order:
- Enable AES-NI and use AEAD ciphers: AES-GCM reduces per-packet overhead by combining encryption and authentication. Verify that the CPU exposes AES-NI and that the crypto stack in use actually takes advantage of it (verification commands follow this list).
- Prefer kernel-mode IPsec: Keeping ESP processing in the kernel (Linux’s native XFRM stack) and using the kernel L2TP data plane usually delivers higher throughput and lower context-switch overhead than userland forwarding.
- MTU and MSS tuning: Reduce the tunnel MTU (e.g., to 1400 or 1420) and clamp TCP MSS on the firewall (iptables/ufw) to avoid fragmentation.
- Enable NIC offloads: TSO/GSO/GRO and checksum offloading reduce CPU overhead; validate with ethtool and tune carefully, especially in virtualized environments.
- Use appropriate IKE settings: Prefer IKEv2 where supported, avoid unnecessarily short rekey lifetimes, and choose Diffie-Hellman groups that balance security against CPU cost (e.g., ECP groups where hardware support exists).
- Monitor and scale CPU: If CPU-bound, scale vertically (faster cores) or horizontally (more VPN gateways/load balancers). For high-concurrency multi-tenant setups, distribute sessions across multiple servers.
- Avoid double encapsulation: Where possible, avoid wrapping IP-in-IP within other tunnels; reduce unnecessary layers to lower overhead and fragmentation risk.
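As a sketch of the first two items, the commands below check that AES-NI is visible, that the kernel has an accelerated AES-GCM implementation, and which ESP cipher the running tunnel actually negotiated. They assume strongSwan’s legacy `ipsec` command; with the newer interface, `swanctl --list-sas` gives the same information.

```bash
# CPU flag and accelerated kernel GCM driver (e.g. an *-aesni entry);
# gcm(aes) entries appear once the kernel has instantiated the algorithm,
# typically after the tunnel is up
grep -o -m1 aes /proc/cpuinfo
grep -B1 -A2 'gcm(aes)' /proc/crypto

# Negotiated ESP algorithms on the established SA (strongSwan legacy interface)
sudo ipsec statusall | grep -iE 'AES_GCM|AES_CBC'

# In ipsec.conf, an AEAD ESP proposal might look like: esp=aes256gcm16-ecp256
```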
Practical tuning checklist
- Check /proc/crypto and lscpu for AES-NI visibility.
- Run iperf3 with different parallelism (-P) to discover CPU vs. network bottlenecks.
- Use tcpdump to confirm packet sizes and count fragmentation.
- Enable kernel crypto modules and prefer hardware acceleration on gateways.
- Add iptables mangle-table rules for TCP MSS clamping (TCPMSS target with --clamp-mss-to-pmtu); see the example below.
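The MSS-clamping rules and MTU adjustment might look like the following; ppp0 and the MSS value of 1360 are placeholders derived from the overhead estimate earlier.

```bash
# Clamp TCP MSS to the discovered path MTU for traffic forwarded through the gateway
sudo iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
    -j TCPMSS --clamp-mss-to-pmtu

# Or clamp to an explicit value derived from the tunnel overhead calculation
sudo iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
    -j TCPMSS --set-mss 1360

# Lower the MTU on the L2TP session's PPP interface (ppp0 is a placeholder)
sudo ip link set dev ppp0 mtu 1400
```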
Testing caveats and interpretation
Benchmarks must be interpreted with context. Single-flow TCP tests often under-report aggregate capability because a single flow is limited to one congestion window and its packets are typically processed on a single CPU core. Conversely, aggregated parallel flows can mask per-session performance limitations. Always complement iperf3 data with real application tests (file transfers, database replication, VoIP streams) to assess user-facing impact.
Also consider environmental variability: cloud VMs may suffer from noisy neighbors, CPU steal time, or virtualized NIC performance quirks. Run multiple iterations at different times and present median values so that outliers do not skew the results.
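A minimal sketch of that practice, assuming jq is available and 203.0.113.10 is the placeholder server address, runs five iterations and reports the median TCP throughput:

```bash
# Five timed runs; median of the received-rate summaries, in Gbps
SERVER=203.0.113.10
for i in 1 2 3 4 5; do
    iperf3 -c "$SERVER" -t 30 --json |
        jq '.end.sum_received.bits_per_second / 1e9'
done | sort -n | awk '{v[NR]=$1} END {print "median:", v[int((NR+1)/2)], "Gbps"}'
```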
Conclusions and deployment guidance
L2TP combined with IPsec remains a viable option for many deployments, but its performance is sensitive to cryptographic choices, CPU acceleration, and network stack tuning. In practice:
- If high throughput is required, use AEAD ciphers (AES-GCM), ensure hardware AES support, and prefer kernel-space IPsec stacks.
- For small offices or mobile users where throughput demands are modest, default configurations may be sufficient, but watch MTU and NAT-T impacts.
- When deploying at scale, instrument VPN gateways for CPU, packet rates, and per-session throughput; automate horizontal scaling or session distribution as needed.
By following the structured benchmarking approach above — combining iperf3, packet captures, and application-level tests — administrators can quantify real-world limits and make informed decisions about cipher suites, hardware upgrades, or moving to alternative VPN technologies (WireGuard or TLS-based VPNs) when appropriate.
For further resources and detailed deployment examples tailored to dedicated IP VPN use cases, visit Dedicated-IP-VPN at https://dedicated-ip-vpn.com/.