WireGuard has rapidly become the VPN of choice for developers, site operators, and cloud architects due to its minimal codebase, modern cryptography, and ease of configuration. However, when deploying WireGuard on cloud virtual machines (VMs), real-world throughput depends on many moving parts: instance CPU horsepower, kernel implementation, network offloads, MTU, IRQ/RSS behavior and the benchmarking methodology itself. This article walks through practical benchmarks, explains the technical levers that affect performance, and provides actionable tuning guidance for achieving optimal throughput on cloud VMs.
Why WireGuard Throughput Varies on Cloud VMs
WireGuard is designed to be lightweight and fast, but VPN throughput is not purely a function of the protocol. In cloud environments, the following factors determine real-world throughput:
- VM CPU architecture and available vCPUs — cryptographic operations are CPU-bound. The number of cores and their single-thread performance matter.
- Kernel implementation — modern Linux kernels include an in-kernel WireGuard implementation (post-5.6 upstream merge). Kernel-level code avoids user-kernel copy overhead.
- NIC and hypervisor limits — cloud NIC bandwidth caps and the host-side network stack influence peak rates.
- Networking offloads and interrupt distribution — GRO/GSO, TSO, and RSS distribute work across CPUs and change packet processing cost.
- MTU and encapsulation overhead — WireGuard encapsulates traffic in UDP; MTU reduces useful payload per packet.
- Test methodology — single-stream TCP vs. multi-stream TCP/UDP produce very different ceilings.
Benchmark Methodology (Recommended)
To generate repeatable numbers you should follow a clear methodology. The following approach was used in our tests and is recommended for your own measurements; a command-level sketch of a full test run follows the list:
- Use iperf3 for throughput tests. Run both TCP and UDP tests to understand behavior under different transport properties.
- Measure both single-stream (iperf3 -P 1) and multi-stream (iperf3 -P 8 or -P 16) scenarios. Many cloud NICs and stacks benefit from parallel flows.
- Test with realistic MTUs. Start with default, then tune to 1420–1428 for typical Ethernet+UDP encapsulation to avoid fragmentation through tunnels.
- Measure host baseline (no WireGuard) first to find the NIC/VM upper bound, then enable WireGuard and compare.
- Record CPU utilization on both ends, kernel versions, and ethtool settings. Repeat tests with and without offloads (GRO/GSO) to see impact.
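As a concrete starting point, here is a minimal sketch of one full test run. It assumes iperf3 is installed on both VMs, that 10.0.0.2 is the server's underlay address and 10.8.0.2 its WireGuard address (both placeholders), and that the public NIC is eth0; adjust for your environment.

```bash
# On the server VM: start a listener (used for both underlay and tunnel tests)
iperf3 -s

# On the client VM:
# 1) Baseline over the underlay, without WireGuard (single stream, then 8 parallel streams)
iperf3 -c 10.0.0.2 -t 30 -P 1
iperf3 -c 10.0.0.2 -t 30 -P 8

# 2) The same tests through the WireGuard tunnel
iperf3 -c 10.8.0.2 -t 30 -P 1
iperf3 -c 10.8.0.2 -t 30 -P 8

# 3) UDP at a fixed offered rate; raise -b until loss appears to find the ceiling
iperf3 -c 10.8.0.2 -u -b 2G -t 30

# 4) Record context alongside each run
uname -r                     # kernel version on both ends
ethtool -k eth0              # current offload settings
mpstat -P ALL 5 6            # per-CPU utilization during the run (sysstat package)
```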
Representative Results and Observations
Exact numbers vary by provider and instance type; the important takeaway is the set of patterns you can expect:
- Small instance types (1–2 vCPU): WireGuard throughput is typically CPU-limited. Expect a few hundred Mbps to ~1 Gbps depending on single-core performance.
- Medium instances (4 vCPU+): Properly tuned, multi-core VMs frequently achieve multiple Gbps of WireGuard throughput. Parallel flows scale well because network and softirq work can be distributed.
- Compute-optimized instances with high clock speed: Tend to achieve the highest single-stream and aggregated throughput due to strong per-core crypto throughput.
- Large instances with 10 Gbps host NICs: WireGuard can approach the host NIC limit (8–10 Gbps) on recent kernels and with offloads enabled — provided the VM has enough vCPUs to handle interrupts and cryptographic work.
In our lab-style tests, a modern 4–8 vCPU VM on a 10 Gbps-backed host routinely achieved multiple Gbps for multi-stream TCP, and single-stream TCP often benefitted from higher single-core clocks. Lower-end VMs with weaker per-core crypto performance (limited SIMD support or lower clocks) saturated earlier. These patterns are consistent across common cloud providers (AWS, GCP, Azure) because the constraints are mostly per-VM CPU + hypervisor NIC limits.
Why Single-Stream and Multi-Stream Results Differ
Single TCP stream performance exposes single-thread/CPU limitations. WireGuard’s packet processing for a given flow tends to execute on the CPU handling that flow’s softirq/interrupt. If that CPU lacks cycles, a single flow stalls. Multiple parallel flows allow the kernel’s network stack and NIC interrupt distribution (RSS/RPS) to use multiple CPUs, increasing aggregate throughput.
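One quick way to observe this, assuming the NIC is named eth0 (a placeholder), is to watch how softirq work is distributed while a single-stream test runs; these are ordinary observation commands, not WireGuard-specific tooling:

```bash
# Per-CPU NET_RX softirq counters; one column climbing much faster than the
# others means a single CPU is doing most of the packet processing
watch -n1 'grep -E "CPU|NET_RX" /proc/softirqs'

# Per-CPU utilization: look for one core pegged in %soft/%sys while others idle
mpstat -P ALL 2

# Which CPUs the NIC's receive queues interrupt today
# (interrupt names vary; on virtio NICs look for virtio*-input instead of eth0)
grep eth0 /proc/interrupts
```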
Technical Tuning Tips to Improve WireGuard Throughput
Below are practical knobs that significantly affect WireGuard performance on cloud VMs. Apply incrementally and measure impact.
1) Use a modern kernel and in-kernel WireGuard
Prefer kernel 5.6+ (or backported WireGuard modules) so you run the in-kernel implementation. In-kernel WireGuard avoids extra copies and context switches present in userland implementations, reducing per-packet overhead.
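A quick sanity check, sketched below, distinguishes the in-kernel module from the userland wireguard-go implementation (the last command assumes the wireguard-tools package is installed):

```bash
uname -r                  # 5.6+ ships WireGuard in-tree; older kernels may use a backported module
lsmod | grep wireguard    # listed here when WireGuard is loaded as a kernel module
pgrep -a wireguard-go     # any output means the slower userland implementation is running
wg show                   # interface and peer status from wireguard-tools
```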
2) Right-size CPU and use IRQ/softirq affinity
- Assign enough vCPUs. If you expect multi-Gbps, 4+ vCPUs are frequently required.
- Enable irqbalance or manually set IRQ affinity so NIC queues map to different CPUs.
- Adjust RPS (Receive Packet Steering) and XPS to spread processing across CPUs by writing a CPU bitmask to /sys/class/net/eth0/queues/rx-*/rps_cpus; see the sketch after this list.
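As a sketch of the last two points, the snippet below assumes a 4-vCPU VM with a single NIC named eth0; the bitmask f covers CPUs 0–3, and the IRQ number placeholder must be taken from /proc/interrupts. Run it as root.

```bash
# Inspect which CPUs currently service the NIC's interrupts
# (interrupt names vary; on virtio NICs look for virtio*-input instead of eth0)
grep eth0 /proc/interrupts
cat /proc/irq/<IRQ_NUMBER>/smp_affinity      # <IRQ_NUMBER> comes from the listing above

# Spread receive-side processing (RPS) across CPUs 0-3 on every rx queue
for q in /sys/class/net/eth0/queues/rx-*; do
    echo f > "$q/rps_cpus"
done

# Optional: Receive Flow Steering (RFS), so packets follow the CPU where the
# consuming socket runs
sysctl -w net.core.rps_sock_flow_entries=32768
for q in /sys/class/net/eth0/queues/rx-*; do
    echo 4096 > "$q/rps_flow_cnt"
done
```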
3) Leverage NIC and kernel offloads
Hardware and software offloads such as GRO/GSO/TSO can reduce CPU overhead. Use ethtool to inspect and toggle features. In many tests, leaving offloads enabled increases throughput, but verify on your own instances because some cloud virtual NICs behave differently; a quick comparison sketch follows the commands. Typical commands:
- ethtool -k eth0
- ethtool -K eth0 gso on gro on tso on
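To quantify the impact on your specific NIC, a simple before/after comparison like the sketch below works; eth0 and the tunnel peer 10.8.0.2 are placeholders, ethtool needs root, and note that some virtual NICs ignore or reject certain toggles.

```bash
# Run the same multi-stream test with offloads on and off, then restore them
for state in on off; do
    ethtool -K eth0 gro "$state" gso "$state" tso "$state"
    echo "=== offloads $state ==="
    iperf3 -c 10.8.0.2 -t 20 -P 8 | tail -n 4
done
ethtool -K eth0 gro on gso on tso on
```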
4) Tune MTU carefully
WireGuard encapsulates IP packets in UDP; default MTU might lead to fragmentation. Use an MTU roughly equal to host MTU minus UDP+WireGuard overhead. Common values are 1420–1428. Test for PMTU behavior end-to-end. Command: ip link set dev wg0 mtu 1420
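For a 1500-byte underlay MTU the arithmetic works out as sketched below: WireGuard adds roughly 60 bytes of overhead over an IPv4 underlay and 80 bytes over IPv6 (outer IP + UDP + WireGuard header and auth tag). The ping check uses a placeholder tunnel peer at 10.8.0.2.

```bash
# IPv6-capable underlay: 1500 - 80 = 1420 (use 1440 if the outer path is IPv4-only)
ip link set dev wg0 mtu 1420

# Or persist it in the wg-quick config:
#   [Interface]
#   MTU = 1420

# Verify nothing fragments: the largest unfragmentable ICMP payload is 1420 - 20 - 8 = 1392
ping -M do -s 1392 10.8.0.2
```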
5) Increase socket and network buffers
Raise kernel buffer sizes to accommodate high-throughput bursts; a persistent variant is sketched after the list:
- sysctl -w net.core.rmem_max=268435456
- sysctl -w net.core.wmem_max=268435456
- sysctl -w net.core.netdev_max_backlog=300000
- sysctl -w net.ipv4.udp_mem="1048576 2097152 4194304"
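The sysctl -w form does not survive a reboot; one way to persist the same values is a drop-in file like the sketch below (the filename is arbitrary; run as root).

```bash
tee /etc/sysctl.d/99-wireguard-tuning.conf <<'EOF'
net.core.rmem_max = 268435456
net.core.wmem_max = 268435456
net.core.netdev_max_backlog = 300000
net.ipv4.udp_mem = 1048576 2097152 4194304
EOF
sysctl --system   # apply now and on every boot
```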
6) Use parallel flows for max aggregate throughput
iperf3 -P 8 (or -P 16) shows aggregate capacity. For real application traffic, make sure work arrives as multiple concurrent streams so RSS/RPS can distribute it across CPUs.
7) Monitor CPU crypto acceleration
WireGuard uses ChaCha20-Poly1305 for all transport encryption. It is fast in software on modern CPUs and is accelerated by SIMD extensions (SSE/AVX on x86, NEON on ARM) rather than AES-NI. Ensure the kernel's optimized crypto routines are available and check performance counters if you can (perf).
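A rough way to check both points is sketched below: confirm the CPU exposes SIMD extensions the kernel's ChaCha20/Poly1305 code can use, then profile during a test run to see where cycles go (perf typically comes from a linux-tools or perf package and needs root or relaxed perf_event_paranoid).

```bash
# x86: look for avx2/avx512; on ARM, check for 'asimd' (NEON) instead
grep -o -E 'avx512f|avx2|ssse3' /proc/cpuinfo | sort -u

# While an iperf3 run is active: chacha/poly1305 symbols near the top of
# kernel time confirm the crypto path is where the cycles go
perf top
```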
Common Pitfalls and How to Avoid Them
- Assuming WireGuard will always saturate the NIC: if per-core CPU performance is the bottleneck, you won’t hit NIC limits. Check CPU utilization and consider instances with more or faster vCPUs.
- Misconfigured MTU causing fragmentation: Fragmentation adds CPU overhead and latency. Tune MTU for encapsulation overhead and test for PMTU issues.
- Turning off offloads blindly: Disabling GRO/GSO can reduce throughput because it forces the NIC/stack to handle many more packets. Test both states.
- Benchmarking only with single-threaded tools: Single-stream iperf3 doesn’t show aggregate capacity. Use parallel streams to simulate real traffic patterns.
Advanced Options: eBPF, XDP, and Beyond
For high-performance use-cases, advanced techniques such as eBPF/XDP acceleration or running WireGuard in userland with DPDK are options. These approaches bypass parts of the kernel networking path and can yield sub-ms latency and higher throughput in specialized environments. However, they add complexity and are typically unnecessary for most cloud VM deployments. For the majority of site operators and cloud users, kernel WireGuard with correct tuning provides the best balance of performance, maintainability and security.
Practical Checklist Before Going to Production
- Confirm the in-kernel WireGuard implementation is in use (avoid the slower userland wireguard-go unless necessary).
- Run baseline host-to-host network tests without WireGuard to identify NIC limits.
- Measure single-stream and multi-stream WireGuard performance and observe CPU distribution.
- Tune MTU, offloads and socket buffers based on measurements.
- Enable monitoring: CPU, queue depths, and packet drops. Track jitter and latency for real-time apps; a minimal monitoring sketch follows this checklist.
- Repeat tests across expected cloud instance types to find the best price-to-performance option.
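The monitoring item above can start as simply as the sketch below; eth0, wg0, and the peer address 10.8.0.2 are placeholders.

```bash
# Drops and errors on the physical NIC and the tunnel interface
ip -s link show eth0
ip -s link show wg0

# Driver-level counters (counter names vary by driver)
ethtool -S eth0 | grep -iE 'drop|miss|err'

# Per-CPU backlog drops: the second column counts packets dropped because
# the softirq backlog queue was full
cat /proc/net/softnet_stat

# Jitter and loss through the tunnel (iperf3's UDP report includes both)
iperf3 -c 10.8.0.2 -u -b 100M -t 30
```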
Bottom line: WireGuard on cloud VMs delivers excellent performance, but reaching the upper bounds requires a holistic approach: choose the right VM size, keep the kernel and WireGuard up to date, tune MTU and offloads, and benchmark with multi-stream traffic. For many deployments, following the tuning steps above will move throughput from a few hundred Mbps to multiple Gbps on appropriately provisioned cloud instances.
For more practical guides and hands-on configuration examples, visit Dedicated-IP-VPN at https://dedicated-ip-vpn.com/.