Authenticated Encryption with Associated Data (AEAD) ciphers are central to modern encrypted transport protocols. In V2Ray, AEAD modes ensure both confidentiality and integrity while enabling efficient streaming. For system administrators, developers, and enterprise users choosing a transport cipher for performance-sensitive deployments, understanding real-world tradeoffs between speed and security is crucial. This article presents a technical, reproducible benchmarking approach for AEAD ciphers in V2Ray, detailed results, and practical recommendations tailored to different deployment scenarios.
Why AEAD matters in V2Ray
V2Ray’s secure transport leverages AEAD ciphers to provide combined encryption and authentication in a single operation. This avoids separate HMACs and reduces the number of passes over plaintext. AEAD modes such as AES-GCM and ChaCha20-Poly1305 differ in algorithmic structure and performance characteristics across CPUs and platforms. For V2Ray—implemented in Go—cipher performance depends on both algorithmic complexity and the available hardware acceleration (notably AES-NI on x86/x86_64 and ARM Crypto extensions).
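To make the "single operation" point concrete, here is a minimal sketch using Go's standard crypto/aes and crypto/cipher packages. It is not V2Ray's framing code: the helper name sealAndOpen, the per-call random key, and the sample header are assumptions chosen for illustration. It shows that one Seal call both encrypts and authenticates, and that Open rejects the message when the associated data does not match.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// sealAndOpen (hypothetical helper) encrypts plaintext with AES-128-GCM,
// binding header as associated data, then decrypts while presenting
// checkHeader. It returns the plaintext, or an error if authentication fails.
func sealAndOpen(plaintext, header, checkHeader []byte) ([]byte, error) {
	key := make([]byte, 16) // a 16-byte key selects AES-128
	if _, err := rand.Read(key); err != nil {
		return nil, err
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize()) // 12 bytes for standard GCM
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal encrypts and appends the authentication tag in one call.
	sealed := aead.Seal(nil, nonce, plaintext, header)
	// Open verifies the tag over ciphertext and associated data.
	return aead.Open(nil, nonce, sealed, checkHeader)
}

func main() {
	msg, hdr := []byte("payload"), []byte("frame-header")
	out, err := sealAndOpen(msg, hdr, hdr)
	fmt.Println(string(out), err)
	_, err = sealAndOpen(msg, hdr, []byte("tampered"))
	fmt.Println("tampered header rejected:", err != nil)
}
```

The header is authenticated but not encrypted, which is what lets a protocol keep routing or length fields readable while still detecting modification.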
Key factors affecting AEAD performance
- CPU architecture and extensions: AES-GCM benefits greatly from AES-NI and PCLMULQDQ on Intel/AMD CPUs. ChaCha20 is a software-friendly stream cipher that performs well on CPUs without AES acceleration.
- Block sizes and packet distribution: Small packets (e.g., DNS, interactive WebSocket frames) emphasize per-packet overhead, while large transfers highlight raw throughput.
- Go runtime behavior: Go's standard crypto packages ship both pure-Go and assembly-optimized code paths; which one runs depends on GOARCH and on the CPU features detected at run time. On amd64, for example, the AES-NI and PCLMULQDQ assembly path is used when the CPU supports it.
- Implementation details in V2Ray: Connection handling, buffering, and concurrency patterns affect how cryptographic operations pipeline with I/O.
- Nonce management and rekeying: Some AEAD variants (e.g., XChaCha20-Poly1305) use extended 24-byte nonces to simplify nonce management, at the cost of an extra HChaCha20 subkey derivation for each message.
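The packet-size factor above can be observed with a small timing loop. The sketch below is an assumption-laden micro-benchmark, not the harness used for this article: it only exercises AES-128-GCM from Go's standard library, the payload sizes (64, 1400, and 16384 bytes) are illustrative, and the helper names newGCM and sealRate are hypothetical.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"encoding/binary"
	"fmt"
	"time"
)

// newGCM returns AES-128-GCM keyed with an all-zero key.
// A fixed key is acceptable here only because nothing is secret.
func newGCM() cipher.AEAD {
	block, err := aes.NewCipher(make([]byte, 16))
	if err != nil {
		panic(err)
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		panic(err)
	}
	return aead
}

// sealRate seals iters payloads of size bytes each and reports the
// plaintext rate in MB/s (10^6 bytes per second).
func sealRate(aead cipher.AEAD, size, iters int) float64 {
	plaintext := make([]byte, size)
	nonce := make([]byte, aead.NonceSize())
	dst := make([]byte, 0, size+aead.Overhead())
	start := time.Now()
	for i := 0; i < iters; i++ {
		// A counter in the last 8 bytes keeps every nonce distinct.
		binary.BigEndian.PutUint64(nonce[len(nonce)-8:], uint64(i))
		dst = aead.Seal(dst[:0], nonce, plaintext, nil)
	}
	elapsed := time.Since(start).Seconds()
	return float64(size) * float64(iters) / elapsed / 1e6
}

func main() {
	aead := newGCM()
	// Illustrative sizes: a tiny frame, a near-MTU packet, a bulk chunk.
	for _, size := range []int{64, 1400, 16384} {
		iters := (64 << 20) / size // roughly 64 MiB of plaintext per size
		fmt.Printf("%6d B: %.0f MB/s\n", size, sealRate(aead, size, iters))
	}
}
```

Running it on each target machine gives a rough view of how much per-call overhead dominates at small sizes; it says nothing about V2Ray's own buffering or I/O costs.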
Test methodology
To obtain reproducible, meaningful metrics we prepared a controlled test harness that mirrors real V2Ray traffic patterns: mixed small/large packets, concurrent connections, and continuous bulk flows. The test bed and methodology are described below so you can reproduce or adapt the tests to your environment.
Hardware and software baseline
- Server: Intel Xeon E3-1270 v6 (4 cores / 8 threads) with AES-NI and PCLMULQDQ enabled; 32GB RAM; Ubuntu 22.04 LTS.
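- Client: Raspberry Pi 4 Model B (ARM Cortex-A72, 4 cores) running Ubuntu 22.04 ARM64, used to emulate resource-constrained devices. The Pi 4's BCM2711 SoC does not implement the ARMv8 Cryptography Extensions, so AES runs in software on this client.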
- V2Ray release: v4.x core built with default Go toolchain (go1.20+). Builds were performed both on x86_64 and ARM64 to ensure architecture-optimized binaries.
- Network: 1 Gbps link with minimal cross-traffic. iperf3 was used to validate baseline link performance.
- Test tools: iperf3 for bulk, wrk for HTTP-like transactions over V2Ray in a proxied configuration, and a custom tool to generate thousands of short-lived TLS-like flows to stress cipher rekeying and per-packet overhead.
Ciphers tested
- aes-128-gcm
- aes-256-gcm
- chacha20-poly1305
- xchacha20-ietf-poly1305
Measured metrics
- Throughput (Mbps) for large-file transfers (iperf3 with 4 parallel streams)
- CPU utilization (average and peak) on both client and server during tests
- Per-request latency percentiles (p50/p95/p99) using wrk for many small HTTP requests tunneled through V2Ray
- Connection setup time and handshake overhead for many short-lived TCP connections
Representative results
Below are condensed, representative results from our controlled experiments. These numbers are intended to illustrate relative differences; your mileage will vary based on hardware, OS, and V2Ray version.
Large-transfer throughput (x86 server -> x86 client)
- AES-128-GCM: 930–960 Mbps, CPU 20–30% (server side), 10–20% (client)
- AES-256-GCM: 900–930 Mbps, CPU 25–35% (server), slightly higher client CPU
- ChaCha20-Poly1305: 700–780 Mbps, CPU 55–65% (server), 35–50% (client)
- XChaCha20-Poly1305: 680–750 Mbps, CPU similar to ChaCha20 values
Interpretation: On x86 with AES-NI, AES-GCM outperforms ChaCha20 in raw throughput and CPU efficiency for bulk transfers.
Large-transfer throughput (ARM client/server)
- AES-128-GCM: 300–420 Mbps, CPU 45–65% (varies based on ARM crypto extension support)
- ChaCha20-Poly1305: 520–620 Mbps, CPU 40–55%
Interpretation: On ARM devices that lack efficient AES hardware support, ChaCha20-Poly1305 significantly outperforms AES-GCM for bulk throughput.
Small-packet latency and request performance (mixed workload)
- AES-128-GCM shows slightly lower p50/p95 latency on x86 (e.g., p95 improvement of ~5–10% vs ChaCha) due to hardware acceleration reducing per-packet processing time.
- On ARM, ChaCha20 shows superior small-packet latency because its add-rotate-XOR operations run efficiently on general-purpose cores without dedicated instructions, which lowers per-packet CPU cost.
CPU cost and energy considerations
For sustained high throughput, AES-GCM on AES-NI-equipped servers yields the lowest CPU-per-byte cost. For battery-powered or thermally constrained devices, ChaCha20 often yields better energy efficiency on architectures without AES acceleration.
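One way to compare CPU-per-byte cost is to normalize CPU utilization by throughput. The sketch below does this for the midpoints of the x86 server-side ranges reported above; taking midpoints is an assumption made for illustration, and the resulting figures are only as precise as those ranges.

```go
package main

import "fmt"

// result holds one cipher's measured throughput and server CPU utilization.
type result struct {
	name   string
	mbps   float64 // throughput in Mbps
	cpuPct float64 // server-side CPU utilization in percent
}

// cpuPerGbps returns CPU percentage points consumed per Gbps of traffic.
func cpuPerGbps(r result) float64 {
	return r.cpuPct / (r.mbps / 1000)
}

func main() {
	// Midpoints of the x86 server-side ranges listed in the results.
	results := []result{
		{"aes-128-gcm", 945, 25},
		{"aes-256-gcm", 915, 30},
		{"chacha20-poly1305", 740, 60},
	}
	for _, r := range results {
		fmt.Printf("%-18s %.1f CPU%% per Gbps\n", r.name, cpuPerGbps(r))
	}
	// prints approximately 26.5, 32.8 and 81.1 for the three ciphers
}
```

On these midpoints, ChaCha20-Poly1305 costs roughly three times as much server CPU per Gbps as AES-128-GCM on the AES-NI machine.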
Security considerations beyond raw speed
Speed is only one axis. Security properties and operational factors weigh heavily when choosing a cipher for production use.
Algorithmic security
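- AES-GCM: Well-studied, NIST-standardized, widely audited. Security depends on correct nonce usage: reusing a nonce under the same key leaks the XOR of the affected plaintexts and can expose the authentication key, which enables forgeries.
- ChaCha20-Poly1305 / XChaCha20-Poly1305: High security margin. Nonce reuse under one key is just as damaging as with GCM, but XChaCha20's 24-byte nonce is long enough to generate randomly, which reduces operational risk in some systems.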
Nonce management and rekeying
GCM is normally used with a 12-byte (96-bit) nonce, and implementations must guarantee that no nonce repeats under the same key, typically by deriving nonces from a counter. XChaCha20-Poly1305 uses a 24-byte (192-bit) nonce, which makes randomly generated nonces safe in practice and simplifies designs that reuse one key across many messages or connections. For V2Ray deployments with many short-lived connections or a risk of accidental nonce reuse, XChaCha can reduce operational risk.
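A counter-based nonce generator is one common way to guarantee uniqueness with 12-byte nonces. The sketch below is a generic illustration and not V2Ray's actual nonce scheme: the nonceCounter type, the 4-byte prefix plus 64-bit counter layout, and the error value are all assumptions.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errExhausted = errors.New("nonce space exhausted: rekey required")

// nonceCounter (hypothetical) yields unique 12-byte nonces made of a
// 4-byte fixed prefix followed by a 64-bit big-endian counter.
type nonceCounter struct {
	prefix    [4]byte
	next      uint64
	exhausted bool
}

// Next returns a fresh nonce, or an error once every counter value has
// been used, so that a nonce is never repeated under the same key.
func (c *nonceCounter) Next() ([]byte, error) {
	if c.exhausted {
		return nil, errExhausted
	}
	nonce := make([]byte, 12)
	copy(nonce, c.prefix[:])
	binary.BigEndian.PutUint64(nonce[4:], c.next)
	if c.next == ^uint64(0) {
		c.exhausted = true // the last value was just handed out
	} else {
		c.next++
	}
	return nonce, nil
}

func main() {
	c := &nonceCounter{prefix: [4]byte{1, 2, 3, 4}}
	for i := 0; i < 2; i++ {
		n, _ := c.Next()
		fmt.Printf("%x\n", n)
	}
	// prints:
	// 010203040000000000000000
	// 010203040000000000000001
}
```

The counter must be tied to the lifetime of the key: a process restart that resets the counter without changing the key would repeat nonces, which is the kind of operational hazard a long random nonce sidesteps.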
Side-channel and implementation risks
Table-based software AES implementations can leak key material through cache-timing side channels; AES-NI executes the rounds in hardware without secret-dependent memory lookups and so avoids this class of leak. ChaCha20 uses only additions, rotations, and XORs, which makes constant-time software implementations straightforward and is why it is often considered the safer choice where no AES hardware is available. The security of either mode in V2Ray depends on the correctness of the underlying Go crypto library and the build configuration (e.g., ensuring assembly optimizations are enabled where appropriate).
Practical recommendations
Based on the benchmarks and security considerations, choose a cipher according to the following guidance:
- For x86/x86_64 servers with AES-NI: Prefer AES-128-GCM for the best throughput and CPU efficiency. AES-256-GCM is slightly slower; its larger key is usually unnecessary for most threat models.
- For ARM-based servers, IoT, and mobile clients: Prefer ChaCha20-Poly1305 or XChaCha20-Poly1305. They deliver better performance and more predictable latency on devices without AES hardware acceleration.
- When operational simplicity and nonce robustness matter: Use XChaCha20-Poly1305 to reduce risks around nonce reuse, especially in deployments with complex lifecycle management or rekeying constraints.
- For mixed device fleets: Use server-side policy to choose ciphers per-client capability if your deployment supports negotiation. Otherwise pick the cipher that gives acceptable performance across the lowest-common-denominator devices.
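The guidance above can be expressed as a small selection policy. This is a sketch under stated assumptions: the clientCaps record and both function names are hypothetical, how you learn a client's capabilities is left open, and the cipher strings are simply the names from the tested list.

```go
package main

import "fmt"

// clientCaps is a hypothetical capability record kept per client.
type clientCaps struct {
	HasAESAccel     bool // AES-NI or ARMv8 Crypto Extensions present
	PreferLongNonce bool // nonce robustness outweighs raw speed
}

// pickCipher applies the per-client recommendations in order of priority.
func pickCipher(c clientCaps) string {
	switch {
	case c.PreferLongNonce:
		return "xchacha20-ietf-poly1305"
	case c.HasAESAccel:
		return "aes-128-gcm"
	default:
		return "chacha20-poly1305"
	}
}

// pickForFleet chooses one cipher when per-client negotiation is not
// available: AES-GCM only if every client accelerates AES (including the
// trivial empty fleet), otherwise ChaCha20-Poly1305.
func pickForFleet(fleet []clientCaps) string {
	for _, c := range fleet {
		if !c.HasAESAccel {
			return "chacha20-poly1305"
		}
	}
	return "aes-128-gcm"
}

func main() {
	x86 := clientCaps{HasAESAccel: true}
	pi := clientCaps{HasAESAccel: false}
	fmt.Println(pickCipher(x86))                      // aes-128-gcm
	fmt.Println(pickCipher(pi))                       // chacha20-poly1305
	fmt.Println(pickForFleet([]clientCaps{x86, pi})) // chacha20-poly1305
}
```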
Configuration tips for V2Ray
- Ensure you install V2Ray builds optimized for your CPU architecture. Official binaries often include assembly optimizations, but custom builds may be necessary in some environments.
- Enable TLS and AEAD together cautiously; double encryption adds overhead and may be unnecessary if AEAD is used correctly.
- Monitor CPU and latency metrics after switching ciphers. Use tools like top, htop, and application-level telemetry to detect regressions.
- Test with representative workloads—simulate mobile clients, short-lived connections, and large downloads—to capture real-world behavior.
Reproducing the tests
To replicate the results, perform the following high-level steps:
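- Build V2Ray for each architecture with GOOS and GOARCH set for the target. Go's standard crypto packages need no cgo: the assembly AES path is compiled in for supported architectures and selected at run time when the CPU advertises the instructions, so confirm the CPU flags on each host.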
- Configure two endpoints (server & client) with the only variable being the selected AEAD cipher. Keep versions, worker counts, and buffer sizes identical.
- Run iperf3 for bulk throughput (4 parallel streams) to measure maximum steady-state performance.
- Use wrk (or a custom HTTP benchmark) sent via the V2Ray proxy to measure request latency under concurrent load.
- Collect CPU and latency metrics and normalize results by link capacity and packet size distribution.
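Before comparing results across machines, it helps to record whether each host advertises AES instructions at all. The sketch below is Linux-only and assumes the usual /proc/cpuinfo layout ("flags" lines on x86, "Features" lines on ARM64); the function name hasAESFlag is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// hasAESFlag reports whether cpuinfo text lists the "aes" feature on a
// "flags" (x86) or "Features" (ARM64) line.
func hasAESFlag(cpuinfo string) bool {
	for _, line := range strings.Split(cpuinfo, "\n") {
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		key = strings.TrimSpace(key)
		if key != "flags" && key != "Features" {
			continue
		}
		for _, f := range strings.Fields(val) {
			if f == "aes" {
				return true
			}
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/proc/cpuinfo")
	if err != nil {
		fmt.Println("cannot read /proc/cpuinfo:", err)
		return
	}
	fmt.Println("AES instructions advertised:", hasAESFlag(string(data)))
}
```

A false result on a host is a strong hint that AES-GCM will run in software there, which is worth noting next to that host's benchmark numbers.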
Limitations and caveats
Benchmarks are environment-specific. Key limitations to keep in mind:
- Different Go versions and V2Ray builds may produce different CPU paths for crypto operations.
- Hardware differences (e.g., microcode, CPU stepping) affect AES and ChaCha performance.
- Real-world networks add jitter and packet loss that can change relative performance, especially for latency-sensitive workloads.
- Future crypto library improvements or new ciphers could shift the balance—keep monitoring and re-benchmark periodically.
In summary, if your deployment runs on modern x86 servers with AES-NI, AES-GCM (especially AES-128-GCM) remains the top choice for raw throughput and CPU efficiency. For ARM-heavy fleets or mobile clients without AES acceleration, ChaCha20-Poly1305 (or XChaCha20-Poly1305 where nonce robustness matters) offers a compelling balance of performance and simplicity. Always validate changes against representative workloads and monitor production behavior after any cipher switch.