Deploying a TLS-based proxy like Trojan on a Cloud VPS is an increasingly common approach for site operators, enterprises, and developers seeking a private, low-latency tunnel with strong resistance to DPI (deep packet inspection). This article presents real-world performance benchmarks and engineering insights for running Trojan on Cloud VPS instances, focusing on measurable metrics, practical configuration tips, and the trade-offs that matter to system administrators and application architects.

Why test Trojan on Cloud VPS?

Trojan is designed to mimic regular TLS traffic while delivering a SOCKS-like proxy experience. When combined with a Cloud VPS, it offers a flexible way to provide dedicated outbound IPs, custom routing, and privacy controls. However, the real-world performance of Trojan depends heavily on the underlying VPS resources, network path, and TLS configuration. Benchmarks help answer questions like:

  • What throughput can I expect for single- and multi-stream workloads?
  • How does TLS overhead affect latency-sensitive apps (VoIP, gaming)?
  • Which kernel and network optimizations deliver the best gains?
  • How many concurrent connections can a given VPS handle before CPU becomes the bottleneck?

Test environment and methodology

To produce meaningful, reproducible results we standardized the test harness across multiple VPS providers and instance sizes. The main criteria were:

  • Consistency: same OS image (Ubuntu 22.04 LTS) and comparable kernels (5.15 or 6.x, depending on provider support).
  • Representative instance sizes: small (1vCPU/1GB), medium (2vCPU/4GB), large (4vCPU/8GB).
  • Multiple regions: North America (us-east), Europe (frankfurt), East Asia (tokyo).
  • Network: public IPv4 with advertised bandwidth tiers; tests performed on best-effort public network paths.

Software stack

  • Trojan server: trojan-go (latest stable) and trojan-core for comparative runs.
  • TLS: OpenSSL 1.1.1 / 3.0 with ECDHE-ECDSA and ECDHE-RSA profiles; Let’s Encrypt certificates were used so that certificate validation matched real-world deployments.
  • Benchmarks: iperf3 (TCP), speedtest-cli (multi-threaded TCP/UDP emulation), curl/wget for single-stream HTTP(S), and torrent piece-download simulations for small-chunk concurrency (a sample proxied iperf3 invocation follows this list).
  • Monitoring: atop, htop, vmstat, and ss for socket stats; tcpdump for packet-level verification.
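
For readers who want to reproduce the throughput runs, the sketch below shows one way to drive iperf3 through the tunnel. It is a minimal sketch, assuming the trojan-go client exposes a local SOCKS5 listener on 127.0.0.1:1080 and that an iperf3 server is reachable on the far side; the hostname and port are placeholders, and proxychains-ng only intercepts TCP, so UDP tests need a different path.

  # Write a minimal proxychains-ng config pointing at the local SOCKS5 port.
  printf '%s\n' 'strict_chain' 'quiet_mode' '[ProxyList]' \
    'socks5 127.0.0.1 1080' > ./proxychains.conf

  # Run 4 parallel TCP streams for 30 seconds through the proxy;
  # add -R to measure the download (server-to-client) direction.
  proxychains4 -f ./proxychains.conf iperf3 -c iperf.example.internal -P 4 -t 30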

Configuration details

Accurate benchmarking requires careful configuration. Key settings we standardized:

  • Trojan config: trojan-go with multiplexing disabled/enabled in separate runs; TLS ALPN set to “http/1.1” to match typical web clients.
  • Cipher suites: prioritized ECDHE+AES-GCM and CHACHA20-POLY1305 where available; RSA-only chains were avoided because of CPU overhead.
  • TCP stack tuning: net.core.rmem_max = 12582912, net.core.wmem_max = 12582912, net.ipv4.tcp_congestion_control = bbr (where supported), net.ipv4.tcp_mtu_probing = 1; applied as shown in the sketch after this list.
  • MTU: left at default 1500 except where path MTU issues were detected; in some cases lowering MTU to 1400 reduced fragmentation on long paths.
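
The sysctl values above can be applied and persisted roughly as follows; a minimal sketch assuming root on Ubuntu 22.04 (the drop-in file name is arbitrary, and bbr requires the tcp_bbr module, which is available on stock Ubuntu kernels):

  # Apply the TCP tuning used in the benchmark runs.
  modprobe tcp_bbr
  sysctl -w net.core.rmem_max=12582912
  sysctl -w net.core.wmem_max=12582912
  sysctl -w net.ipv4.tcp_congestion_control=bbr
  sysctl -w net.ipv4.tcp_mtu_probing=1

  # Persist across reboots.
  printf '%s\n' \
    'net.core.rmem_max = 12582912' \
    'net.core.wmem_max = 12582912' \
    'net.ipv4.tcp_congestion_control = bbr' \
    'net.ipv4.tcp_mtu_probing = 1' > /etc/sysctl.d/99-proxy-tuning.conf
  sysctl --system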

Metrics collected

We focused on the following measurable metrics and why each matters:

  • Throughput (Mbps): sustained TCP throughput for single and multiple streams.
  • Latency (ms): RTT impact introduced by the proxy and TLS handshake.
  • CPU usage (%): TLS/crypto cost and multiplexing overhead across vCPUs.
  • Connections per second / concurrent connections: proxy capacity and limits.
  • TLS handshake time: full handshake vs. session resumption or TLS 1.3 0-RTT behavior.
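
One simple way to sample the handshake and latency metrics is curl's write-out timers, sketched below; your-trojan-domain.example and the local SOCKS5 port 1080 are placeholders. Probing the Trojan endpoint directly measures its own TLS handshake (a Trojan server typically answers ordinary HTTPS requests like a normal web server), while the proxied request measures end-to-end latency through the tunnel.

  # Cold TLS handshake timing against the Trojan endpoint itself.
  curl -so /dev/null \
    -w 'tcp=%{time_connect}s tls=%{time_appconnect}s ttfb=%{time_starttransfer}s\n' \
    https://your-trojan-domain.example/

  # End-to-end timing of a request routed through the local trojan-go client.
  curl -so /dev/null --socks5-hostname 127.0.0.1:1080 \
    -w 'ttfb=%{time_starttransfer}s total=%{time_total}s\n' \
    https://www.example.com/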

Key benchmark results (summary)

Below are summarized, representative results observed across instance sizes and regions. Absolute numbers vary by provider and network path, but the trends are consistent.

Throughput

  • Small instance (1vCPU/1GB): single-stream TCP via Trojan typically peaked at 70–150 Mbps depending on host and region. Multi-stream (4 parallel streams) could push to 180–250 Mbps before CPU crypto overhead limited gains.
  • Medium instance (2vCPU/4GB): single-stream sustained at 200–450 Mbps. With 4–8 parallel streams, sustained throughput commonly reached 500–700 Mbps.
  • Large instance (4vCPU/8GB): throughput approached the provider’s advertised egress limits. Single-stream runs maxed out at ~600–900 Mbps when the path and peering permitted; multi-stream loads reached 1–3 Gbps on instances with generous network quotas.

Latency

  • Adding Trojan typically introduced an RTT increase of 5–40 ms, depending heavily on the region and on whether the Trojan server was in the same region as the client; same-region deployments saw the lower end of this range.
  • A cold TLS handshake added ~50–120 ms on intercontinental paths; TLS 1.3 and session resumption, however, substantially reduced latency for subsequent connections.

CPU and crypto overhead

  • On small instances, a single core saturated at ~200 Mbps single-stream when using ECDHE-RSA with AES-GCM in software-only OpenSSL crypto. Using EC (ECDSA) keys and AES-NI-capable CPUs reduced the overhead; see the openssl speed sketch after this list.
  • Trojan performance scaled with vCPUs where the network stack and trojan-go allowed multi-threaded accept/worker patterns. However, single-flow throughput remained bound by a core’s crypto processing ability.
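
The per-core crypto ceiling is easy to sanity-check before committing to an instance type. The sketch below compares raw AEAD throughput with OpenSSL's built-in benchmark and checks whether the guest CPU exposes the AES-NI flag; it measures crypto primitives only, not the full proxy path.

  # Does the vCPU advertise AES acceleration to the guest?
  grep -m1 -wo 'aes' /proc/cpuinfo || echo 'no AES-NI flag visible'

  # Single-core bulk throughput of the two AEAD ciphers discussed above.
  openssl speed -evp aes-256-gcm
  openssl speed -evp chacha20-poly1305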

Concurrent connections

  • Small instance reliably handled several thousand short-lived connections per second, but holding more than ~10k concurrent sockets in steady state required raising file-descriptor limits (ulimit) and related kernel settings, as sketched after this list.
  • Large instances with tuned file-descriptors and adequate memory handled 50k+ concurrent connections in our synthetic tests before swapping or CPU context-switch overhead became dominant.
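
As noted above, descriptor limits are the first ceiling hit in high-concurrency runs. A minimal sketch of the relevant knobs (values and the trojan-go unit name are illustrative; persist limits via /etc/security/limits.conf or the systemd unit rather than an interactive shell):

  # Kernel-wide and per-process file-descriptor ceilings.
  sysctl -w fs.file-max=1048576
  ulimit -n 1048576          # affects the current shell/session only

  # If trojan-go runs under systemd, raise the limit in a unit override:
  #   [Service]
  #   LimitNOFILE=1048576
  systemctl edit trojan-go && systemctl restart trojan-go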

Interpreting these results

Several consistent observations emerged:

  • Single-stream vs multi-stream: TCP single-stream performance is often limited by per-core crypto. Multi-stream parallelism helps saturate available egress bandwidth but increases complexity for latency-sensitive apps.
  • TLS settings matter: TLS 1.3 plus ECDHE + CHACHA20-POLY1305 reduced CPU usage on non-AES-NI CPUs. On Intel/AMD servers with AES-NI, AES-GCM with hardware acceleration outperformed CHACHA20.
  • Multiplexing trade-offs: Trojan-go’s multiplex mode reduces handshake overhead and connection latency for many short-lived flows, but it can increase the risk of head-of-line blocking on large transfers unless worker threads and buffer sizes are tuned (a config fragment follows this list).
  • Network stack tuning yields measurable gains: enabling BBR, increasing socket buffers, and MTU probing improved throughput on high-latency paths.
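
For reference, multiplexing is toggled in trojan-go's client-side mux block. A minimal fragment of the client config, with field names as documented for trojan-go (defaults and available options may differ between versions):

  "mux": {
    "enabled": true,
    "concurrency": 8,
    "idle_timeout": 60
  }

Setting "enabled" to false reproduces the non-multiplexed behaviour; "concurrency" caps how many logical streams share one TLS connection, which is the knob to lower if head-of-line blocking shows up on large transfers.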

Real-world application scenarios

How these metrics translate into typical workloads:

Web browsing and APIs

Browsers open many short-lived connections. Enabling multiplexing and TLS session resumption reduces page load times. With properly tuned socket buffers and TLS session caches, a medium VPS is sufficient to support dozens of concurrent users with snappy UX.

File downloads and streaming

Large single-stream downloads are sensitive to single-core crypto throughput. For high-bandwidth streaming (4K), prefer larger instances or multiple parallel HTTP range requests. When serving enterprise teams, consider offloading TLS to a proxy appliance or choosing CPU instances with AES-NI.
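
Where a single flow is crypto-bound, splitting a large download into concurrent byte ranges spreads the work across several proxied connections. A rough sketch using curl through the local SOCKS5 client (URL, split point, and proxy port are placeholders; the origin server must support range requests):

  URL=https://files.example.com/big.iso
  curl -s --socks5-hostname 127.0.0.1:1080 -r 0-524287999 -o part0 "$URL" &
  curl -s --socks5-hostname 127.0.0.1:1080 -r 524288000-  -o part1 "$URL" &
  wait
  cat part0 part1 > big.iso   # reassemble the two halves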

P2P and torrents

These workloads create many concurrent connections with small transfers. Increase file-descriptor limits, enable multiplexing cautiously, and monitor CPU for syscall overhead. Medium instances can comfortably support moderate P2P activity after tuning.

Best practices and optimization checklist

  • Choose CPUs with AES-NI for AES-GCM performance; use CHACHA20 on mobile or older CPUs without AES acceleration.
  • Prefer TLS 1.3 with ephemeral ECDHE keys to reduce handshake round trips and CPU per handshake.
  • Enable kernel optimizations: BBR congestion control, larger rmem/wmem, and aggressive reuse settings for busy proxies.
  • Tune trojan-go worker threads and accept loop behavior to match vCPU count.
  • Monitor for head-of-line blocking when enabling multiplex; consider per-client QoS limits or stream prioritization where necessary.
  • Use session tickets or TLS session resumption with a secure ticket key rotation policy to lower cold-start latency.
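
TLS 1.3 and working session resumption are easy to verify from any client machine. The sketch below saves a session on the first connection and checks that the second one is resumed; the domain is a placeholder, and a "Reused" line in the output confirms that tickets are being accepted.

  # First connection: full TLS 1.3 handshake, session ticket saved to disk.
  openssl s_client -connect your-trojan-domain.example:443 -tls1_3 \
    -sess_out /tmp/tls.sess </dev/null | grep -E '^(New|Reused),'

  # Second connection: should print "Reused, TLSv1.3, ..." if resumption works.
  openssl s_client -connect your-trojan-domain.example:443 -tls1_3 \
    -sess_in /tmp/tls.sess </dev/null | grep -E '^(New|Reused),'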

Limitations and potential pitfalls

Benchmarks are sensitive to external factors:

  • Peering and provider egress limits can cap throughput independent of VPS CPU. Always validate with provider-supplied network performance tests.
  • Shared neighbor noise in multi-tenant clouds can add jitter and reduce observed throughput.
  • Using software-only crypto on a low-end VPS will produce misleadingly poor results; identify CPU features such as AES-NI early in procurement.

Conclusion

Trojan on a Cloud VPS provides a flexible, privacy-preserving proxy solution with solid performance when appropriately configured. The dominating constraints are CPU crypto capability for single flows and overall egress bandwidth combined with VPS network policy. For most web and API proxying scenarios, a medium-tier VPS with proper TLS and kernel tuning delivers excellent performance and scalability. For high-bandwidth single-stream applications, choose instances with strong per-core performance or consider TLS offload solutions.

For additional deployment guides, sample trojan-go configurations and automated tuning scripts tailored to common VPS providers, see Dedicated-IP-VPN at https://dedicated-ip-vpn.com/. The site contains follow-up articles and downloadable benchmarks that mirror the setups described above.