Introduction
Deploying a modern VPN solution in production requires more than a single server: it needs automation, secure certificate management, traffic routing, observability, and the ability to scale with demand. Kubernetes provides the primitives to build a resilient, scalable cluster for VPN services. This article walks through deploying a Trojan-compatible VPN (e.g., trojan-go) on Kubernetes with production-grade security, autoscaling, observability, and CI/CD-ready manifests.
Why Kubernetes for VPN
Kubernetes offers several advantages for hosting a VPN service:
- Elastic scaling with Horizontal Pod Autoscaler (HPA) to match client load.
- Service discovery and stable internal addressing for backend components (authentication, metrics collectors, sidecar proxies).
- Declarative infrastructure enabling reproducible deployments and GitOps workflows.
- Isolation and policy enforcement via NetworkPolicies and RBAC.
Architecture Overview
A recommended architecture for running Trojan on Kubernetes includes these components:
- Trojan frontend pods running trojan-go (or equivalent), exposed through a Kubernetes Service and an Ingress or LoadBalancer.
- TLS termination handled by trojan itself (preferred) or by an Ingress controller with passthrough when TLS is end-to-end.
- A ConfigMap for non-sensitive trojan configuration and Secrets for user credentials and TLS private keys.
- Cert-manager for automated TLS certificates when using an Ingress controller or for issuing certs for trojan instances.
- Observability stack (Prometheus + Grafana) for metrics; Fluentd/Fluent Bit for logs.
- Autoscaling with HPA (CPU / custom metrics such as active connections).
- NetworkPolicies to restrict traffic to only required ports and peers.
Container Image and Configuration
Use a minimal, well-maintained image for trojan-go. Prefer to build your own image in CI to pin dependencies and security patches. Example Dockerfile steps (conceptually):
- base the image on Alpine or distroless
- copy in the trojan-go binary and a default, credential-free configuration file
- run as an unprivileged user and set the ENTRYPOINT to the trojan-go binary
Store runtime configuration as a Kubernetes ConfigMap for plain configuration (e.g., routing rules) and as a Secret for private keys and credentials.
ConfigMap and Secret Considerations
Keep these guidelines in mind:
- Place sensitive material (private keys, client passwords) in Kubernetes Secrets, not ConfigMaps.
- Encrypt Secrets at rest (pass an EncryptionConfiguration file to kube-apiserver with --encryption-provider-config).
- Mount Secrets as files rather than environment variables when possible; environment variables are inherited by child processes and can leak through /proc, crash dumps, and debug output.
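As a rough sketch of these guidelines, the Secret below keeps an entire trojan-go server configuration, client passwords included, out of ConfigMaps. The namespace `vpn`, the Secret name `trojan-config`, the hostnames, the fallback service, and the password placeholder are all hypothetical. The configuration keys reflect trojan-go's server config as commonly documented and should be checked against the version you deploy.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: trojan-config          # hypothetical name
  namespace: vpn               # hypothetical namespace
type: Opaque
stringData:
  # stringData takes plain text; the API server stores it base64-encoded.
  # remote_addr/remote_port point at an assumed fallback web service that
  # receives any traffic that is not valid trojan protocol.
  config.json: |
    {
      "run_type": "server",
      "local_addr": "0.0.0.0",
      "local_port": 8443,
      "remote_addr": "fallback-web.vpn.svc.cluster.local",
      "remote_port": 80,
      "password": ["REPLACE_WITH_A_LONG_RANDOM_PASSWORD"],
      "ssl": {
        "cert": "/etc/trojan-go/tls/tls.crt",
        "key": "/etc/trojan-go/tls/tls.key",
        "sni": "vpn.example.com"
      }
    }
```

A pod would consume this through a `secret` volume mounted read-only. In a GitOps setup the plaintext should not be committed as shown; a sealed or externally sourced Secret is the usual substitute.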
Deployment Manifest Essentials
A typical Deployment should include resource requests/limits and liveness/readiness probes. Pods managed by a Deployment always use restartPolicy: Always, so there is no restart policy to choose. Key settings to include:
- resources.requests to ensure proper scheduling.
- resources.limits to prevent noisy-neighbor effects.
- readinessProbe that verifies the server is ready to accept connections (e.g., a TCP check on the listening port or a status endpoint).
- livenessProbe to recover from stuck processes.
- A PodDisruptionBudget to maintain a minimum number of available replicas during voluntary disruptions such as node drains and cluster upgrades.
Example directive descriptions (to be implemented as YAML in your manifests):
- Deployment with replicas set to 3 for high availability.
- RollingUpdate strategy with maxSurge 1 and maxUnavailable 1 to minimize downtime.
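The directives above could be expressed roughly as follows. This is a minimal sketch, not a tuned manifest: the image reference, the `vpn` namespace, the `app: trojan` label, the Secret names `trojan-config` and `trojan-tls`, port 8443, the `-config` argument, and all resource figures are assumptions to adapt. Listening on an unprivileged port lets the container run without root or extra capabilities.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: trojan
  namespace: vpn
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  selector:
    matchLabels:
      app: trojan
  template:
    metadata:
      labels:
        app: trojan
    spec:
      automountServiceAccountToken: false   # the pod needs no Kubernetes API access
      securityContext:
        runAsNonRoot: true
        runAsUser: 65532                    # arbitrary unprivileged UID
      containers:
        - name: trojan
          image: registry.example.com/vpn/trojan-go:1.0.0   # hypothetical image
          # Assumes the image ENTRYPOINT is the trojan-go binary.
          args: ["-config", "/etc/trojan-go/config/config.json"]
          ports:
            - name: trojan
              containerPort: 8443
          resources:
            requests: { cpu: 250m, memory: 128Mi }   # placeholder figures
            limits: { cpu: "1", memory: 256Mi }
          readinessProbe:
            tcpSocket: { port: trojan }     # ready once the listener is bound
            periodSeconds: 10
          livenessProbe:
            tcpSocket: { port: trojan }
            initialDelaySeconds: 15
            periodSeconds: 20
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          volumeMounts:
            - { name: config, mountPath: /etc/trojan-go/config, readOnly: true }
            - { name: tls, mountPath: /etc/trojan-go/tls, readOnly: true }
      volumes:
        - name: config
          secret: { secretName: trojan-config }
        - name: tls
          secret: { secretName: trojan-tls }
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: trojan
  namespace: vpn
spec:
  minAvailable: 2            # keep two of three replicas through node drains
  selector:
    matchLabels:
      app: trojan
```

A TCP probe only confirms that the port is open. If your build exposes a richer health signal, an HTTP or exec probe against it would catch more failure modes.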
Networking and Ingress
How you expose trojan matters for TLS and performance:
- If trojan handles TLS, expose pods via a Service of type LoadBalancer or NodePort and configure your cloud load balancer with TCP passthrough to preserve end-to-end TLS.
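- If using an Ingress controller, ensure it can forward the connection without terminating TLS (ingress-nginx, for example, can expose raw TCP ports through its tcp-services ConfigMap or pass TLS through by SNI when started with --enable-ssl-passthrough), or use a dedicated TCP load balancer and let trojan terminate TLS.
- For high throughput, set externalTrafficPolicy: Local on the Service to preserve client source IPs and avoid an extra cross-node hop; note that only nodes running a trojan pod will then receive traffic.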
- Implement Kubernetes NetworkPolicies to allow only trusted network segments (e.g., management, monitoring) to access admin endpoints.
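For the case where trojan terminates TLS itself, a Service along these lines would expose the pods through a Layer 4 cloud load balancer. The names, the `app: trojan` selector, and the 443 to 8443 port mapping are illustrative assumptions.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: trojan
  namespace: vpn
spec:
  type: LoadBalancer
  # Local keeps the client source IP and skips the second hop, but only
  # nodes that run a trojan pod pass the load balancer's health check.
  externalTrafficPolicy: Local
  selector:
    app: trojan
  ports:
    - name: trojan
      protocol: TCP
      port: 443          # port clients connect to
      targetPort: 8443   # unprivileged port the container listens on
```

Because the load balancer forwards raw TCP, the TLS session runs end to end between the client and the trojan pod.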
TLS and Certificate Management
Production deployments must ensure robust TLS management:
- Use cert-manager to automatically issue and renew certificates from ACME (Let’s Encrypt) or internal CAs.
- For trojan, deploy TLS certificates as Secrets and mount them into pods as files so the server can load them without intermediaries.
- Rotate private keys periodically and automate rotation via CI/CD hooks or cert-manager renewal events.
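A cert-manager Certificate covering these points might look like the sketch below. The ClusterIssuer `letsencrypt-prod`, the hostname, and the Secret name `trojan-tls` are hypothetical and must already exist or be adjusted for your cluster.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: trojan-tls
  namespace: vpn
spec:
  secretName: trojan-tls        # Secret that pods mount as tls.crt / tls.key
  dnsNames:
    - vpn.example.com           # hypothetical hostname
  issuerRef:
    name: letsencrypt-prod      # assumed pre-existing ClusterIssuer
    kind: ClusterIssuer
  privateKey:
    algorithm: ECDSA
    size: 256
    rotationPolicy: Always      # generate a fresh key on every renewal
  renewBefore: 720h             # renew 30 days before expiry
```

Since trojan occupies port 443 itself, a DNS-01 solver on the issuer is often easier to operate than HTTP-01. Whether the running server picks up a renewed certificate without a restart should be verified for your trojan-go version; a rolling restart after renewal is a safe fallback.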
Scaling and Performance
Trojan is typically CPU- and network-bound. To scale correctly:
- Configure HPA based on CPU usage and, ideally, custom metrics such as active connections or bytes/sec. Install Prometheus Adapter to expose custom metrics for HPA.
- Tune kernel and container networking: increase net.core.somaxconn and net.ipv4.tcp_max_syn_backlog, and raise conntrack limits (e.g., net.netfilter.nf_conntrack_max) for high connection counts.
- Use hostNetwork: true only if latency is critical and you accept the security implications; otherwise prefer a CNI with an optimized dataplane (Calico, or Cilium with eBPF).
- Set appropriate MTU for overlay networks to avoid fragmentation at high throughput.
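The autoscaling advice above can be sketched with an `autoscaling/v2` HorizontalPodAutoscaler. The metric name `trojan_active_connections` is hypothetical: it assumes an exporter publishes a per-pod gauge and that Prometheus Adapter maps it into the custom metrics API. The thresholds are placeholders to be tuned against real traffic.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: trojan
  namespace: vpn
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: trojan
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: trojan_active_connections   # hypothetical custom metric
        target:
          type: AverageValue
          averageValue: "500"               # target connections per pod
  behavior:
    scaleDown:
      # Scale in slowly: removing a pod drops its long-lived connections.
      stabilizationWindowSeconds: 300
```

When several metrics are listed, the HPA computes a desired replica count for each and uses the largest.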
Security Best Practices
Hardening steps for production-grade VPN:
- Run trojan containers as a non-root user (runAsNonRoot in the pod securityContext) and drop all Linux capabilities in the container-level securityContext.
- Enable RBAC and create least-privilege ServiceAccounts for pods that require API access (e.g., for metrics annotation).
- Use NetworkPolicies to restrict which pods can talk to the VPN pods (e.g., management cluster, logging).
- Verify image provenance and enforce image scanning in your CI pipeline. Use signed images where your registry supports them.
- Collect audit logs and monitor for suspicious patterns; automate alerts for abnormal connection spikes or newly added client credentials.
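One way to express the NetworkPolicy item is shown below. It assumes trojan listens on 8443, that a metrics endpoint is served on a hypothetical port 9100, and that Prometheus runs in a namespace named `monitoring`. It only takes effect if the cluster's CNI enforces NetworkPolicies.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: trojan-ingress
  namespace: vpn
spec:
  podSelector:
    matchLabels:
      app: trojan
  policyTypes:
    - Ingress
  ingress:
    # VPN clients may arrive from any address, but only on the service port.
    - ports:
        - protocol: TCP
          port: 8443
    # The metrics port is reachable only from the monitoring namespace.
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 9100       # hypothetical metrics port
```

Egress is left unrestricted here because the pods must reach arbitrary destinations on behalf of clients. A stricter variant could block egress to the cluster's internal address ranges to limit lateral movement.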
Observability and Logging
Visibility is critical for troubleshooting and capacity planning:
- Expose trojan metrics using a Prometheus exporter or trojan-go’s built-in status endpoint, and scrape via Prometheus.
- Install Grafana dashboards for connection counts, traffic rates, and latency.
- Forward logs to a central system (e.g., Elasticsearch, Loki) using Fluent Bit and keep structured logs to ease search and retention policies.
- Set alerting (Prometheus Alertmanager) for high error rates, high CPU, or unexpected restarts.
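As an illustration of the alerting item, a Prometheus rules file could start from something like this. It relies on metrics from kube-state-metrics and cAdvisor, and the `vpn` namespace, the `trojan` container name, and every threshold are assumptions.

```yaml
groups:
  - name: trojan
    rules:
      - alert: TrojanUnexpectedRestarts
        # Restart counter exposed by kube-state-metrics.
        expr: increase(kube_pod_container_status_restarts_total{namespace="vpn", container="trojan"}[15m]) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "trojan container in {{ $labels.pod }} is restarting repeatedly"
      - alert: TrojanHighCPU
        # CPU cores used per pod (cAdvisor); 0.9 assumes a limit of 1 core.
        expr: sum by (pod) (rate(container_cpu_usage_seconds_total{namespace="vpn", container="trojan"}[5m])) > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "trojan pod {{ $labels.pod }} is close to its CPU limit"
```

An error-rate alert would follow the same pattern once you know which metric names your exporter publishes.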
High-Availability and Disaster Recovery
Plan for failure scenarios:
- Deploy trojan across multiple Availability Zones or nodes to avoid single-node failure.
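- Use StatefulSets only if persistent identity is needed; trojan servers are normally stateless apart from their open connections, and each TCP connection stays pinned to one pod for its lifetime, so a Deployment behind a Layer 4 load balancer suffices.
- Back up Secrets and ConfigMaps (e.g., with Velero) and test restoration procedures regularly.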
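Spreading replicas across zones and nodes can be requested in the pod template. The fragment below is meant to be merged into a Deployment's `spec.template.spec`; the `app: trojan` label is an assumption.

```yaml
# Fragment of a Deployment's pod template, not a standalone manifest.
spec:
  template:
    spec:
      topologySpreadConstraints:
        # Balance replicas across availability zones where possible.
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: trojan
        # Within those zones, avoid stacking replicas on one node.
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: trojan
```

`ScheduleAnyway` treats the spread as a preference. Switching to `DoNotSchedule` makes it a hard requirement, at the cost of leaving pods Pending when a zone has no capacity.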
CI/CD and GitOps
Adopt a GitOps workflow to ensure reproducible infrastructure changes:
- Store Kubernetes manifests or Helm charts in Git.
- Use Flux or Argo CD to continuously reconcile cluster state with Git.
- Automate image builds with a pipeline that includes linting, static analysis, and vulnerability scanning (Trivy/Clair).
- Run automated canary or blue/green deployments for configuration or image upgrades to limit blast radius.
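With Argo CD, continuous reconciliation could be declared as in the sketch below. The repository URL, path, and namespaces are hypothetical, and Flux offers an equivalent through its own resource types.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: trojan-vpn
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/vpn-manifests.git   # hypothetical repo
    targetRevision: main
    path: overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: vpn
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from Git
      selfHeal: true   # revert manual changes made in the cluster
    syncOptions:
      - CreateNamespace=true
```

With `selfHeal` enabled, an emergency fix applied with kubectl is reverted on the next sync, so urgent changes also have to go through Git.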
Example Operational Checklist
Before promoting a trojan-based VPN cluster to production, validate the following:
- Secrets encrypted at rest and deployed via a secure pipeline.
- Metrics and logs accessible and retention policy configured.
- Autoscaling policies tuned to real traffic patterns.
- NetworkPolicies in place to block unnecessary lateral movement.
- Disaster recovery documented and tested (restore of keys, configuration, and cluster state).
- Rate limits and abuse-detection mechanisms to prevent resource exhaustion by clients.
Common Pitfalls and How to Avoid Them
Be aware of typical mistakes:
- Exposing management endpoints publicly: always restrict or disable them in production.
- Not monitoring connection counts: this leads to sudden overload without early warning.
- Using default or weak passwords: Trojan authenticates clients by password, so enforce strong, unique credentials per user.
- Incorrect TLS termination placement: mixing TLS termination at the load balancer and at trojan can cause visibility blind spots; prefer one model and standardize it.
Summary
Deploying Trojan (trojan-go) on Kubernetes provides powerful advantages in terms of automation, scaling, and operational consistency. The key to success is a combination of:
- secure certificate and secret management,
- well-defined resource limits and autoscaling,
- robust observability,
- and strict network and RBAC policies.
With these pieces in place and a CI/CD-driven delivery process, you can run a secure, scalable VPN cluster suitable for enterprise and high-traffic applications.
For practical manifests, integration examples (cert-manager, Prometheus Adapter for HPA), and prescriptive templates you can adapt to your cloud provider, consult the resources and guides available on Dedicated-IP-VPN: https://dedicated-ip-vpn.com/.