High availability (HA) for Shadowsocks is not just about keeping a server online — it’s about ensuring ongoing user sessions survive backend failures, minimizing reconnection time, and maintaining predictable performance under load. For site owners, enterprises, and developers deploying Shadowsocks as a service, designing HA requires combining reliable failover for the network path (VIPs, routing), deterministic session persistence (so a client keeps hitting the same backend), and robust health-checking and monitoring. This article dives into the practical techniques and configuration examples you can use to achieve resilient, session-persistent Shadowsocks deployments.
Why session persistence matters for Shadowsocks
Shadowsocks is a lightweight SOCKS-like proxy that carries TCP and UDP traffic. Clients establish TCP sessions (and optionally UDP associations) to a server port and rely on the server to maintain per-connection state. If a client’s packets are suddenly routed to a different backend that has no connection state, the TCP stream will be broken and the client will need to reconnect. For interactive apps, long downloads, or tunnels, this disruption is unacceptable.
Therefore, session persistence — the guarantee that packets from a given logical session continue to be handled by the same backend — is essential for smooth failover and predictable user experience. There are several approaches to implement persistence at different layers (L3/L4/L7), each with trade-offs in complexity, latency, and reliability.
Architecture patterns for HA and session persistence
Below are common patterns to architect HA for Shadowsocks:
- Virtual IP (VIP) + Keepalived (VRRP): Active-passive failover of a single IP. Easy to set up, provides instant switch of the service IP to a healthy node.
- Load balancer with persistence (IPVS, HAProxy): Active-active frontends distribute traffic across backends while keeping client sticky mapping.
- Anycast + overlay routing: Multiple geographically distributed endpoints announce the same IP — good for global scale but requires BGP control and complex session considerations.
- Client-side multi-server/fallback: Client libraries maintain multiple server entries and reconnect to another server on failure — simplest, but causes interruption.
Most production deployments combine a VIP for failover and a load-balancing layer for scaling. Below we dive into concrete setups with configuration snippets and operational guidance.
Active-passive failover with Keepalived (VIP)
Keepalived uses VRRP to move a virtual IP between nodes quickly. This is ideal when you run one Shadowsocks instance per host and want fast failover with minimal configuration changes.
Example keepalived configuration (minimal):
<pre>
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass mysecret
    }
    virtual_ipaddress {
        198.51.100.10/32
    }
}
</pre>
Key operational notes:
- Use a short advert_int (1s) for faster detection, but be mindful of flapping.
- Combine with service-level health checks via keepalived scripts to demote a node if its Shadowsocks process is unhealthy (see the sketch after this list).
- Keepalived announces the VIP with gratuitous ARP on failover; verify upstream devices honor it, and tune Linux ARP behaviour (arp_ignore/arp_announce) where needed so the VIP is advertised reliably.
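A minimal sketch of such a service-level check, assuming Shadowsocks listens locally on port 8388 and nc is installed (script path, thresholds, and port are illustrative; newer keepalived versions may also require enable_script_security/script_user in global_defs):
<pre>
# demote this node when the local Shadowsocks listener stops accepting connections
vrrp_script chk_shadowsocks {
    script "/usr/bin/nc -z 127.0.0.1 8388"
    interval 2      # run every 2 seconds
    fall 3          # 3 consecutive failures -> mark the node faulty
    rise 2          # 2 consecutive successes -> recover
}

vrrp_instance VI_1 {
    # ... same instance as above ...
    track_script {
        chk_shadowsocks
    }
}
</pre>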
Active-active load balancing with session persistence
Active-active setups require a load balancer that can ensure sticky mapping of client sessions to the same backend. Two solid open-source choices are IPVS (Linux Virtual Server / LVS) and HAProxy. Each provides persistence mechanisms:
IPVS (LVS) with persistence
IPVS operates at L4 and is efficient at scale. Persistence in IPVS is enabled with a persistence timeout (the -p flag) on the virtual service, which keeps connections from the same client source IP pinned to the same real server for the configured duration.
Basic ipvsadm commands:
<pre>
# create a virtual service (TCP) with round-robin scheduling
ipvsadm -A -t 198.51.100.10:8388 -s rr
# add backends (masquerade/NAT forwarding)
ipvsadm -a -t 198.51.100.10:8388 -r 10.0.0.11:8388 -m
ipvsadm -a -t 198.51.100.10:8388 -r 10.0.0.12:8388 -m
# set a 300-second persistence timeout on the service
ipvsadm -E -t 198.51.100.10:8388 -s rr -p 300
</pre>
Notes:
- Persistence in IPVS is typically based on source IP. If clients are behind NAT or many users share the same IP, persistence granularity can be coarse.
- For UDP (Shadowsocks UDP relay), IPVS treats UDP statelessly — the persistent timeout helps but UDP failover may still drop in-flight packets.
- Combine IPVS with health checks (keepalived + check scripts) to remove failed servers promptly (see the sketch after this list).
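As a sketch of that combination, keepalived can manage the IPVS table itself and health-check each real server; addresses, timeouts, and the 300-second persistence value mirror the example above and are illustrative:
<pre>
virtual_server 198.51.100.10 8388 {
    delay_loop 5                # health-check interval in seconds
    lb_algo rr
    lb_kind NAT
    protocol TCP
    persistence_timeout 300     # same stickiness as ipvsadm -p 300

    real_server 10.0.0.11 8388 {
        weight 1
        TCP_CHECK {
            connect_timeout 3   # remove the backend if it stops accepting TCP
        }
    }
    real_server 10.0.0.12 8388 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
        }
    }
}
</pre>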
HAProxy with stick-tables (recommended for finer control)
HAProxy can proxy TCP connections (TCP mode) and maintain sticky tables keyed by source IP or even an application-level identifier. It also supports active health checks and sophisticated failover behavior.
Example HAProxy TCP config snippet (relevant parts):
<pre>
global
    log /dev/log local0

defaults
    mode tcp
    timeout connect 5s
    timeout client 1m
    timeout server 1m

frontend ss_frontend
    bind 198.51.100.10:8388
    default_backend ss_backends

backend ss_backends
    balance roundrobin
    stick-table type ip size 200k expire 10m
    stick on src
    server ss1 10.0.0.11:8388 check
    server ss2 10.0.0.12:8388 check
</pre>
Advantages:
- You can set long expiration on stick-tables to maintain mapping across long-lived sessions, and synchronize them between load-balancer nodes so mappings survive an LB failover (see the sketch after this list).
- HAProxy supports per-backend weight adjustments, backup servers, slowstart, and detailed health checks that remove unhealthy nodes without affecting others.
- Transparent proxying modes are available if you need to preserve client IPs to the backend.
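If you run two HAProxy nodes behind a keepalived VIP, the stick-table can be replicated between them with a peers section so the client-to-backend mapping survives a load-balancer failover. A minimal sketch, with peer names and addresses as illustrative assumptions:
<pre>
peers ss_peers
    peer lb1 10.0.0.1:10000
    peer lb2 10.0.0.2:10000

backend ss_backends
    balance roundrobin
    stick-table type ip size 200k expire 10m peers ss_peers
    stick on src
    server ss1 10.0.0.11:8388 check
    server ss2 10.0.0.12:8388 check
</pre>
Each HAProxy instance must identify itself with a local peer name matching one of the peer entries (the hostname by default, or the -L startup flag).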
Advanced persistence techniques
Some environments require persistence beyond simple source-IP sticky behavior. Consider these methods:
Connection tracking marks (connmark / conntrack)
Marking packets at the kernel level using conntrack and iptables/xt_connmark lets you influence routing logic in the Linux kernel. For example, mark a connection on its first packet and then keep directing packets of that flow to the same backend using ip rule routing (see the sketch after the component list below).
Key components:
- xt_connmark/conntrack to save and set marks for flows.
- IP rule / ip route table to route marked packets to a particular backend or out a specific interface.
- Requires careful tuning of conntrack timeouts for UDP and TCP.
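A minimal sketch combining these components, assuming Shadowsocks on port 8388, mark 0x1, routing table 100, and backend 10.0.0.11 (all illustrative):
<pre>
# mark new connections to the Shadowsocks port and save the mark into conntrack
iptables -t mangle -A PREROUTING -p tcp --dport 8388 -m conntrack --ctstate NEW \
    -j CONNMARK --set-mark 0x1
# restore the saved mark onto every later packet of the same flow
iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
# send marked packets through a dedicated routing table that points at one backend
ip rule add fwmark 0x1 table 100
ip route add default via 10.0.0.11 table 100
</pre>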
State replication (advanced, complex)
True zero-downtime handover of established TCP sessions requires replicating connection state between backends — a complex task typically reserved for specialized appliances or kernel-level clustering (e.g., Keepalived + conntrackd in some architectures). conntrackd can synchronize connection tracking tables between nodes, enabling seamless transfer of active NAT sessions.
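A minimal sketch of the synchronization part of conntrackd.conf, using FTFW (reliable) mode over a dedicated replication interface; the multicast address, group, and interface names are illustrative:
<pre>
Sync {
    Mode FTFW {
        # reliable replication protocol with retransmissions
    }
    Multicast {
        IPv4_address 225.0.0.50
        Group 3780
        IPv4_interface 10.0.0.11      # this node's address on the sync network
        Interface eth1                # dedicated replication NIC
        Checksum on
    }
}
</pre>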
Concerns:
- State replication increases complexity and bandwidth between nodes.
- It must be secured and kept performant for large connection tables.
Health checks, detection, and graceful failover
Accurate health checks are the backbone of HA. For Shadowsocks, a simple TCP connect check to the server port is not enough — ensure your check verifies the Shadowsocks process can accept a Shadowsocks handshake or performs a lightweight proxy request.
Best practices:
- Use application-level checks if possible: a small client that performs the Shadowsocks handshake or verifies a known HTTP fetch via the proxy (a sketch follows this list).
- Configure aggressive but safe failure detection thresholds (e.g., fail on 3 consecutive failures, but avoid flapping by using a short backoff).
- When a node is unhealthy, remove it from the load balancer quickly and allow existing connections to drain when possible (HAProxy supports drain mode).
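A sketch of such an application-level check, assuming a local ss-local client on 127.0.0.1:1080 configured to use the backend under test, and curl available on the checking host (URL and timings are illustrative):
<pre>
#!/usr/bin/env bash
# exit 0 if an HTTP fetch through the proxy succeeds, non-zero otherwise
set -euo pipefail

PROXY="127.0.0.1:1080"          # ss-local instance pointed at the backend under test
URL="https://www.example.com/"  # any reliably reachable URL

curl --silent --fail --max-time 5 \
     --socks5-hostname "$PROXY" \
     --output /dev/null "$URL"
</pre>
The script can then be wired into keepalived as a vrrp_script or MISC_CHECK, or into HAProxy via external-check, depending on your load-balancing layer.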
Handling UDP and DNS behaviours
UDP traffic (DNS queries, streaming) is harder to persist because it’s connectionless. Strategies:
- Use consistent hashing on the client source to route UDP to the same backend (some load balancers support this; see the sketch after this list).
- Increase UDP session timeouts so that short bursts stay mapped to the same server during transient issues.
- Where possible, encourage use of TCP or multiplexing protocols that tolerate reconnects.
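A sketch using IPVS's source-hashing scheduler for the UDP relay port plus longer kernel UDP conntrack timeouts; addresses and timeout values are illustrative, and note that sh hashes on the client source IP rather than the full 5-tuple:
<pre>
# UDP virtual service with source-hash scheduling: the same client IP keeps hitting the same backend
ipvsadm -A -u 198.51.100.10:8388 -s sh
ipvsadm -a -u 198.51.100.10:8388 -r 10.0.0.11:8388 -m
ipvsadm -a -u 198.51.100.10:8388 -r 10.0.0.12:8388 -m

# lengthen UDP conntrack timeouts so short bursts stay mapped during transient issues
sysctl -w net.netfilter.nf_conntrack_udp_timeout=120
sysctl -w net.netfilter.nf_conntrack_udp_timeout_stream=300
</pre>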
Monitoring, logging, and testing
Operational visibility is essential. Monitor server metrics (CPU, memory, socket counts), Shadowsocks logs, and network metrics. Tools and checks to include:
- ipvsadm -L -n or HAProxy stats endpoint to check active sessions and backend health.
- conntrack -L to inspect active NAT/conntrack entries.
- tcpdump/suricata for traffic-level debugging during failover tests.
- Automated failover testing: simulate backend crashes, network partitions, and measure reconnection time and packet loss (a rough drill is sketched below).
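A rough sketch of such a drill, assuming SSH access to backend 10.0.0.11, a shadowsocks-libev systemd unit on it, and a local ss-local client on 127.0.0.1:1080 routed through the VIP (names, addresses, and the test URL are illustrative):
<pre>
#!/usr/bin/env bash
# measure how long the proxied path stays broken after one backend is killed
set -euo pipefail

ssh root@10.0.0.11 'systemctl stop shadowsocks-libev'   # simulate a backend crash
start=$(date +%s)

# poll through the proxy until a fetch succeeds again
until curl --silent --fail --max-time 2 \
           --socks5-hostname 127.0.0.1:1080 \
           --output /dev/null https://www.example.com/; do
    sleep 1
done

echo "service restored through the proxy after $(( $(date +%s) - start )) seconds"
ssh root@10.0.0.11 'systemctl start shadowsocks-libev'  # restore the backend
</pre>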
Security and operational hardening
When adding load balancers and VIPs you increase the attack surface. Protect it:
- Run Shadowsocks with modern AEAD ciphers such as chacha20-ietf-poly1305 or aes-256-gcm, and keep libraries updated (a sample server config is sketched after this list).
- Harden server OS: disable unnecessary services, apply kernel hardening, and lock down management plane (SSH keys, firewall).
- Limit health-check endpoints to trusted internal networks or protect them with simple auth to avoid abuse.
- Encrypt replication and control-plane traffic (e.g., conntrackd, keepalived authentication).
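A minimal sketch of a shadowsocks-libev server config (e.g. /etc/shadowsocks-libev/config.json) using an AEAD cipher; the port, password, and timeout are illustrative:
<pre>
{
    "server": "0.0.0.0",
    "server_port": 8388,
    "password": "change-me",
    "timeout": 300,
    "method": "chacha20-ietf-poly1305",
    "mode": "tcp_and_udp"
}
</pre>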
Operational checklist for deployment
- Decide architecture: VIP for simple HA or LB (IPVS/HAProxy) for scale and persistence.
- Choose persistence granularity: source-IP stickiness, 5-tuple, or application token if available.
- Implement robust health checks and automatic removal of unhealthy nodes.
- Test failover scenarios and measure session disruption and reconnection behavior.
- Deploy monitoring and alerts for connection drops, high conntrack counts, and backend saturation.
- Document failover playbooks and maintenance windows for rolling updates.
Shadowsocks HA with session persistence is achievable with a thoughtful combination of VIPs, L4 load balancing, sticky mappings, and careful monitoring. For most site owners and enterprise operators, a combination of Keepalived for VIP failover and HAProxy/IPVS for load distribution with stickiness provides a balanced solution: minimal failover time, fine-grained control over persistence, and robust health-checking capabilities. For ultra-low disruption use-cases, explore state replication tools such as conntrackd, but be mindful of complexity and operational costs.
For more implementation guides, configuration templates, and managed HA patterns specific to proxy deployments, visit Dedicated-IP-VPN at https://dedicated-ip-vpn.com/.