Secure Socket Tunneling Protocol (SSTP) delivers VPN over TLS (TCP/443), making it particularly suitable for traversing restrictive networks and firewalls. However, because SSTP encapsulates PPP over TCP, it is sensitive to connection idle timeouts, NAT timeouts, and the interaction between TCP retransmission and VPN-layer keepalive mechanisms. This article outlines practical, platform-agnostic and platform-specific strategies to configure timeouts and keepalives for reliable SSTP connections in production environments.
Why timeouts and keepalive matter for SSTP
Two core reasons SSTP connections can drop unexpectedly are:
- Network element idle timeouts: Load balancers, NAT gateways, and firewall state tables often remove TCP flows that appear idle for a period (e.g., 4 minutes on some cloud LBs).
- Layer interactions (TCP-over-TCP): SSTP encapsulates PPP inside TLS over TCP, so TCP traffic carried inside the tunnel effectively runs TCP-over-TCP. When probes and retransmissions at the inner and outer layers interact, recovery takes longer or the connection is torn down.
Properly configuring keepalive and timeout values on clients, servers, and intermediate network devices reduces false disconnects, speeds recovery, and preserves user experience for remote workers and services.
Three-layer approach to reliability
Treat reliability in three layers:
- Application/PPP-layer keepalives (LCP/PPP echo)
- Transport-layer keepalives (TCP keepalive settings)
- Network-element timeout configuration (NAT, firewalls, load balancers)
Combined, these controls ensure the connection remains visible and recoverable across the whole path.
PPP/LCP keepalive (inside SSTP)
Because SSTP carries PPP, configure PPP link monitoring where possible. For Unix/Linux SSTP server implementations that use pppd or similar, enable LCP echo options:
- lcp-echo-interval: how often to send a probe (seconds)
- lcp-echo-failure: number of unanswered probes before considering the link dead
Example pppd snippet:
lcp-echo-interval 30
lcp-echo-failure 4
This probes every 30 seconds and declares the link down after 4 consecutive missed replies (roughly 2 minutes).
Recommendation: start with an interval of 20–60 seconds and 3–6 allowed failures. Shorter intervals detect failures faster but send more probe traffic; longer intervals reduce overhead but delay detection.
TCP keepalive tuning
Because SSTP uses TLS over TCP, adjusting OS-level TCP keepalive settings can keep the TCP session alive across NAT and middleboxes without interfering with PPP behavior.
Common settings (Linux):
- net.ipv4.tcp_keepalive_time — seconds of inactivity before starting probes (default 7200)
- net.ipv4.tcp_keepalive_intvl — interval between individual probes (default 75)
- net.ipv4.tcp_keepalive_probes — number of probes before declaring the peer dead (default 9)
Practical example values to improve SSTP stability:
- net.ipv4.tcp_keepalive_time = 120
- net.ipv4.tcp_keepalive_intvl = 15
- net.ipv4.tcp_keepalive_probes = 5
Apply with sysctl:
sysctl -w net.ipv4.tcp_keepalive_time=120
sysctl -w net.ipv4.tcp_keepalive_intvl=15
sysctl -w net.ipv4.tcp_keepalive_probes=5
And persist in /etc/sysctl.conf or /etc/sysctl.d/.
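A minimal drop-in sketch for the latter (the filename is an arbitrary example):
# /etc/sysctl.d/99-sstp-keepalive.conf (hypothetical filename)
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_keepalive_probes = 5
Reload with sysctl --system. Keep in mind that TCP keepalives are only sent on sockets that enable SO_KEEPALIVE, so verify that your SSTP server or client actually sets it.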
Windows equivalents live in the registry under HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters:
- KeepAliveTime (DWORD, milliseconds) — default 7,200,000 (2 hours)
- KeepAliveInterval (DWORD, milliseconds) — default 1,000
- TcpMaxDataRetransmissions (DWORD) — controls retransmission attempts
Example: set KeepAliveTime to 120000 (2 minutes) so probing starts sooner, and KeepAliveInterval to 15000 (15 seconds) to space out retry probes. Use cautious values and test: these registry changes affect every TCP socket that enables keepalive, and a reboot is typically required for them to take effect.
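As a sketch, the values above can be applied with reg add from an elevated prompt:
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" /v KeepAliveTime /t REG_DWORD /d 120000 /f
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" /v KeepAliveInterval /t REG_DWORD /d 15000 /f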
Configure network element timeouts
Even with keepalive probes, intermediate devices might drop “idle” connections if their idle timeout is shorter than your probe interval. Common adjustments:
NAT gateways and firewalls
Many on-prem NAT devices have configurable TCP session timeouts. Increase the timeout to a safe upper bound (e.g., several hours) or ensure the keepalive interval is shorter than the device timeout.
Linux conntrack tuning (for machines acting as NAT/firewall):
- /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established — defaults vary; often 432000 seconds (5 days), lower on embedded devices
- Set a value that suits your environment, for example 7200 seconds (2 hours) for faster cleanup, or much larger for long-lived VPN sessions
Example:
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=7200
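To verify the active value and confirm that tunnel flows are being tracked, something like the following can be used (assuming SSTP on its default port, TCP/443):
sysctl net.netfilter.nf_conntrack_tcp_timeout_established
conntrack -L -p tcp --dport 443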
Cloud load balancers and managed NAT
Cloud providers often set conservative idle timeouts (e.g., Azure Load Balancer defaults to 4 minutes; AWS Classic ELB and ALB default to 60 seconds; other balancers and proxies vary). Options:
- Increase LB idle timeout to be greater than your keepalive interval (recommended).
- Use TCP-level health probes for session persistence where supported.
- Place VPN servers outside LBs that impose strict idle timeouts, or use passthrough modes.
Important: If you cannot change the LB timeout, set your TCP/PPP keepalive interval to be comfortably shorter than the LB timeout so probes keep the flow alive.
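As an illustration, raising the idle timeout on an Azure load-balancing rule might look like the sketch below; the resource names are placeholders, the value is in minutes, and flags can differ between CLI versions, so check az network lb rule update --help:
az network lb rule update --resource-group my-rg --lb-name my-lb --name sstp-rule --idle-timeout 30
AWS load balancers expose an equivalent idle timeout attribute; consult your provider's documentation for the supported range.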
Application-level considerations and MSS/MTU
SSTP packets traverse the full network path, so MTU and MSS matter: fragmentation can trigger retransmission stalls that look like a dead link. Recommended steps:
- Check path MTU using ping with DF bit and adjust server/client MTU/MSS if fragmentation occurs.
- For PPP over SSTP, typical MTU values are 1400 or 1360 to account for TLS and PPP headers. Test and tune.
Lowering MTU reduces fragmentation risk and can improve reliability across heterogeneous paths.
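A typical Linux workflow is sketched below; the hostname and sizes are examples only. Probe the path with the DF bit set, then pin the tunnel MTU or clamp MSS on the forwarding host:
ping -M do -s 1372 vpn.example.com   # 1372 bytes of payload + 28 bytes of ICMP/IP headers = 1400 on the wire
ip link set dev ppp0 mtu 1400        # pin the PPP interface MTU after testing
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu   # clamp MSS for forwarded TCP flows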
Timeout and keepalive recipes for common environments
Below are tested starting points. Adjust by monitoring and iterative tuning.
Linux SSTP server + clients
- PPP: lcp-echo-interval 30, lcp-echo-failure 4 (see the consolidated options sketch after this list)
- OS TCP keepalive: tcp_keepalive_time=120, tcp_keepalive_intvl=15, tcp_keepalive_probes=5
- Conntrack: nf_conntrack_tcp_timeout_established >= 7200
- MTU: 1400 or 1360 depending on tests
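A consolidated sketch of the PPP side of this recipe, assuming a pppd-based SSTP server reading an options file such as /etc/ppp/options (the path and values are starting points, not a definitive configuration):
lcp-echo-interval 30
lcp-echo-failure 4
mtu 1400
mru 1400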
Windows Server RRAS + Windows clients
- Registry TCP: KeepAliveTime = 120000, KeepAliveInterval = 15000
- PPP: on the server side use RRAS policies to prevent idle disconnects; configure idle timeout policies per user/group
- Adjust load balancer idle timeout in front of RRAS to > 5 minutes or match keepalive
Cloud-hosted SSTP
- Ensure cloud LB idle timeout > your keepalive interval (or set probes more frequently)
- Use sticky sessions or passthrough for TCP when possible
- Monitor with tcpdump/ss/netstat and cloud metrics to detect resets and timeouts
Diagnostics and monitoring
Diagnose disconnects systematically:
- Capture packets at client and server (tcpdump, Wireshark) to see who sends RST/FIN or if NAT timeouts silently drop state.
- Observe TCP state with ss/netstat to detect ESTABLISHED vs TIME_WAIT vs CLOSE.
- Use conntrack -L on Linux NATs to see if entries are removed early.
- Check Windows Event Logs (RAS/RRAS) for PPP disconnection reasons and error codes.
Look for repeated retransmissions, long stretches of application-layer silence, or retransmissions occurring at both the inner and outer TCP layers at once; these patterns point to MTU problems, congestion, or middlebox state cleanup.
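For example, assuming SSTP on TCP/443, the following capture and state checks show who is tearing sessions down and whether keepalive timers are armed:
tcpdump -i any 'tcp port 443 and (tcp[tcpflags] & (tcp-rst|tcp-fin) != 0)'   # capture RST/FIN segments
ss -o state established '( sport = :443 or dport = :443 )'                   # socket state plus keepalive/retransmit timers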
Best practices and operational tips
- Keep probe intervals shorter than the shortest idle timeout on any intermediate device. This is the single most important rule to avoid NAT/ALB-induced disconnects.
- Prefer server-side LCP/PPP probing for deterministic disconnect detection at the VPN layer rather than relying only on TCP keepalives.
- Document and manage timeout settings as part of deployment manifests and infrastructure-as-code so changes are trackable and reversible.
- Test from representative client networks (home, cellular, corporate) because carrier-grade NATs and mobile ISPs often have aggressive session cleanup.
- Log and alert on unexpected disconnect patterns — automated tuning should follow observed failures, not just default values.
Conclusion
Optimizing SSTP reliability requires coordinating keepalive and timeout settings across PPP, TCP, and the network path. Start by enabling PPP/LCP probes inside the tunnel, tune OS TCP keepalive parameters for faster probing, and review all intermediate device idle timeouts (NAT, load balancers, firewalls). Adjust MTU/MSS to avoid fragmentation and instrument your environment with packet captures and connection-state monitoring to validate changes. With a methodical approach you can reduce false disconnects, shorten recovery time, and deliver a stable VPN experience for users and systems.
For hands-on deployment guides and managed Dedicated IP options, visit Dedicated-IP-VPN at https://dedicated-ip-vpn.com/.