Real-Time SSTP VPN Monitoring with Grafana: Quick Setup, Metrics & Dashboards

Monitoring SSTP (Secure Socket Tunneling Protocol) VPNs in real time is essential for service reliability, capacity planning, and incident response. For professional operators — system administrators, enterprise IT, and developers delivering VPN services — combining Prometheus and Grafana provides a lightweight, scalable observability stack. This article explains practical setup steps, metric sources, useful PromQL queries and dashboard ideas tailored to SSTP deployments on both Windows (RRAS) and Linux-based SSTP servers.

Why real-time monitoring matters for SSTP VPNs

SSTP runs over HTTPS and is often used when other VPN protocols are blocked. That makes it attractive for remote access, but also puts pressure on TLS, CPU, and web server subsystems. Real-time monitoring helps you to:

Detect authentication failures or spikes in connection attempts (possible brute force or misconfiguration).
Track active sessions and per-user bandwidth to identify overloaded gateways.
Monitor TLS/HTTPS health (handshake errors, certificate expiry) to avoid outages caused by expired certs.
Correlate VPN performance with host metrics (CPU, memory, NIC throughput) for capacity planning.

Overview of architecture

A recommended stack for SSTP monitoring:

Prometheus for time-series collection and storage.
Exporters to expose metrics: windows_exporter (Windows), node_exporter (Linux), ppp_exporter or custom scripts for PPP/SSTP counters, snmp_exporter where applicable.
Grafana for dashboards, alert visualizations and drilldown.
Alertmanager for notifications (Email, Slack, PagerDuty).
Optional: Grafana Loki for logs and Tempo/Jaeger for tracing if you instrument custom components.

Collecting SSTP-specific metrics

Because SSTP is implemented differently across platforms, collect metrics from these sources:

Windows RRAS (SSTP implemented by Microsoft)

Use windows_exporter to scrape Windows performance counters. Important counters:

Remote Access Service counters (e.g., RasConnections) — number of active connections.
Network Interface counters — bytes/sec per SSTP interface (if you use dedicated virtual adapters).
Process counters for svchost or the RRAS process — CPU and memory per process.
TCPv4/TCPv6 counters — retransmits, established sockets.

Windows Event Logs for authentication failures and TLS handshake errors: forward them to Loki, or convert them to metrics with a small exporter, optionally after centralizing them with Windows Event Forwarding. windows_exporter (formerly named wmi_exporter) may also expose some event counts.
Certificate stores: track certificate expiry with a simple script and expose it as a metric (cert_days_until_expiry).
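The article suggests a PowerShell script against the certificate store. As a swapped-in alternative that runs from any monitoring host, the Python sketch below connects to the SSTP listener and reads the expiry date of the certificate it actually presents. It assumes the certificate chains to a CA trusted by the default SSL context (pass a cafile for a private CA); the certname label value is only an example.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Convert a certificate's notAfter string into days remaining."""
    # ssl.cert_time_to_seconds parses strings like "Jan  5 09:34:43 2018 GMT".
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires_at - now) / 86400.0


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Read notAfter from the certificate the TLS listener presents."""
    context = ssl.create_default_context()  # verifies the chain and hostname
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def render_metric(certname: str, days: float) -> str:
    """Format the gauge in the Prometheus text exposition format."""
    return (
        "# HELP cert_days_until_expiry Days until the TLS certificate expires.\n"
        "# TYPE cert_days_until_expiry gauge\n"
        f'cert_days_until_expiry{{certname="{certname}"}} {days:.2f}\n'
    )
```

For example, render_metric("sstp.example.com", days_until_expiry(fetch_not_after("sstp.example.com"))) produces the gauge used by the expiry thresholds later in this article.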

Linux-based SSTP servers

If you run sstpd (pppd + stunnel) or SoftEther SSTP, collect metrics from:

/proc/net/netstat for kernel TCP statistics, and conntrack (established TCP connections on port 443) for connection and session counts.
pppd-status files or management sockets — many ppp-based servers expose per-session stats you can parse and convert to Prometheus metrics via a custom exporter.
iptables/nftables counters — bytes and packets per rule (useful to track per-customer throughput if you mark packets).
node_exporter for host-level metrics (CPU, mem, disk, interface throughput).

stunnel or web server TLS logs: parse them into metrics for handshake failures, and check the served certificate's expiry date separately.
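As a minimal sketch of the custom-exporter route on Linux, the parser below reads /proc/net/dev, counts tunnel interfaces, and sums their byte counters. It assumes each SSTP session gets its own pppN interface (typical for pppd-based servers) and that interface names start with ppp or sstp; adjust the prefixes for your server.

```python
from typing import Dict, Tuple


def parse_proc_net_dev(text: str) -> Dict[str, Tuple[int, int]]:
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev contents."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, fields = line.split(":", 1)
        values = fields.split()
        # Field 0 is receive bytes, field 8 is transmit bytes.
        stats[name.strip()] = (int(values[0]), int(values[8]))
    return stats


def ppp_summary(stats: Dict[str, Tuple[int, int]], prefixes=("ppp", "sstp")) -> dict:
    """Count tunnel interfaces (one per session, by assumption) and sum their bytes."""
    tunnels = {n: v for n, v in stats.items() if n.startswith(prefixes)}
    return {
        "active_sessions": len(tunnels),
        "rx_bytes": sum(rx for rx, _ in tunnels.values()),
        "tx_bytes": sum(tx for _, tx in tunnels.values()),
    }
```

Feed it the live file with open("/proc/net/dev").read(). node_exporter already exports the per-interface byte counters, so the session count is the main thing this adds.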

Prometheus: configuration and best practices

Key points when configuring Prometheus for SSTP monitoring:

Scrape interval: for real-time troubleshooting, set a short scrape interval (10s or 15s). Sample volume and storage grow as the interval shrinks, and label cardinality multiplies both, so keep labels under control.
Relabeling: add labels like job="sstp-gateway", env="prod", location="datacenter-1" to enable grouped dashboards and targeted alerts.
Use recording rules for expensive queries (aggregate session counts, byte rates) to reduce query latency in Grafana.
Secure scraping: use TLS, basic auth or mTLS where exporters run on remote nodes, and firewall the exporter endpoints so only Prometheus can reach them.
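To put numbers on the storage trade-off, the Prometheus storage documentation gives a rough capacity formula: disk space ≈ retention in seconds × ingested samples per second × bytes per sample, with 1 to 2 bytes per sample typical after compression. The sketch below applies it. The series count is an input you would read from Prometheus's own prometheus_tsdb_head_series metric, and the figures used here are illustrative, not measurements.

```python
def samples_per_second(active_series: int, scrape_interval_s: float) -> float:
    """Each active series receives one sample per scrape."""
    return active_series / scrape_interval_s


def disk_bytes(active_series: int, scrape_interval_s: float,
               retention_days: float, bytes_per_sample: float = 2.0) -> float:
    """Rough estimate: retention_seconds * samples_per_second * bytes_per_sample."""
    retention_s = retention_days * 86400
    return retention_s * samples_per_second(active_series, scrape_interval_s) * bytes_per_sample
```

With 10,000 series, 15 days of retention and a 15s interval, the estimate is about 1.7 GB; moving to a 60s interval divides it by four, while every extra label value multiplies the series count.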

Example scrape config snippet for windows_exporter:

<pre>
scrape_configs:
  - job_name: 'sstp-windows'
    static_configs:
      - targets: ['10.0.1.10:9182']  # windows_exporter default port
    relabel_configs:
      # Replace the address-based instance label with a readable host name
      - source_labels: ['__address__']
        target_label: 'instance'
        replacement: 'rras-1'
</pre>
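The relabel rule in this snippet uses Prometheus's default replace action. The sketch below models that action in simplified form (it ignores templating in target_label and Go template edge cases): the source label values are joined with ";", the regex must match the whole string, and $1-style references in the replacement are expanded. With the default regex (.*) and a literal replacement, every target in the job receives instance="rras-1", so this rule only makes sense for a single-target job.

```python
import re
from typing import Dict, List


def _expand(match: "re.Match[str]", replacement: str) -> str:
    """Expand $1 / ${1} group references; unknown groups expand to ""."""
    def group(ref: "re.Match[str]") -> str:
        index = int(ref.group(1) or ref.group(2))
        if index > match.re.groups:
            return ""
        return match.group(index) or ""
    return re.sub(r"\$(?:\{(\d+)\}|(\d+))", group, replacement)


def relabel_replace(labels: Dict[str, str], source_labels: List[str], target_label: str,
                    regex: str = "(.*)", replacement: str = "$1",
                    separator: str = ";") -> Dict[str, str]:
    """Simplified model of Prometheus's default 'replace' relabel action."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)  # Prometheus anchors the regex at both ends
    result = dict(labels)
    if match is None:
        return result  # no match: labels are left untouched
    expanded = _expand(match, replacement)
    if expanded:
        result[target_label] = expanded
    else:
        # An empty label value is equivalent to an absent label in Prometheus.
        result.pop(target_label, None)
    return result
```

To keep the real address but drop the port instead, a rule with regex '([^:]+):\d+' and the default replacement '$1' would yield instance="10.0.1.10" for each target.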

Essential metrics and recommended PromQL

Below are core metrics and example PromQL queries to drive Grafana panels.

Session and connection metrics

Metric: sstp_active_sessions (via exporter or derived from RasConnections)
Query: sstp_active_sessions{job="sstp-gateway"}
Metric: sstp_sessions_total (counter of completed sessions)
Query (rate over 5m): rate(sstp_sessions_total[5m])
Concurrent sessions per user (requires a username label on the gauge; sums across gateways):
Query: sum by (username) (sstp_active_sessions{env="prod"})
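rate() and increase() only make sense on counters such as sstp_sessions_total, not on gauges such as sstp_active_sessions. The simplified sketch below shows the core idea: sum the positive deltas between samples, treat a drop in value as a counter reset, and divide by the elapsed time. Prometheus additionally extrapolates to the edges of the range window, so its results differ slightly; increase() is rate() multiplied by the window length.

```python
from typing import List, Optional, Tuple


def simple_rate(samples: List[Tuple[float, float]]) -> Optional[float]:
    """Per-second increase of a counter over (timestamp, value) samples.

    Simplified: no extrapolation to the range boundaries, unlike PromQL's rate().
    """
    if len(samples) < 2:
        return None  # like Prometheus, no result without two samples in the window
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter reset (for example, the exporter restarted);
        # count the post-reset value as the increase since the reset.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed
```

For samples (0s, 100), (15s, 130), (30s, 10), (45s, 40) the increase is 30 + 10 + 30 = 70 over 45 seconds, roughly 1.56 per second, even though the raw value went down.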

Bandwidth and throughput

Metric: bytes transferred per interface, exposed by node_exporter or windows_exporter (e.g., node_network_receive_bytes_total, node_network_transmit_bytes_total)
Query (total inbound bandwidth across gateways):
sum(rate(node_network_receive_bytes_total{device=~"sstp.*|ppp.*"}[1m]))
Per-user or per-connection throughput (if exporter exposes bytes per session):
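Query: rate(sstp_session_bytes_recv_total[1m]) (keep the range at least four times the scrape interval so a single missed scrape does not leave the window with too few samples)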
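PromQL regex matchers (=~) are anchored at both ends, so the device pattern in the bandwidth query must match the entire interface name: ppp. matches ppp0 but not ppp12, while ppp.* matches both. The check below uses Python's re.fullmatch as a stand-in for Prometheus's RE2 engine, which behaves the same for these simple patterns.

```python
import re

SINGLE_CHAR = r"sstp.|ppp."   # matches exactly one character after the prefix
ANY_SUFFIX = r"sstp.*|ppp.*"  # matches any suffix, e.g. ppp12


def matches(pattern: str, device: str) -> bool:
    """Mimic a fully anchored PromQL =~ matcher."""
    return re.fullmatch(pattern, device) is not None


for device in ["ppp0", "ppp12", "sstp0", "eth0"]:
    print(device, matches(SINGLE_CHAR, device), matches(ANY_SUFFIX, device))
```

Only the any-suffix pattern keeps counting sessions once a gateway has more than ten ppp interfaces.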

Errors, auth failures, TLS issues

Metric: sstp_auth_failures_total
Query: rate(sstp_auth_failures_total[5m]) — alert on sustained spikes.
TLS handshake errors from stunnel or Windows SChannel logs exported as metrics:
Query: increase(tls_handshake_errors_total[5m])
Days remaining until certificate expiry:
Metric: cert_days_until_expiry{certname="sstp.example.com"} — alert if < 14 days.

Grafana dashboards: panels and layout suggestions

Design dashboards for different audiences: Operations, Network, and Capacity.

Operations (single-pane-of-glass)

Top row: overall health — Active sessions (gauge), Total throughput (timeseries), CPU & memory of SSTP hosts.
Middle row: authentication — rate of auth failures, top offending usernames (table), last successful/failed login events.
Bottom row: TLS health — cert expiry, handshake failures, TLS version distribution (Pie/Stat).

Network and per-user troubleshooting

List of active sessions with labels (username, client_ip, start_time, bytes_in/out): use a table panel with links to logs.
Per-session throughput sparkline + packet drop/retransmit rate to detect flaky connections.

Capacity and trend analysis

7-day/30-day charts for peak concurrent sessions, 95th percentile bandwidth, and peak CPU to plan scaling.
Use annotations to mark changes that affect capacity (software upgrades, cert renewals).
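For the 95th percentile panels, PromQL's quantile_over_time interpolates linearly between sorted samples at rank φ × (n − 1). The sketch below reproduces that convention so you can cross-check a panel against exported data; it assumes 0 ≤ φ ≤ 1.

```python
import math
from typing import Sequence


def quantile(phi: float, values: Sequence[float]) -> float:
    """Linear interpolation at rank phi * (n - 1), as PromQL's quantile functions do."""
    if not values:
        return math.nan
    ordered = sorted(values)
    rank = phi * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = min(lower + 1, len(ordered) - 1)
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight
```

In Grafana, the matching query would look like quantile_over_time(0.95, sstp:receive_bytes:rate1m[7d]), where sstp:receive_bytes:rate1m is a hypothetical recording rule holding per-second bandwidth; max_over_time over the same range gives the peak.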

Alerting strategies

Good SSTP alerts are actionable, not noisy. Examples:

Critical: Active sessions > capacity threshold for > 5 minutes — include impacted gateway label.
Warning: Auth failure rate > X per minute for > 10 minutes: investigate a possible credential leak, brute-force attempt, or misconfiguration.
Critical: TLS cert expiry < 7 days — send high-priority notification.
Warning: sustained high retransmit or TCP errors — potential network issues between client and gateway.
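The "for > 5 minutes" and "for > 10 minutes" qualifiers correspond to the for field of a Prometheus alerting rule. The sketch below models the pending and firing logic in simplified form (it ignores keep_firing_for and notification routing): the alert becomes pending at the first evaluation where the condition is true, fires once the condition has held continuously for the for duration, and resets whenever an evaluation is false.

```python
from typing import Iterable, List, Optional, Tuple


def firing_times(evaluations: Iterable[Tuple[float, bool]], for_seconds: float) -> List[float]:
    """Return the evaluation timestamps at which the alert is firing."""
    active_since: Optional[float] = None
    firing = []
    for ts, condition in evaluations:
        if not condition:
            active_since = None  # condition cleared: back to inactive
            continue
        if active_since is None:
            active_since = ts  # alert becomes pending
        if ts - active_since >= for_seconds:
            firing.append(ts)
    return firing
```

With one-minute evaluations and a 5-minute for window, a single false evaluation at 120s pushes firing back to 480s, which is the noise suppression the alerts above rely on.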

Practical exporter examples and tips

Because SSTP-specific exporters are uncommon, you may need small glue code. Practical approaches:

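Windows: enable windows_exporter with a collector that reads arbitrary performance counters (the pdh collector; the name varies across windows_exporter releases, so check your version's documentation) and add the RemoteAccess counters. Use a PowerShell script to query certificate expiry and expose it via a simple HTTP endpoint in the Prometheus text format, or write it to a .prom file for windows_exporter's textfile collector.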
Linux: write a lightweight Python or Go exporter that parses /var/log/ppp, pppd control sockets or stunnel logs to produce per-session metrics. Use an existing client library for exposition, such as prometheus_client (Python) or client_golang (Go).
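Labeling: include username, gateway, client_ip and session_id labels only where cardinality is controlled. client_ip and session_id create a new series for every connection, so keep them on per-session metrics that disappear when the session ends. Avoid unbounded label sets (e.g., full user-agent strings).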
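Here is a minimal sketch of such a glue exporter using only the Python standard library, so it runs without extra packages; prometheus_client's Gauge and start_http_server would do the same job with less code. collect_sessions, the port 9555 and the label values are hypothetical placeholders for your own pppd or stunnel parsing.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, List, Tuple

Sample = Tuple[Dict[str, str], float]


def escape_label_value(value: str) -> str:
    # Text exposition format: escape backslash, double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, samples: List[Sample]) -> str:
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        pairs = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{pairs}}} {value}" if pairs else f"{name} {value}")
    return "\n".join(lines) + "\n"


def collect_sessions() -> List[Sample]:
    """Hypothetical collector: replace with parsing of your pppd/stunnel state."""
    return [({"gateway": "gw-1", "username": "alice"}, 1.0)]


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_gauge("sstp_active_sessions", "Active SSTP sessions.",
                            collect_sessions()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9555) -> None:
    """Blocking; call this from your own entry point."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Because the gauge is rebuilt from current state on every scrape, series for ended sessions simply stop appearing, which keeps per-session labels from piling up in the exporter.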

Scaling and reliability

For fleet-scale SSTP deployments:

Use Prometheus federation or remote_write to a long-term store (Thanos, Cortex) for cross-datacenter queries.
Shard scraping and use blackbox_exporter probes for external reachability monitoring over the same ingress IPs.
Implement redundancy for exporters and use service discovery (Consul, DNS) to automate target additions.
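Besides Consul and DNS, Prometheus also supports file-based service discovery (file_sd_configs), which reloads target groups from JSON or YAML files when they change. As a swapped-in illustration, the sketch below turns a hypothetical gateway inventory into file_sd JSON, attaching the env and location labels recommended earlier, and writes it atomically so Prometheus never reads a half-written file.

```python
import json
import os
import tempfile
from typing import Dict, List, Tuple


def build_file_sd(gateways: List[Dict[str, str]], port: int = 9182) -> List[dict]:
    """Group gateways by (env, location) into file_sd target groups."""
    groups: Dict[Tuple[str, str], List[str]] = {}
    for gw in gateways:
        key = (gw["env"], gw["location"])
        groups.setdefault(key, []).append(f"{gw['address']}:{port}")
    return [
        {"targets": sorted(targets), "labels": {"env": env, "location": location}}
        for (env, location), targets in sorted(groups.items())
    ]


def write_atomically(path: str, groups: List[dict]) -> None:
    """Write to a temporary file in the same directory, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(groups, f, indent=2)
    os.replace(tmp_path, path)
```

Point a scrape job at the output with file_sd_configs and a files pattern such as targets/*.json; the .tmp suffix keeps the temporary file outside that pattern.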

Wrap-up and checklist

Quick checklist to get a robust SSTP monitoring pipeline:

Identify metric sources (windows_exporter, node_exporter, custom ppp exporter).
Deploy Prometheus with short scrape intervals for critical metrics and recording rules for aggregated data.
Create Grafana dashboards for Operations, Network, and Capacity teams with focused panels and drilldowns.
Implement actionable alert rules and wire them to Alertmanager integrations.
Monitor TLS cert expiry, auth failures, session counts, and per-session throughput as a baseline.

With the above approach you can achieve near real-time visibility into SSTP VPN behavior, reduce time-to-detect incidents, and make data-driven capacity decisions. For implementation examples, exporter snippets, or dashboard JSONs tailored to your environment, consider starting with windows_exporter/node_exporter and incrementally adding custom exporters for per-session SSTP metrics.

Published by Dedicated-IP-VPN — https://dedicated-ip-vpn.com/