Monitoring WireGuard at scale requires more than occasional checks: engineers need real-time visibility into peer health, throughput, handshake freshness, and configuration drift. Prometheus provides a flexible, scalable platform for scraping and alerting on WireGuard metrics, while Grafana visualizes trends and anomalies. This article walks through a practical, production-ready approach to instrumenting WireGuard with Prometheus — including exporter options, configuration examples, alerting rules, and dashboard queries — so site operators, enterprises, and developers can detect problems early and keep VPN infrastructure healthy.

Why monitor WireGuard with Prometheus?

WireGuard is lightweight and fast, but its surface area for monitoring is different from traditional VPNs: there’s no built-in metrics endpoint and most useful signals live in the kernel interface or the wg tool output. Prometheus is well-suited because it:

  • Scrapes numeric metrics at regular intervals and stores them as time series.
  • Makes it easy to write alerting rules for latency, throughput, and handshake freshness.
  • Integrates cleanly with Grafana for dashboards and with Alertmanager for escalations.

Collecting WireGuard metrics: exporter options

There are several approaches to expose WireGuard state as Prometheus metrics. Choose one based on your environment and security constraints.

1. Dedicated WireGuard Prometheus exporter

Open-source exporters parse the output of the wg utility or read directly from the kernel netlink interface and expose Prometheus metrics. Examples include community projects like wg-exporter or wireguard_exporter. Typical metrics provided:

  • wireguard_peer_handshake_time_seconds — last handshake timestamp (seconds since epoch).
  • wireguard_peer_receive_bytes_total and wireguard_peer_transmit_bytes_total — cumulative traffic counters.
  • wireguard_peer_allowed_ips_count — number of allowed IPs per peer.
  • wireguard_peer_endpoint (label) — remote endpoint IP:port.

These exporters are usually written in Go and distributed as static binaries or Docker images. They typically accept flags to select the interface to monitor and the listen address for the metrics endpoint.

2. Custom exporter using wg dump output and the node_exporter textfile collector

The wg utility emits machine-readable state via wg show all dump (tab-separated fields), and the wireguard-tools source tree ships a contrib wg-json script that converts that output to JSON. You can write a short script (Bash, Python, or Go) that:

  • Runs periodically (cron or systemd timer).
  • Parses the output to extract each peer's public key, latest handshake time, receive/transmit byte counters, endpoint, and allowed IPs.
  • Writes Prometheus textfile format files into /var/lib/node_exporter/textfile_collector/ (if using node_exporter).

This approach gives full control over naming and labels and is handy when you want to avoid extra services. Example metric lines:

wireguard_peer_receive_bytes_total{interface="wg0",public_key="ABCD..."} 123456

3. Exporter inside Docker or sidecar

If WireGuard runs inside containers or as part of a platform, run the exporter as a sidecar container with host network access so it can call wg or hit the kernel interface. Ensure correct permissions to read WireGuard state.
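A minimal docker-compose sketch of this sidecar pattern; the image name, flags, and port are placeholders for whichever exporter you choose:

```yaml
services:
  wg-exporter:
    image: example/wg-exporter:latest   # placeholder image; substitute your exporter
    network_mode: host                  # so the container sees the host's wg0 interface
    cap_add:
      - NET_ADMIN                       # needed to query WireGuard state via netlink
    command: ["--interface=wg0", "--listen-address=127.0.0.1:9586"]
    restart: unless-stopped
```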

Installing and running a WireGuard exporter (systemd example)

Below is a concise systemd unit and startup configuration for a generic Go-based exporter that listens on localhost and exposes metrics on /metrics. Adjust flags to match your exporter binary.

/etc/systemd/system/wg-exporter.service

[Unit]
Description=WireGuard Prometheus Exporter
After=network.target

[Service]
User=wg-export
Group=wg-export
ExecStart=/usr/local/bin/wg-exporter --interface=wg0 --listen-address=127.0.0.1:9586
Restart=on-failure

[Install]
WantedBy=multi-user.target

Create a dedicated user, place the binary in /usr/local/bin, and secure the systemd unit. For containerized setups, give the container host network access and the CAP_NET_ADMIN capability so it can read WireGuard state.

Prometheus configuration: scrape job and relabeling

Use the following scrape_configs snippet to add the exporter to Prometheus. It demonstrates label hygiene, optional basic auth, and dropping sensitive labels before ingestion. Note that relabel_configs acts on target labels at discovery time, while metric_relabel_configs acts on scraped metric labels.

scrape_configs:
  - job_name: 'wireguard'
    metrics_path: /metrics
    scheme: http
    # optional basic auth
    basic_auth:
      username: 'promuser'
      password: 'secret'
    static_configs:
      - targets: ['10.0.0.5:9586']  # one exporter instance per host
        labels:
          role: vpn
          interface: wg0
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
    metric_relabel_configs:
      # copy the exporter's public_key label into a shorter peer_pubkey label
      - source_labels: [public_key]
        regex: '(.+)'
        target_label: peer_pubkey
        replacement: '${1}'
      # ensure no private keys leak into the TSDB
      - regex: 'private_key'
        action: labeldrop

Security note: Make sure the exporter does not expose private keys or sensitive data. Use relabeling to drop sensitive labels before ingestion, and consider securing the metrics endpoint via TLS or network ACLs.

Key metrics and how to interpret them

Understanding the most useful signals lets you build meaningful alerts and dashboards.

Handshake freshness

Metric: wireguard_peer_handshake_time_seconds (epoch seconds). Compute the age with PromQL:

time() - wireguard_peer_handshake_time_seconds{interface="wg0"}

Alert if handshake_age > X seconds for critical peers (e.g., 300s for interactive clients, longer for low-frequency IoT peers).

Throughput and trends

Metrics: wireguard_peer_receive_bytes_total, wireguard_peer_transmit_bytes_total.

Calculate per-second rates with rate():

sum by (peer_pubkey) (rate(wireguard_peer_receive_bytes_total[5m]))

Use moving windows (5m, 1h) to spot sudden spikes or drops indicating floods, outages, or connectivity problems.

Peer configuration drift

Track labels like allowed_ips and endpoint as part of metrics (or external config monitoring). Alert on unauthorized endpoint changes or unexpected allowed IPs that could indicate compromise or misconfiguration.
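If the exporter attaches the endpoint as a label, an endpoint change shows up as a second time series for the same peer within the lookback window. A hedged example query, assuming peer_pubkey and endpoint labels and Prometheus 2.26+ for last_over_time:

```promql
# > 1 when a peer has reported more than one endpoint in the last hour
count by (peer_pubkey) (
  last_over_time(wireguard_peer_handshake_time_seconds{interface="wg0"}[1h])
) > 1
```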

Peer count and session churn

Count peers and monitor flapping:

count(wireguard_peer_handshake_time_seconds{interface="wg0"})

High churn (frequent handshake resets) can indicate network instability, MTU issues, or NAT timeouts.
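Because the handshake timestamp metric takes a new value on every successful handshake, changes() approximates handshake frequency per peer; a sketch:

```promql
# handshakes per peer over the last hour; unusually high values suggest flapping
changes(wireguard_peer_handshake_time_seconds{interface="wg0"}[1h])
```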

Practical alerting rules

Below are sample Prometheus alerting rules that you can paste into your rules files. Adjust thresholds for your network characteristics.

groups:
  - name: wireguard.rules
    rules:
      - alert: WireGuardPeerStaleHandshake
        expr: (time() - wireguard_peer_handshake_time_seconds) > 300
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Peer {{ $labels.peer_pubkey }} has stale handshake on {{ $labels.instance }}"
          description: "Last handshake was more than 5 minutes ago. Possible connectivity outage."

      - alert: WireGuardHighTransmit
        expr: rate(wireguard_peer_transmit_bytes_total[5m]) > 10000000
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "High transmit rate from peer {{ $labels.peer_pubkey }}"
          description: "Transmit rate exceeds 10 MB/s. Investigate potential abuse or backup traffic."

      - alert: WireGuardPeerCountDrop
        # count() over an empty vector returns no result, so "count(...) < 1" can
        # never fire; compare against `up` to catch instances reporting zero peers.
        expr: up{job="wireguard"} == 1 unless on (instance) count by (instance) (wireguard_peer_handshake_time_seconds) > 0
        for: 3m
        labels:
          severity: critical
        annotations:
          summary: "No active WireGuard peers on {{ $labels.instance }}"
          description: "All peers appear disconnected; check interface and service."

Grafana dashboards and useful queries

Create panels for these core views:

  • Handshake age heatmap: use the handshake age PromQL and visualize as table or gauge per-peer.
  • Top talkers: a table ranked by 5m receive/transmit rate.
  • Peer map: list peers, endpoints, last handshake, and allowed IPs using labels.
  • Connection churn: count of handshake events over time — computed with changes() on the handshake timestamp metric.

Example query for Top Talkers (5m):

topk(10, sum by (peer_pubkey) (rate(wireguard_peer_receive_bytes_total[5m])))

Operational tips and troubleshooting

  • Retention and TSDB sizing: WireGuard environments can generate many series if you have thousands of peers. Use relabeling to control label cardinality, and use recording rules to aggregate.
  • Protect secrets: Ensure exporters and scripts do not write private keys into metrics. Review code and use relabel_config to drop any sensitive labels.
  • Network reachability: If exporters run on hosts behind NAT or transient networks, prefer pushing metrics via a secure gateway, or use Prometheus Federation to centralize metrics.
  • Combine with Node metrics: Pair WireGuard metrics with node_exporter metrics (network interface errors, CPU, memory) to correlate performance issues.
  • Test alerts: Use the Prometheus expression browser and Alertmanager’s silences to iterate threshold tuning before hitting production on-calls.

Advanced patterns

For large deployments consider:

  • Federating scrape jobs by region and using a central Prometheus for global alerts. This reduces cross-region scrape overhead and keeps local visibility.
  • Using recording rules to precompute heavy aggregations (e.g., per-region bandwidth) for faster dashboards and cheaper queries.
  • Implementing anomaly detection using Prometheus’ predict_linear or external ML systems that consume Prometheus data for burst detection.
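As an illustration of the recording-rule pattern, a sketch that precomputes receive bandwidth aggregates; the region label is an assumption about your target labeling:

```yaml
groups:
  - name: wireguard.recording
    rules:
      # per-instance receive bandwidth, evaluated once and reused by dashboards
      - record: instance:wireguard_receive_bytes:rate5m
        expr: sum by (instance) (rate(wireguard_peer_receive_bytes_total[5m]))
      # per-region aggregate, assuming a `region` target label exists
      - record: region:wireguard_receive_bytes:rate5m
        expr: sum by (region) (rate(wireguard_peer_receive_bytes_total[5m]))
```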

Example: small script that converts wg JSON to Prometheus textfile

Use a short script to produce metrics consumed by node_exporter textfile collector. The script should:

  • Call wg show all dump (or the contrib wg-json script) and parse the output.
  • For each peer, emit metrics for handshake time and transfer counters.
  • Write to /var/lib/node_exporter/textfile_collector/wireguard.prom atomically.

Running this every 15–60 seconds from a systemd timer (cron only offers minute granularity) is sufficient for most use cases. Ensure proper file permissions and that node_exporter is started with the textfile collector directory configured (--collector.textfile.directory).
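A minimal Python sketch of such a script, assuming the wg show all dump tab-separated format (interface header lines have 5 fields, peer lines 9) and the metric names used earlier; treat it as a starting point, not a hardened implementation:

```python
#!/usr/bin/env python3
"""Convert `wg show all dump` output to Prometheus textfile-collector format."""
import os
import shutil
import subprocess
import tempfile

TEXTFILE_DIR = "/var/lib/node_exporter/textfile_collector"

def dump_to_metrics(dump_text: str) -> str:
    """Parse `wg show all dump` output into Prometheus exposition lines.

    Peer lines have 9 tab-separated fields: interface, public key,
    preshared key, endpoint, allowed IPs, latest handshake (epoch seconds),
    rx bytes, tx bytes, persistent keepalive. Interface header lines
    (5 fields, containing the private key) are skipped entirely.
    """
    lines = []
    for row in dump_text.strip().splitlines():
        fields = row.split("\t")
        if len(fields) != 9:
            continue  # interface header line; never emit the private key
        iface, pubkey, _psk, _endpoint, _allowed, handshake, rx, tx, _ka = fields
        labels = f'interface="{iface}",public_key="{pubkey}"'
        lines.append(f"wireguard_peer_handshake_time_seconds{{{labels}}} {handshake}")
        lines.append(f"wireguard_peer_receive_bytes_total{{{labels}}} {rx}")
        lines.append(f"wireguard_peer_transmit_bytes_total{{{labels}}} {tx}")
    return "\n".join(lines) + "\n"

def write_atomically(path: str, content: str) -> None:
    """Write via temp file + rename so node_exporter never reads a partial file."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as f:
        f.write(content)
    os.chmod(tmp, 0o644)  # make readable by node_exporter
    os.rename(tmp, path)

if __name__ == "__main__" and shutil.which("wg"):
    dump = subprocess.run(["wg", "show", "all", "dump"],
                          capture_output=True, text=True, check=True).stdout
    write_atomically(os.path.join(TEXTFILE_DIR, "wireguard.prom"),
                     dump_to_metrics(dump))
```

The atomic rename matters: node_exporter scrapes the directory on its own schedule, and a half-written file would surface as parse errors or missing series.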

Monitoring WireGuard with Prometheus delivers actionable insight into connectivity, throughput, and peer behavior. By choosing a robust exporter approach, securing the metrics flow, and crafting well-tuned alerts and dashboards, operations teams can rapidly detect outages, mitigate abuse, and maintain performant VPN services.

For further resources and tools, visit Dedicated-IP-VPN at https://dedicated-ip-vpn.com/.