In modern VPN deployments, IKEv2 is the de facto standard for negotiating secure IPsec tunnels due to its robustness, mobility support, and efficient state machine. However, maintaining long-lived secure connections requires careful handling of rekeying and Security Association (SA) lifetime management. Poorly designed rekey policies can cause traffic disruption, weaken security, and complicate troubleshooting. This article provides actionable, technical best practices for IKEv2 rekeying and SA lifetime management targeted at site operators, enterprise engineers, and developers building VPN solutions.
Fundamentals: IKEv2 SA Types and Rekeying Mechanisms
Before diving into best practices, it helps to recap the IKEv2 SA model and the standard rekey mechanisms:
- IKE_SA (IKE Security Association) protects IKE control messages between peers. It is established via the IKE_SA_INIT and IKE_AUTH exchanges; MOBIKE support, where used, is signaled during IKE_AUTH.
- CHILD_SA protects user traffic (ESP or AH). A single IKE_SA can manage multiple CHILD_SAs. CHILD_SAs are created and rekeyed with the CREATE_CHILD_SA exchange.
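- Rekey flow for CHILD_SA: peers run a CREATE_CHILD_SA exchange that carries a REKEY_SA notification identifying the old SA, fresh nonces, and optionally a new Diffie-Hellman exchange. The new SA normally reuses the old SA's Traffic Selectors and is installed while the old SA is kept until the transition is complete. Lifetimes are not negotiated in IKEv2; each peer enforces its own local policy and initiates the rekey when its own timer fires.
- Rekey flow for IKE_SA: when the IKE_SA lifetime nears expiry, either peer rekeys it with a CREATE_CHILD_SA exchange, as defined in RFC 7296. This exchange always includes a fresh Diffie-Hellman exchange, and existing CHILD_SAs are inherited by the new IKE_SA.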
Design Goals for Rekeying and Lifetime Policies
Any lifetime and rekey strategy should aim to meet three goals:
- Security: limit exposure by reducing cryptoperiods and using strong key exchange (DH groups) and Perfect Forward Secrecy (PFS) where needed.
- Reliability: ensure rekeying does not cause packet loss or session interruption for applications sensitive to brief outages.
- Scalability and Operational Simplicity: avoid policy churn and excessive CPU/bandwidth consumption due to very frequent rekeys, especially on gateways supporting many concurrent sessions.
Recommended Lifetime Defaults
Defaults depend on threat model and performance constraints; the following are pragmatic starting points:
- IKE_SA lifetime: 24 hours (86,400 seconds) is a practical default for many enterprise deployments. Shorter lifetimes (e.g., 1–8 hours) are better for high-security environments.
- CHILD_SA lifetime: 1–4 hours (3,600–14,400 seconds) often balances security and performance. For long-haul site-to-site tunnels with stable endpoints, lifetimes up to 8–12 hours can be acceptable if combined with strong PFS.
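- Byte-based lifetime: many implementations can also expire an SA after a configured volume of traffic. As with time-based lifetimes, this is local policy in IKEv2 rather than something negotiated with the peer, and the setting name varies by implementation. Use it to cap key use on very high-bandwidth flows.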
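To illustrate how a time limit and a byte limit combine, here is a minimal Python sketch. The `SaLifetimePolicy` class and its method names are hypothetical and do not correspond to any IKE daemon's configuration or API; the 64 GiB cap is an arbitrary illustrative value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SaLifetimePolicy:
    """Hypothetical local lifetime policy for one SA (not a real daemon API)."""

    max_seconds: int                 # time-based hard limit
    max_bytes: Optional[int] = None  # optional volume-based hard limit

    def is_expired(self, age_seconds: float, bytes_processed: int) -> bool:
        """Return True once either configured limit has been reached."""
        if age_seconds >= self.max_seconds:
            return True
        return self.max_bytes is not None and bytes_processed >= self.max_bytes

    def seconds_until_byte_limit(self, throughput_bits_per_s: float) -> Optional[float]:
        """Estimate how long the byte cap lasts at a sustained throughput."""
        if self.max_bytes is None:
            return None
        return self.max_bytes * 8 / throughput_bits_per_s


# Time limits follow the starting points above; the 64 GiB byte cap is an
# illustrative value only.
IKE_SA_POLICY = SaLifetimePolicy(max_seconds=24 * 3600)
CHILD_SA_POLICY = SaLifetimePolicy(max_seconds=4 * 3600, max_bytes=64 * 2**30)

if __name__ == "__main__":
    # At a sustained 10 Gbit/s the byte cap is reached long before 4 hours.
    print(CHILD_SA_POLICY.seconds_until_byte_limit(10e9))
```

At a sustained 10 Gbit/s, a 64 GiB cap is reached in about 55 seconds, so on very fast links the byte limit, not the time limit, can end up driving the rekey rate.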
Soft vs Hard Rekeying: Strategy and Implementation
Implement both soft and hard lifetime semantics:
- Soft lifetime (rekey threshold): trigger rekeying proactively before the SA expires — commonly at 60–75% of the configured lifetime. This reduces the risk of both peers letting the SA expire concurrently and avoids last-moment rekey storms.
- Hard lifetime (expiration): the SA must be retired once the hard limit is reached. If rekeying failed and hard lifetime is reached, the SA should be removed to avoid cryptographic key reuse outside policy.
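The soft and hard semantics above can be sketched as a small scheduling helper. This is an illustration under stated assumptions, not the behavior of any particular stack: `rekey_schedule`, `next_action`, and the random jitter around the soft threshold are all hypothetical additions, the jitter being one common way to keep many SAs from rekeying at the same instant.

```python
import random
from typing import Optional, Tuple


def rekey_schedule(
    lifetime_s: float,
    soft_fraction: float = 0.7,
    jitter_fraction: float = 0.05,
    rng: Optional[random.Random] = None,
) -> Tuple[float, float]:
    """Return (soft_deadline_s, hard_deadline_s), measured from SA installation.

    soft_fraction and jitter_fraction are illustrative knobs, not settings
    of any specific IKE implementation.
    """
    if lifetime_s <= 0:
        raise ValueError("lifetime_s must be positive")
    if jitter_fraction < 0:
        raise ValueError("jitter_fraction must not be negative")
    if not (0 < soft_fraction - jitter_fraction and soft_fraction + jitter_fraction < 1):
        raise ValueError("soft threshold must stay strictly inside the lifetime")
    if rng is None:
        rng = random.Random()
    jitter = rng.uniform(-jitter_fraction, jitter_fraction)
    soft_deadline_s = lifetime_s * (soft_fraction + jitter)
    return soft_deadline_s, float(lifetime_s)


def next_action(age_s: float, soft_deadline_s: float, hard_deadline_s: float) -> str:
    """Decide what to do with an SA of the given age."""
    if age_s >= hard_deadline_s:
        return "delete"  # hard lifetime reached: retire the SA
    if age_s >= soft_deadline_s:
        return "rekey"   # soft lifetime reached: start CREATE_CHILD_SA
    return "none"
```

With a 3,600-second lifetime and the defaults shown, the soft deadline falls somewhere between 65% and 75% of the lifetime, leaving at least 900 seconds for the rekey to complete before the hard limit.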
IKEv2 is designed to tolerate simultaneous rekeys: both peers can initiate CREATE_CHILD_SA for the same SA at the same time, in which case both exchanges complete and the redundant SA is then deleted based on a comparison of the nonces. Ensure your stack follows the RFC 7296 guidance on simultaneous rekeying (Section 2.8.1) to avoid races.
Key Best Practices for Reliable Rekeying
- Initiate rekey early. Start CREATE_CHILD_SA when the soft lifetime threshold is hit. Early initiation gives time for retransmissions and reduces the chance of hard expiry before completion.
- Use PFS judiciously. Enabling PFS (a fresh DH exchange during each CHILD_SA rekey) adds security but increases CPU load and latency. For sensitive tunnels, enforce PFS; for bulk encrypted links with constrained CPU resources, consider skipping the DH exchange on CHILD_SA rekeys and relying on periodic IKE_SA rekeys, which always perform a fresh DH exchange.
- Coordinate lifetimes. When multiple CHILD_SAs are multiplexed under an IKE_SA, align their lifetimes to avoid frequent, repeated CREATE_CHILD_SA calls. If traffic selectors differ, tune per-selector lifetimes appropriately.
- Monitor rekey success rates. Log CREATE_CHILD_SA initiations, completions, failures, and retransmission counts. Set alerting thresholds for repeated failures which can indicate MTU/NAT-T issues or asymmetric routing.
- Use asymmetric thresholds to avoid simultaneous heavy load. If you control both ends, stagger soft thresholds to prevent both peers from rekeying at the same second (e.g., client rekeys at 70%, server at 80%).
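- Handle simultaneous initiation gracefully. Implement the race resolution behavior from RFC 7296: if both sides rekey the same CHILD_SA simultaneously, two redundant SAs result, and the one created by the exchange containing the lowest of the four nonces should be closed by the endpoint that initiated it. Ensure the implementation tolerates this and keeps accepting inbound traffic on either SA until the delete completes.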
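The nonce comparison used to resolve a simultaneous CHILD_SA rekey can be sketched as follows. RFC 7296 defines "lowest" as an octet-by-octet comparison in which a nonce that runs out first is the lower one, and Python's `bytes` ordering behaves the same way. The `RekeyExchange` type and `redundant_exchange` function are hypothetical names used only for illustration, and the sample nonces are far shorter than real ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RekeyExchange:
    """One CREATE_CHILD_SA rekey exchange (hypothetical record type)."""

    initiated_by: str       # label for the endpoint that sent the request
    initiator_nonce: bytes  # Ni from the request
    responder_nonce: bytes  # Nr from the response

    def lowest_nonce(self) -> bytes:
        # bytes compare octet by octet, and a nonce that is a strict prefix
        # of another compares lower, matching the RFC 7296 definition.
        return min(self.initiator_nonce, self.responder_nonce)


def redundant_exchange(a: RekeyExchange, b: RekeyExchange) -> RekeyExchange:
    """Return the exchange whose new SA should be closed by its initiator."""
    return a if a.lowest_nonce() < b.lowest_nonce() else b


if __name__ == "__main__":
    # Shortened sample nonces; real IKEv2 nonces are at least 16 octets.
    ours = RekeyExchange("gateway-a", bytes.fromhex("0501"), bytes.fromhex("0909"))
    theirs = RekeyExchange("gateway-b", bytes.fromhex("0500ff"), bytes.fromhex("07"))
    loser = redundant_exchange(ours, theirs)
    print(f"{loser.initiated_by} closes the SA it created")
```

Because both peers see all four nonces, they reach the same verdict independently, with no extra coordination message.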
Practical Considerations: MTU, NAT-T, and Packet Loss
Real-world networks introduce complications during rekey:
- NAT Keepalive and NAT-T. NAT traversal keeps UDP encapsulation for ESP; ensure NAT-T keepalives and port mapping refresh are active during rekey exchanges. Without this, the CREATE_CHILD_SA packets may be dropped by NAT devices.
- MTU and fragmentation. Rekey messages can be large, especially when they carry large key-exchange payloads, and reauthentication exchanges also carry certificates. Where both peers support it, enable IKEv2 message fragmentation (RFC 7383) rather than relying on IP fragmentation, which increases loss risk, and keep certificate chains small.
- Retransmission tuning. IKEv2 implementations typically retransmit requests with exponential backoff. Configure reasonable initial RTOs (e.g., 1–2s) and cap retransmissions to avoid indefinite CPU use. Ensure DPD (Dead Peer Detection) timers are tuned so rekey retransmissions aren’t mistaken for peer death.
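To check whether a retransmission policy leaves enough room between the soft and hard lifetimes, the worst-case duration of one exchange can be computed up front. The sketch below assumes a simple doubling backoff with a per-wait cap; the function names and default values are illustrative, not the defaults of any particular implementation.

```python
from typing import List


def retransmit_waits(
    initial_rto_s: float = 2.0,
    backoff: float = 2.0,
    max_tries: int = 5,
    max_rto_s: float = 30.0,
) -> List[float]:
    """Return the wait after each transmission attempt, in seconds."""
    waits = []
    rto = initial_rto_s
    for _ in range(max_tries):
        waits.append(min(rto, max_rto_s))  # cap each individual wait
        rto *= backoff
    return waits


def worst_case_exchange_s(**kwargs: float) -> float:
    """Total time before the initiator gives up on an unanswered request."""
    return sum(retransmit_waits(**kwargs))


def fits_before_hard_expiry(
    lifetime_s: float, soft_fraction: float, worst_case_s: float
) -> bool:
    """True if one full retransmission cycle fits between soft and hard limits."""
    margin_s = lifetime_s * (1.0 - soft_fraction)
    return worst_case_s <= margin_s
```

With these defaults the waits are 2, 4, 8, 16, and 30 seconds, so the initiator gives up after 60 seconds. The same total is a useful number to compare against the DPD timeout when tuning the two together.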
Operational Security: Algorithms, DH Groups, and Cryptoperiods
Select cryptographic parameters with both security and performance in mind:
- Algorithms: prefer AEAD algorithms (e.g., AES-GCM) for CHILD_SAs, and a strong PRF for IKE (e.g., HMAC-SHA-256 or HMAC-SHA-384), since the PRF drives the prf+ key derivation.
- DH groups: use at least group 14 (2048-bit MODP); prefer elliptic-curve groups such as group 19 (256-bit ECP) or group 31 (Curve25519) where the platform supports them.
- Cryptoperiod: shorter lifetimes reduce exposure to key compromise but increase rekey load. Combine shorter lifetimes with robust rekey mechanisms and monitoring.
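The rekey load implied by a given cryptoperiod can be estimated with simple arithmetic. The following sketch assumes rekeys are spread evenly over time and that each CHILD_SA is rekeyed once per soft lifetime; `rekeys_per_second` is a hypothetical helper and the tunnel count is an example figure.

```python
def rekeys_per_second(
    child_sa_count: int, lifetime_s: float, soft_fraction: float = 0.7
) -> float:
    """Approximate steady-state CHILD_SA rekey rate on one gateway."""
    # Each SA is replaced when it reaches its soft threshold, so the
    # effective rekey interval is shorter than the configured lifetime.
    rekey_interval_s = lifetime_s * soft_fraction
    return child_sa_count / rekey_interval_s


if __name__ == "__main__":
    for lifetime_h in (1, 4, 8):
        rate = rekeys_per_second(10_000, lifetime_h * 3600)
        print(f"{lifetime_h} h lifetime: {rate:.2f} rekeys/s")
```

For 10,000 CHILD_SAs with a 1-hour lifetime and a 70% soft threshold, this works out to roughly 4 rekeys per second. With PFS enabled, each of those also costs a DH computation on both peers.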
Edge Cases and Advanced Topics
MOBIKE and Mobile Clients
With MOBIKE, IKE_SA survives network changes. Rekey strategies must account for frequent network transitions — ensure rekey exchanges tolerate IP address changes mid-exchange and that NAT mappings are refreshed promptly.
High-Availability Gateways and State Synchronization
On active-active or active-standby clusters, synchronize SA state across cluster members. During failover, the new primary must either have access to the SA keys or quickly establish new SAs without disrupting traffic. Consider replicating the SK_* keys, IKE message IDs, and ESP sequence numbers so that a failover does not force every tunnel to be renegotiated.
Logging, Metrics, and Alerting
Track these metrics for operational health:
- Number of rekey attempts and successes per tunnel
- Average rekey latency and retransmission count
- Frequency of hard expirations and forced deletions
- CPU and memory usage spikes correlated with rekey events
Alert on abnormal patterns: repeated rekey failures, excessive retransmissions, or drift between expected and actual lifetimes can indicate network or implementation problems.
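As a sketch of how these metrics might be collected per tunnel, the following Python class keeps simple counters and applies one possible alerting rule. The `RekeyStats` class, its method names, and the 5% failure threshold are assumptions made for illustration; a real deployment would feed them from the daemon's logs or management interface.

```python
from dataclasses import dataclass


@dataclass
class RekeyStats:
    """Per-tunnel rekey counters (hypothetical; not tied to any daemon)."""

    successes: int = 0
    failures: int = 0
    hard_expirations: int = 0
    retransmissions: int = 0
    total_latency_s: float = 0.0

    def record_success(self, latency_s: float, retransmissions: int = 0) -> None:
        self.successes += 1
        self.total_latency_s += latency_s
        self.retransmissions += retransmissions

    def record_failure(self, retransmissions: int = 0, hard_expired: bool = False) -> None:
        self.failures += 1
        self.retransmissions += retransmissions
        if hard_expired:
            self.hard_expirations += 1

    @property
    def attempts(self) -> int:
        return self.successes + self.failures

    def failure_rate(self) -> float:
        return self.failures / self.attempts if self.attempts else 0.0

    def mean_latency_s(self) -> float:
        return self.total_latency_s / self.successes if self.successes else 0.0

    def should_alert(self, max_failure_rate: float = 0.05, min_attempts: int = 20) -> bool:
        """Alert on any hard expiration or on a sustained high failure rate."""
        if self.hard_expirations > 0:
            return True
        return self.attempts >= min_attempts and self.failure_rate() > max_failure_rate
```

The `min_attempts` guard keeps a single early failure on a new tunnel from paging anyone, while a hard expiration alerts immediately because it means traffic was at risk of being dropped.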
Troubleshooting Checklist
- If rekeys are failing or timing out: check MTU/fragmentation, NAT port mappings, and firewall rules that may drop UDP 500/4500.
- If frequent rekeys cause CPU spikes: review the DH groups in use, consider performing PFS on fewer rekeys, or increase the CHILD_SA lifetime where acceptable.
- If post-rekey traffic is dropped: verify anti-replay windows and sequence numbers, and ensure old SPIs remain valid during the transition window.
- If failover causes traffic interruption: confirm state synchronization or implement a short overlap window where both old and new SA keys are accepted.
Summary Recommendations
- Proactively rekey at a soft threshold (60–75%) and enforce a hard lifetime to avoid key reuse.
- Use PFS for high-security tunnels but balance CPU load by tuning frequency.
- Align lifetimes where feasible and stagger thresholds between peers under your control to reduce simultaneous rekey collisions.
- Tune retransmissions, DPD, and NAT-T so that rekey exchanges complete reliably across varied network paths.
- Monitor and alert on rekey activity and failures — visibility is crucial for operational reliability.
IKEv2 provides a robust framework for rekeying and SA lifetime management, but real-world reliability depends on sound policy choices, careful tuning, and operational monitoring. Implementing the above best practices will help ensure VPN tunnels remain both secure and seamless for users and services.