WireGuard is already the fastest general-purpose VPN protocol available, but default Linux kernel settings are tuned for desktop workloads, not high-throughput encrypted UDP forwarding. If you’re routing 1 Gbps+ through a WireGuard tunnel and seeing CPU spikes, packet drops, or asymmetric performance, the bottleneck is almost never WireGuard itself — it’s the kernel’s conservative defaults for UDP buffering, congestion control, and interrupt handling.
This guide walks through every high-impact tuning parameter, provides copy-paste configurations for homelab and production scales, and shows you how to benchmark properly so you can measure your gains.
Why Default Linux Settings Limit WireGuard Throughput
WireGuard runs in kernel space (since Linux 5.6) as a virtual network interface. Traffic enters the interface, gets encrypted with ChaCha20-Poly1305, and exits as UDP packets on the real NIC. The kernel path is:
Application → wg0 (encrypt) → eth0 (UDP out) → Network
Three things can go wrong with stock settings:
- UDP receive buffers too small: the kernel drops packets before WireGuard can decrypt them under high throughput
- CUBIC congestion control: a loss-based algorithm, so TCP flows carried inside the UDP tunnel slow down sharply whenever the tunnel drops packets
- NAPI polling budget too low: the kernel yields the CPU before draining the NIC ring buffer, causing drops during bursts
Each of these has a direct sysctl fix.
Step 1 — Benchmark Your Baseline (Before Touching Anything)
Never tune blind. Establish a baseline so you can compare before/after.
The baseline run has five parts:
- Install iperf3 on both ends.
- Start the server on the remote endpoint.
- Run a single-stream test from the client.
- Run a parallel-stream test to assess multi-core scaling.
- Test the reverse direction (often asymmetric with default settings).
Record the single-stream, parallel, and reverse results. If your parallel test is significantly faster than single-stream, a single flow is limited by one CPU core or by its buffers, and you have headroom to gain by tuning. If they’re identical and still well below line rate, the bottleneck is shared by every flow, most likely MTU fragmentation or UDP buffer drops.
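The benchmark sequence above can be collected into one small helper. This is a sketch under assumptions: the function name run_baseline, the 30-second duration, and the four parallel streams are illustrative choices, and the remote endpoint must already be running iperf3 -s.

```shell
# Baseline throughput sketch. Assumes iperf3 is installed on both ends and
# that `iperf3 -s` is already running on the remote endpoint.
run_baseline() {
    local server="$1"           # tunnel address of the iperf3 server
    local secs="${2:-30}"       # test duration in seconds (illustrative default)
    local streams="${3:-4}"     # parallel stream count (illustrative default)

    echo "== single stream =="
    iperf3 -c "$server" -t "$secs"

    echo "== $streams parallel streams =="
    iperf3 -c "$server" -t "$secs" -P "$streams"

    echo "== reverse direction (server sends) =="
    iperf3 -c "$server" -t "$secs" -R
}
```

Calling run_baseline 10.0.0.1 | tee baseline.txt (with your own peer address in place of the placeholder 10.0.0.1) keeps the output in a file you can compare against after tuning.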
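Also check for retransmits and packet drops during the test. iperf3 reports TCP retransmits in its Retr column, and ip -s link show on the real NIC lists RX errors and drops. High RX error or dropped counts on the real NIC under WireGuard load are your first red flag.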
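The same counters can be read straight from sysfs and procfs, which is handy for taking a snapshot before and after a test. A minimal sketch, assuming a Linux host; the function names and the optional path arguments are illustrative and exist mainly so the helpers can be tried against sample files.

```shell
# Print rx_errors and rx_dropped for one interface.
# $2 overrides the sysfs root (illustrative, for testing).
nic_rx_counters() {
    local ifname="$1" root="${2:-/sys/class/net}"
    local stats="$root/$ifname/statistics"
    echo "rx_errors=$(cat "$stats/rx_errors") rx_dropped=$(cat "$stats/rx_dropped")"
}

# Print one field from the Udp: rows of /proc/net/snmp, e.g. RcvbufErrors.
# The first Udp: row holds the column names, the second holds the values.
udp_counter() {
    local field="$1" file="${2:-/proc/net/snmp}"
    awk -v want="$field" '
        $1 == "Udp:" && !seen { for (i = 2; i <= NF; i++) col[$i] = i; seen = 1; next }
        $1 == "Udp:" { if (want in col) print $(col[want]); exit }
    ' "$file"
}
```

Run nic_rx_counters eth0 and udp_counter RcvbufErrors before and after an iperf3 run; a RcvbufErrors value that grows during the run points at receive buffers that are too small.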
Step 2 — sysctl Configuration for WireGuard Performance
Create /etc/sysctl.d/99-wireguard.conf with the following
optimizations. Apply with sysctl --system after editing.
UDP Buffer Sizes — The Single Biggest Win
The default rmem_max on most distros is 212992 bytes (208 KiB), which is inadequate for 1 Gbps WireGuard tunnels. With a 16 MB ceiling, the kernel can queue incoming encrypted packets while WireGuard processes them, avoiding drops during traffic bursts.
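To see whether a host actually needs the change, compare the current ceiling with the target. The helper below is a sketch: need_buffer is a made-up name, and raising wmem_max alongside rmem_max is an assumption, since only rmem_max and the 16 MB figure come from the text above.

```shell
# Print the sysctl line needed to raise a net.core buffer ceiling to the
# target size, or nothing if the current value is already high enough.
# $3 overrides the procfs directory (illustrative, for testing).
need_buffer() {
    local key="$1" mib="${2:-16}" dir="${3:-/proc/sys/net/core}"
    local want=$((mib * 1024 * 1024))
    local have
    have="$(cat "$dir/$key")"
    [ "$have" -ge "$want" ] || echo "net.core.$key = $want"
}
```

On a stock system, need_buffer rmem_max and need_buffer wmem_max each print a line you can append to /etc/sysctl.d/99-wireguard.conf. Keep in mind that rmem_max is only the ceiling a socket may request, not the size every socket gets.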
BBR Congestion Control — Better Than CUBIC for VPN Tunnels
BBR (Bottleneck Bandwidth and Round-trip propagation time) is significantly less sensitive to packet loss than CUBIC. This matters because WireGuard carries inner TCP connections over UDP: there is no outer TCP stack to retransmit, so every dropped encrypted UDP packet surfaces as loss to the inner TCP connection, and loss-based CUBIC responds by cutting its sending rate. BBR limits this by pacing based on measured bandwidth and RTT rather than packet loss signals. The setting applies to TCP connections that originate or terminate on the tuned host, not to traffic it merely forwards.
Verify BBR is active by reading the setting back: sysctl net.ipv4.tcp_congestion_control should report bbr.
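The checks for this step can be scripted. The two sysctl keys usually involved are net.ipv4.tcp_congestion_control = bbr and, commonly alongside it, net.core.default_qdisc = fq; the fq pairing is an assumption rather than something stated above. The function names and optional path arguments below are illustrative.

```shell
# Succeed if the running kernel offers bbr as a congestion control choice.
# $1 overrides the procfs path (illustrative, for testing).
bbr_available() {
    local file="${1:-/proc/sys/net/ipv4/tcp_available_congestion_control}"
    tr ' ' '\n' < "$file" | grep -qx bbr
}

# Succeed if bbr is the congestion control currently in effect.
bbr_active() {
    local file="${1:-/proc/sys/net/ipv4/tcp_congestion_control}"
    [ "$(cat "$file")" = "bbr" ]
}
```

A typical sequence is bbr_available || sudo modprobe tcp_bbr, then apply the sysctl file, then bbr_active && echo "BBR is active".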
IP Forwarding and Connection Tracking
WireGuard itself is stateless, but if you’re using iptables/nftables masquerading for egress traffic through the tunnel, conntrack tracks every connection. At 100+ concurrent devices, a default limit of 65536 entries (net.netfilter.nf_conntrack_max) may fill up, causing new connections to be dropped.
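Before raising the limit, it helps to know how full the table is. The keys for this step are net.ipv4.ip_forward and net.netfilter.nf_conntrack_max; the helper below only measures usage. It is a sketch with an illustrative name, and the procfs files it reads exist only once the nf_conntrack module is loaded.

```shell
# Print conntrack table usage as a whole-number percentage.
# $1 overrides the procfs directory (illustrative, for testing).
conntrack_usage_pct() {
    local dir="${1:-/proc/sys/net/netfilter}"
    local count max
    count="$(cat "$dir/nf_conntrack_count")"
    max="$(cat "$dir/nf_conntrack_max")"
    echo $((count * 100 / max))
}
```

If conntrack_usage_pct regularly reports values approaching 100 during peak hours, the limit is worth raising; if it stays in the single digits, the default is not your bottleneck.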
NAPI Polling Budget — Handle Traffic Bursts
netdev_budget defaults to 300 packets per SoftIRQ cycle. Under heavy WireGuard load, 10 Gbps NICs can deliver thousands of packets in a single interrupt. Increasing it to 600 (or up to 1200 for 10 Gbps+) keeps the network stack from yielding before the NIC buffer is drained.
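Whether the budget is actually being exhausted shows up in the time_squeeze column of /proc/net/softnet_stat. The sketch below sums that column across CPUs; the function name and the optional file argument are illustrative.

```shell
# Sum the time_squeeze column (3rd, hexadecimal) of /proc/net/softnet_stat.
# Each row is one CPU; the counter rises when a softirq run ended with
# packets still pending because the budget or time limit was reached.
time_squeeze_total() {
    local file="${1:-/proc/net/softnet_stat}"
    local total=0
    local processed dropped squeezed rest
    while read -r processed dropped squeezed rest; do
        total=$((total + 0x$squeezed))
    done < "$file"
    echo "$total"
}
```

Take a reading before and after an iperf3 run. If the total keeps climbing, raising net.core.netdev_budget to 600 as described above is likely to help; if it stays flat, the polling budget is not what limits you.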
Apply the config with sudo sysctl --system.
Step 3 — MTU and MSS Clamping
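MTU mismatches are the most common cause of “slow WireGuard” that isn’t actually WireGuard’s fault. WireGuard adds 60 bytes of overhead per packet over IPv4 (20 IP + 8 UDP + 32 for the WireGuard header and Poly1305 tag) and 80 bytes over IPv6, where the IP header is 40 bytes. A standard Ethernet MTU of 1500 bytes therefore leaves 1440 bytes for payload over IPv4 and 1420 over IPv6, and 1420 is the safe value used in the table below. If your path has a lower MTU (PPPoE takes 8 bytes, GRE tunnels take more), oversized packets fragment or get dropped.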
Find the Correct MTU
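From the WireGuard server, ping the client’s public (non-tunnel) address with DF (Don’t Fragment) set.
Start with a payload of 1472 bytes (1500 − 28 for the IP and ICMP headers) and decrease by 8 until the ping gets through without a fragmentation error. Add 28 back to the largest working payload to get the path MTU, then subtract 80 for the WireGuard overhead (60 if the tunnel only ever runs over IPv4).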
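The probing loop and the arithmetic can be sketched as two helpers. This assumes the iputils version of ping (-M do sets Don't Fragment, -s sets the payload size); the function names are illustrative, and the 80-byte default is the worst-case overhead with an IPv6 outer header.

```shell
# Find the largest ICMP payload that passes with DF set and print the
# resulting path MTU. Starts at 1472 and steps down by 8.
probe_path_mtu() {
    local host="$1" size="${2:-1472}"
    while [ "$size" -gt 0 ]; do
        if ping -c 1 -W 1 -M do -s "$size" "$host" >/dev/null 2>&1; then
            echo $((size + 28))   # payload + 20-byte IP + 8-byte ICMP header
            return 0
        fi
        size=$((size - 8))
    done
    return 1
}

# WireGuard MTU = path MTU minus tunnel overhead (default 80; pass 60 as
# the second argument for an IPv4-only path).
wg_mtu() {
    echo $(( $1 - ${2:-80} ))
}
```

For example, wg_mtu "$(probe_path_mtu 203.0.113.10)" prints the tunnel MTU for that peer, where 203.0.113.10 is a placeholder for the peer's public address. An unreachable host and a host that blocks ICMP look the same to this loop, so confirm basic reachability first.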
Common MTU values:
| Network type | Interface MTU | WireGuard MTU |
|---|---|---|
| Standard Ethernet | 1500 | 1420 |
| PPPoE | 1492 | 1412 |
| PPPoE + VLAN | 1488 | 1408 |
| Jumbo frames | 9000 | 8920 |
Set it in the [Interface] section of your WireGuard config, for example MTU = 1420 in /etc/wireguard/wg0.conf.
MSS Clamping for TCP Traffic
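If you’re routing entire subnets through WireGuard (not just the tunnel endpoints), TCP connections may not honor the tunnel MTU. Clamp MSS at the firewall. With iptables, that means the TCPMSS target with --clamp-mss-to-pmtu on SYN packets in the mangle table’s FORWARD chain.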
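For iptables, a typical form of the rule is sketched below. The wrapper function clamp_mss is illustrative; the rule itself uses the standard TCPMSS target and has to run as root.

```shell
# Clamp the MSS of forwarded TCP SYNs leaving via the tunnel to the path MTU.
# The interface name defaults to wg0.
clamp_mss() {
    local ifname="${1:-wg0}"
    iptables -t mangle -A FORWARD -o "$ifname" -p tcp \
        --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
}
```

Calling clamp_mss with no argument adds the rule for wg0. Rules added this way do not survive a reboot unless your distribution's iptables persistence mechanism saves them.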
For nftables:
```
table inet mangle {
    chain forward {
        type filter hook forward priority mangle; policy accept;
        oifname "wg0" tcp flags syn tcp option maxseg size set rt mtu
    }
}
```
This forces the TCP MSS to fit inside the WireGuard tunnel MTU, eliminating fragmentation for TCP traffic. Non-TCP traffic still depends on the interface MTU being correct.
Step 4 — Verify GRO/GSO Offloads Are Enabled
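Generic Receive Offload (GRO) and Generic Segmentation Offload (GSO) batch packets before handing them to WireGuard, reducing per-packet CPU cost significantly. Verify they’re enabled on the physical NIC with ethtool -k (lowercase k) followed by the interface name.
You want to see:
generic-receive-offload: on
generic-segmentation-offload: on
tcp-segmentation-offload: on
If any are off, enable them with ethtool -K (capital K), for example ethtool -K eth0 gro on gso on tso on.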
On some NICs (especially virtualized ones in Proxmox/VMware), these offloads may be disabled or unsupported. If they can’t be enabled, you lose some batching efficiency — but the sysctl tuning above still helps significantly.
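The check in this step can be scripted so that it reports only the offloads that need attention. A minimal sketch, assuming ethtool is installed; the function name offloads_off is illustrative.

```shell
# List the top-level offloads that ethtool reports as off for an interface.
# Indented sub-features in the ethtool output are ignored on purpose.
offloads_off() {
    local ifname="$1"
    ethtool -k "$ifname" | awk -F': ' '
        $1 ~ /^(generic-receive-offload|generic-segmentation-offload|tcp-segmentation-offload)$/ && $2 ~ /^off/ { print $1 }
    '
}
```

An empty result from offloads_off eth0 means all three are on. Entries that ethtool shows as off [fixed] cannot be changed by the driver, which is the situation described above for some virtual NICs.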
Step 5 — CPU Scaling Governor
WireGuard’s ChaCha20-Poly1305 encryption is CPU-bound. If your CPU frequency governor is set to powersave or ondemand, the kernel may downclock cores under light load, causing throughput drops when a burst arrives.
Set the governor to performance during benchmarking.
For permanent use, install cpufrequtils or configure tlp/power-profiles-daemon to keep cores at max frequency while WireGuard is active. In a homelab context, the power savings from downclocking are negligible compared to the throughput cost.
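On systems that expose the cpufreq sysfs interface, the governor can be read and switched without extra packages. The helpers below are a sketch: the names are illustrative, the optional root argument exists for testing, and writing the governor requires a root shell.

```shell
# Print the distinct governors currently in use across all cores.
current_governors() {
    local root="${1:-/sys/devices/system/cpu}"
    cat "$root"/cpu[0-9]*/cpufreq/scaling_governor | sort -u
}

# Switch every core to the given governor (default: performance).
set_governor() {
    local gov="${1:-performance}" root="${2:-/sys/devices/system/cpu}"
    local f
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        echo "$gov" > "$f"
    done
}
```

Run current_governors first so you know what to restore, then set_governor performance before benchmarking. The change lasts until reboot or until a power management daemon overrides it.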
Step 6 — Re-Benchmark with Tuned Configuration
After applying all changes, run the same iperf3 tests from Step 1.
Expected improvements on a 1 Gbps link:
| Metric | Before (defaults) | After (tuned) |
|---|---|---|
| Single-stream TCP upload | 300-500 Mbps | 700-950 Mbps |
| Parallel-stream TCP upload | 400-600 Mbps | 900-940 Mbps |
| Reverse direction (download) | 80-200 Mbps | 700-900 Mbps |
The reverse direction improvement is often the most dramatic — default receive buffers starve the download path because the kernel can’t queue incoming encrypted packets fast enough.
Common Pitfalls
Don’t tune only one side. Apply the same sysctl config on the server and all major clients for consistent behavior.
Don’t rely on the wg-quick default MTU unless you know your path MTU. wg-quick defaults to 1420, which works for most Ethernet paths but may need reduction for PPPoE, cellular, or tunnel-over-tunnel paths.
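Nested congestion control is real, but only when TCP is stacked on TCP. WireGuard runs over UDP, so ordinary TCP traffic inside the tunnel is not TCP-over-TCP. The problem appears when you wrap WireGuard in a TCP transport or run another TCP-based tunnel through it. For massive file transfers in that situation, remove the extra TCP layer or use a UDP-based file transfer tool to avoid nested congestion control fighting itself.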
Systemd’s sysctl persistence. The config file in /etc/sysctl.d/ persists across reboots. Verify after boot by reading a value back, for example sysctl net.core.rmem_max; sysctl --system only re-applies the files.
Complete Config File — Copy and Apply
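A complete file consistent with the steps in this guide might look like the sketch below. Only rmem_max at 16 MB, bbr, and netdev_budget at 600 come from the text; wmem_max, the fq qdisc, and the conntrack limit of 262144 are assumptions, and write_wg_sysctl is an illustrative helper name.

```shell
# Write the sysctl file to the given path (default: the path used in this
# guide). Needs a root shell when writing under /etc.
write_wg_sysctl() {
    local path="${1:-/etc/sysctl.d/99-wireguard.conf}"
    cat > "$path" <<'EOF'
# Socket buffer ceilings: 16 MB (wmem_max is an assumption)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216

# BBR congestion control (pairing with fq is an assumption)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr

# Forwarding and conntrack (262144 is an illustrative value)
net.ipv4.ip_forward = 1
net.netfilter.nf_conntrack_max = 262144

# NAPI polling budget
net.core.netdev_budget = 600
EOF
}
```

The nf_conntrack_max line only takes effect once the nf_conntrack module is loaded, so on a host without NAT rules it may be reported as an unknown key.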
Apply with sudo sysctl --system, set your WireGuard MTU to the correct value after a path MTU discovery, clamp MSS on the forward chain, and run your benchmarks. You should see 80-95% of line rate over WireGuard on modern hardware, close to what a direct connection delivers.