
An Encrypted Wi-Fi Mesh Backbone with 802.11s, WireGuard, and VXLAN

Bridge two OpenWrt routers into one Layer 2 network over an untrusted wireless backhaul by stacking 802.11s for transport, WireGuard for encryption, and VXLAN for bridging.

Why I wrote this

I wanted one seamless network across a hard-to-wire span, and I did not want the wireless backhaul to be a soft spot. The result is a stacked transport, tunnel, and overlay design that is worth explaining.


I have two OpenWrt routers with no Ethernet between them. The obvious answer is a wireless mesh: a dedicated radio on each router forms a backhaul link, and clients roam between the two access points as one network.

The catch is that I wanted three things the consumer mesh products do not give you together:

  • One Layer 2 domain across both routers, so a device keeps its IP and DHCP lease as it roams, and so the remote AP can serve isolated VLANs (guest, IoT) without routing.
  • Encryption on the backhaul, because a dedicated 5 GHz link carrying internal traffic is exactly the thing you do not want exposed over the air.
  • No dependence on the radio’s own encryption, because on this hardware class it does not work reliably.

What I ended up with is a three-part stack: 802.11s carries the backhaul, WireGuard provides an encrypted Layer 3 tunnel, and VXLAN bridges each VLAN across that tunnel. This post is about why each part is there and how they fit together.

The hardware is two identical OpenWrt routers using the ath11k driver. I will call them gw (the gateway) and ap (the access point). Addresses below are illustrative.

Why not just encrypt the 802.11s mesh?

802.11s is the IEEE standard for wireless mesh. It has its own link-layer encryption: SAE (the WPA3 handshake) or a pre-shared key. On paper, you turn it on and your mesh frames are encrypted. Done.

On ath11k it does not work. With either SAE or WPA2-PSK configured on the mesh interface, the stations associate but the mesh peer link never reaches ESTAB. It hangs in LISTEN. No error, no fallback; the backbone simply never forms.

Warning

On this ath11k hardware, encrypted 802.11s fails to establish peer links with both SAE and PSK. The interfaces come up, but iw dev <mesh-iface> station dump shows no established peer. This is a driver limitation, not a configuration mistake; do not burn an afternoon retrying key types.
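That check can be turned into a yes/no answer by counting established peers in the station dump. The following is a minimal sketch: count_estab_peers is a hypothetical helper name, and it assumes iw prints one "mesh plink:" line per peer.

```shell
# Count mesh peers whose link state is ESTAB in `iw ... station dump` output.
# Reads the dump on stdin so it can be exercised without a radio.
count_estab_peers() {
    awk '/mesh plink:[[:space:]]*ESTAB/ { n++ } END { print n + 0 }'
}

# Usage on the router (mesh0 is a placeholder interface name):
#   iw dev mesh0 station dump | count_estab_peers
```

A result of 0 while the interface is up matches the failure described here: the peer is visible but the link never leaves LISTEN.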

That left a choice: give up on encryption, or stop relying on the radio for it. I chose the second. The 802.11s link runs unencrypted, and a WireGuard tunnel rides on top of it. The unencrypted radio only ever carries WireGuard’s UDP packets, so anyone with a 5 GHz monitor-mode card in range sees ciphertext and nothing else.

Insight

When a layer cannot provide a guarantee you need, do not weaken the guarantee; move it to a layer that can. The radio is bad at confidentiality, so the radio’s job becomes “move bytes,” and confidentiality moves up the stack to WireGuard.

The three-part stack

Each part has exactly one job, and each is independently testable, which matters enormously when something breaks.

┌──────────────────────────────────────────────────────────────┐
│  L2 overlay - VXLAN: bridge each VLAN across the tunnel       │
│     vxlan10 ↔ br-lan   vxlan11 ↔ br-iot   vxlan12 ↔ br-wifi    │
├──────────────────────────────────────────────────────────────┤
│  Encrypted L3 tunnel - WireGuard (wg_mesh)                    │
│     gw 10.255.0.1  ↔  ap 10.255.0.2                           │
├──────────────────────────────────────────────────────────────┤
│  Wireless transport - 802.11s mesh                            │
│     gw 169.254.100.1  ↔  ap 169.254.100.2                     │
└──────────────────────────────────────────────────────────────┘

Read it bottom-up: the radio gives you a reachable link-local address on the other router; WireGuard builds an encrypted point-to-point Layer 3 tunnel over that address; VXLAN carries Layer 2 Ethernet segments inside the WireGuard tunnel, one per VLAN.

Transport: 802.11s as dumb backhaul

The bottom layer is a single dedicated radio on each router, doing nothing but the mesh.

Parameter       Value
Protocol        IEEE 802.11s
Band / channel  5 GHz, channel 149, HE80
Mesh ID         home-mesh
gw address      169.254.100.1/30
ap address      169.254.100.2/30
Encryption      none (see above)
Carries         WireGuard UDP only

Two design decisions matter here:

Dedicate a radio to the backhaul. These routers are tri-band. One 5 GHz radio runs the mesh and hosts no client SSIDs at all. The moment you share a radio between backhaul and clients, the backhaul fights client traffic for airtime and your effective throughput collapses under load. A dedicated radio keeps the backbone bandwidth stable.

Use link-local addressing. The mesh transport gets a small /30 carved from the 169.254.0.0/16 link-local range. It exists only to give WireGuard an endpoint to dial; nothing else routes over it directly.

One firewall subtlety: the 802.11s interface lives in the trusted (LAN) zone on both routers, with input accepted. If it is in a zone that drops input, the WireGuard handshake packets never arrive and the tunnel above it never comes up.
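As a rough illustration of the transport settings above, the sketch below prints a script for uci batch rather than changing anything itself. The radio name (radio2), the section names (mesh0, mesh), and the idea that firewall zone 0 is the trusted LAN zone are all assumptions for the example, not values from my routers.

```shell
# Print a `uci batch` script for the unencrypted 802.11s backhaul.
# $1 = this router's mesh address (169.254.100.1 on gw, 169.254.100.2 on ap)
# ASSUMPTIONS: radio2 is the dedicated 5 GHz radio, the section names
# mesh0/mesh are made up, and firewall zone 0 is the trusted LAN zone.
mesh_uci_batch() {
    cat <<EOF
set wireless.radio2.channel='149'
set wireless.radio2.htmode='HE80'
set wireless.mesh0=wifi-iface
set wireless.mesh0.device='radio2'
set wireless.mesh0.mode='mesh'
set wireless.mesh0.mesh_id='home-mesh'
set wireless.mesh0.encryption='none'
set wireless.mesh0.network='mesh'
set network.mesh=interface
set network.mesh.proto='static'
set network.mesh.ipaddr='$1'
set network.mesh.netmask='255.255.255.252'
add_list firewall.@zone[0].network='mesh'
EOF
}

# Review the output first, then apply on the router with:
#   mesh_uci_batch 169.254.100.1 | uci batch && uci commit
mesh_uci_batch 169.254.100.1
```

Printing the commands instead of running them keeps the example safe to try on any machine and makes the diff between gw and ap a single argument.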

Encrypted tunnel: WireGuard over the mesh

WireGuard (wg_mesh) is a point-to-point tunnel between the two routers, with its endpoint set to the other router’s link-local mesh address.

Parameter             Value
Interface             wg_mesh
gw address            10.255.0.1/30
ap address            10.255.0.2/30
Listen port           51820
Endpoint (ap → gw)    169.254.100.1:51820
Persistent keepalive  25 s
Because the endpoint is the link-local address from the 802.11s transport, WireGuard’s reachability depends entirely on the mesh being up. That is the correct dependency, but it produces the single nastiest failure mode in this whole design.

The rogue endpoint route

When OpenWrt’s WireGuard protocol handler brings up a tunnel whose endpoint is not yet routable, it helpfully inserts a host route to that endpoint via the default gateway. On ap, that produces:

169.254.100.1 via 10.0.10.1 dev br-lan

This /32 is more specific than the connected /30 on the mesh interface, so the kernel prefers it. Now ap tries to reach the mesh endpoint through the LAN bridge, which is itself carried over the mesh by the VXLAN tunnels above. The path eats its own tail, and the link collapses.

Warning

WireGuard with a link-local endpoint plus a default route is a trap. The proto handler’s helper route sends mesh traffic back through the gateway, creating a circular dependency. The fix is to delete that route on every tunnel bring-up: it is regenerated each time, so a one-off ip route del is not enough; it needs a hotplug hook.

The verification command, run on the AP:

ip route | grep 169.254.100.1
# if you see "via <gateway>", delete it:
ip route del 169.254.100.1 via 10.0.10.1

This gotcha (and several others on this network) deserves its own postmortem; here it is enough to know the tunnel’s reachability and the default route can fight each other.
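A hotplug hook for this might look roughly like the sketch below. The file name, the helper rogue_next_hops, and the choice to trigger only on the ifup event of wg_mesh are assumptions; depending on when the helper route is regenerated, other events may need handling too.

```shell
# Sketch of /etc/hotplug.d/iface/99-wg-mesh-route (hypothetical file name).
# Interface hotplug scripts run with ACTION and INTERFACE set.
ENDPOINT="169.254.100.1"   # gw's mesh address, as seen from ap

# Read `ip route show` output on stdin and print the next hop of every
# host route to $1 that goes "via" a gateway. The connected /30 on the
# mesh interface has no "via", so it never matches.
rogue_next_hops() {
    awk -v ep="$1" '$1 == ep && $2 == "via" { print $3 }'
}

# No `exit` here: hotplug scripts are sourced, so fall through instead.
if [ "${ACTION:-}" = "ifup" ] && [ "${INTERFACE:-}" = "wg_mesh" ]; then
    ip route show | rogue_next_hops "$ENDPOINT" | while read -r hop; do
        ip route del "$ENDPOINT" via "$hop"
        logger -t wg-mesh "removed rogue route to $ENDPOINT via $hop"
    done
fi
```

Matching on the "via" keyword rather than a hard-coded gateway means the hook keeps working if the LAN gateway address changes.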

L2 overlay: VXLAN for bridging

WireGuard gives an encrypted Layer 3 tunnel. But I wanted Layer 2 semantics, one broadcast domain per VLAN spanning both routers, so that:

  • A device keeps its DHCP lease and IP when it roams from gw’s AP to ap’s AP.
  • The remote router can serve the same isolated VLANs (LAN, IoT, guest Wi-Fi) without doing any routing or running its own DHCP for the trusted LAN.
  • Broadcast/multicast discovery (mDNS, etc.) works across the two APs as if they were one switch.

VXLAN does exactly this: it wraps Ethernet frames in UDP so an L2 segment can be carried over an L3 network. Here, one VXLAN interface per VLAN runs inside the WireGuard tunnel.

VXLAN    VNI  Bridged into  VLAN
vxlan10  10   br-lan        LAN
vxlan11  11   br-iot        IoT
vxlan12  12   br-wifi       Wi-Fi

The remote endpoint of each VXLAN is the WireGuard address (10.255.0.x), not the radio address, so every bridged frame is encapsulated in VXLAN and then encrypted by WireGuard before it touches the air.
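To make the table concrete, the sketch below shows one way to create the three interfaces with iproute2, as seen from ap. It is not the exact boot script: the helper names, the dry-run switch, and UDP port 4789 (the IANA-assigned VXLAN port) are assumptions, and the bridges are expected to exist already.

```shell
# Create one VXLAN per VLAN over wg_mesh and attach it to its bridge.
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 on the router to apply them.
# ASSUMPTIONS: run from ap (local 10.255.0.2, remote 10.255.0.1), UDP port
# 4789, and bridges br-lan/br-iot/br-wifi that already exist.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

create_vxlan() {  # $1 = VNI, $2 = bridge to join
    run ip link add "vxlan$1" type vxlan id "$1" \
        local 10.255.0.2 remote 10.255.0.1 dstport 4789 dev wg_mesh
    run ip link set "vxlan$1" master "$2"
    run ip link set "vxlan$1" mtu 1370 up
}

create_vxlan 10 br-lan
create_vxlan 11 br-iot
create_vxlan 12 br-wifi
```

On gw the local and remote addresses swap. Binding to the WireGuard addresses, not the 169.254.100.x ones, is what keeps bridged frames off the air in cleartext.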

Note

These VXLAN interfaces are created by a boot script, not by the OpenWrt config system (UCI). That means uci show network will not list them, and a network reload will silently destroy them without recreating them. After any network change, confirm they still exist with ip -d link show type vxlan.
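The same check can be scripted so it is easy to run after every reload. This is a small sketch: missing_ifaces and the SYSFS_NET override are hypothetical names, and it relies only on the kernel exposing each interface as an entry under /sys/class/net.

```shell
# Print the name of every expected interface that does not exist.
# SYSFS_NET can be pointed at another directory for testing.
missing_ifaces() {
    root="${SYSFS_NET:-/sys/class/net}"
    for name in "$@"; do
        [ -e "$root/$name" ] || echo "$name"
    done
}

# Usage on the router; empty output means all three overlays survived:
#   missing_ifaces vxlan10 vxlan11 vxlan12
```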

MTU: pay the encapsulation tax

Stacking tunnels eats into the MTU, and getting this wrong gives you the classic “small packets work, large transfers hang” symptom. Roughly:

  • Start at the 1500-byte path MTU.
  • WireGuard overhead is about 60 bytes over IPv4 (20 IP + 8 UDP + 32 WireGuard header and authentication tag).
  • VXLAN encapsulation is about 50 bytes (20 IP + 8 UDP + 8 VXLAN + 14 inner Ethernet).
  • That leaves an effective inner MTU near 1390; I set the bridged interfaces to 1370 for margin.

ip link set vxlan10 mtu 1370

Set the MTU explicitly and clamp TCP MSS so endpoints negotiate a segment size that fits. Relying on Path MTU Discovery through a double-encapsulated tunnel is how you get intermittent stalls that are miserable to diagnose.
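The arithmetic is simple enough to script, which helps when an overhead changes (an IPv6 underlay, for example). In the sketch below, inner_mtu and tcp_mss are hypothetical helper names. The second call is worth noting: if the WireGuard interface is left at the kernel's default MTU of 1420, the VXLAN budget is 1420 - 50 = 1370, so 1370 is then the ceiling and not a margin.

```shell
# Inner MTU left after stacking tunnels: start MTU minus each overhead.
inner_mtu() {  # $1 = starting MTU, remaining args = overheads in bytes
    mtu="$1"; shift
    for overhead in "$@"; do mtu=$((mtu - overhead)); done
    echo "$mtu"
}

# Largest TCP segment that fits an MTU over IPv4 (20 IP + 20 TCP bytes).
tcp_mss() { echo $(( $1 - 40 )); }

inner_mtu 1500 60 50   # WireGuard over IPv4, then VXLAN -> 1390
inner_mtu 1420 50      # VXLAN on a wg interface at the 1420 default -> 1370
tcp_mss 1370           # MSS that fits the 1370-byte bridged MTU -> 1330
```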

Bringing it up in the right order

The layers must come up bottom-first, and the VXLAN overlay has to wait until the WireGuard tunnel beneath it has completed its handshake. On the AP, a boot script:

  1. Lets the 802.11s mesh and wg_mesh start (these are config-managed).
  2. Deletes the rogue endpoint route described above.
  3. Sleeps ~15 seconds so the WireGuard handshake can complete.
  4. Creates vxlan10/11/12 over the now-live tunnel and brings up the bridged VLAN interfaces.

The sleep is not superstition. VXLAN over WireGuard only passes traffic once the handshake is done; create the VXLANs too early and they come up pointed at a tunnel that is not yet carrying data. Cold boot to fully operational is about 30–45 seconds.
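A fixed sleep works, but the same wait can be made conditional by polling WireGuard for a completed handshake. This is a sketch of that alternative, not what my boot script does: wait_for_handshake is a hypothetical helper, and it relies on wg show <interface> latest-handshakes printing one line per peer with an epoch timestamp that is 0 until the first handshake.

```shell
# Wait until any peer on a WireGuard interface has completed a handshake.
# $1 = interface, $2 = timeout in seconds (default 30).
# Returns 0 as soon as a handshake is seen, 1 on timeout.
wait_for_handshake() {
    iface="$1"; timeout="${2:-30}"; waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # Each line is "<public-key> <epoch-seconds>"; 0 means "never".
        if wg show "$iface" latest-handshakes 2>/dev/null |
            awk '$2 > 0 { ok = 1 } END { exit !ok }'; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Usage in a boot script, before creating the VXLANs:
#   wait_for_handshake wg_mesh 30 || logger -t wg-mesh "no handshake yet"
```

The timeout keeps a dead mesh from blocking boot forever, which a bare polling loop would do.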

Verifying each layer independently

The payoff of clean stacking is that you can bisect a failure in three commands. Work bottom-up: the overlay can only be healthy if the tunnel and transport below it are healthy.

# Transport: is the RF mesh peer established?
iw dev <mesh-iface> station dump | grep -E 'plink|signal'
#   want: plink ESTAB, signal in a workable range

# Encrypted tunnel: is WireGuard handshaking?
wg show wg_mesh | grep -E 'handshake|transfer'
#   want: a recent handshake and bytes moving both ways

# L2 overlay: do the VXLAN interfaces exist and learn MACs?
ip -d link show type vxlan
bridge fdb show dev vxlan12 | head

And the end-to-end path test, pinging the far router at each layer:

ping <ap-mesh-ip>      # 169.254.100.2: tests transport only
ping <ap-wg-ip>        # 10.255.0.2:    tests transport + tunnel
ping <ap-lan-ip>       # 10.0.10.2:     tests transport + tunnel + overlay

If the mesh ping works but the WireGuard ping does not, suspect the rogue route or a stale handshake. If both work but the LAN ping does not, the VXLANs are most likely missing; confirm with ip -d link show type vxlan and re-run the boot script.
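That decision table fits in a few lines of shell. The sketch below separates probing from diagnosis so the logic can be checked without a network; probe and diagnose are hypothetical names, and the addresses in the usage line are the illustrative ones used throughout.

```shell
# Report "ok" or "fail" for a single ping with a 2-second timeout.
probe() {
    if ping -c 1 -W 2 "$1" >/dev/null 2>&1; then echo ok; else echo fail; fi
}

# Map the three probe results to the lowest broken layer.
diagnose() {  # $1 = mesh result, $2 = WireGuard result, $3 = LAN result
    case "$1/$2/$3" in
        ok/ok/ok)   echo "all three layers answer" ;;
        fail/*)     echo "transport: check the 802.11s peer link" ;;
        ok/fail/*)  echo "tunnel: suspect the rogue route or a stale handshake" ;;
        *)          echo "overlay: VXLANs likely missing, re-run the boot script" ;;
    esac
}

# Usage from gw:
#   diagnose "$(probe 169.254.100.2)" "$(probe 10.255.0.2)" "$(probe 10.0.10.2)"
```

The case order encodes the bottom-up rule: a transport failure is reported first, because nothing above it can be trusted until it answers.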

What it costs, and when it is worth it

This is not free. Two honest tradeoffs:

Throughput is capped by the radio. A difficult 5 GHz path can run at a modest signal level and a low MCS rate; effective backbone throughput here is roughly 100–150 Mbps. For an internet connection of a few hundred megabits, that is fine. For moving large files across the backhaul, wire it.

Complexity lives in scripts, not config. The VXLANs and the route fix are imperative boot-time steps outside the config system. That is fragile: a network reload can drop the VXLANs, and you have to remember why. The mitigation is documentation and the verification commands above; the real fix would be a proper netifd integration.

Tip

If you can run a cable, run a cable. This stack exists because I could not. But when the backhaul must be wireless and must be encrypted, separating transport (802.11s), confidentiality (WireGuard), and bridging (VXLAN) into independent, individually testable layers is far more robust than hoping the radio’s own encryption works.

The whole design is one idea applied three times: give each layer exactly one job, and let the layer that is good at a thing do that thing. The radio moves bytes. WireGuard keeps them secret. VXLAN makes two access points look like one switch.