feat(tunnel): IPv6 support — prefix delegation + PCP pinholes (WIP)#3388

Merged
dr-bonez merged 14 commits into master from helix/tunnel-ipv6 on Jul 4, 2026

@helix-nine helix-nine commented Jul 2, 2026

Status: ready for review. This is PR1 of a staged rollout (per @dr-bonez: 1a+2b, 3b, 4a). It lands the IPv6 addressing foundation — allocation, WireGuard config generation, proxy-NDP dataplane, DNS, CLI/API, UI, and docs. PCP pinholes are PR2. Commits: aa0603a (foundation), 6fea6f3 (proxy-NDP), c060ce1 (docs/changelog/version), 6201a62 (UI), d21dace (self-review fixes).

Decision (research-backed)

The addressing model hinges on what VPS providers actually delegate. I surveyed the providers the tunnel docs recommend:

| Provider | Default IPv6 |
|---|---|
| Hetzner Cloud | /64 (routed, gw fe80::1) |
| Vultr | /64 |
| BuyVM | /64 (optional routed /48) |
| DigitalOcean | /124 (16 addrs) |
| Linode/Akamai | /128 default; /64 or /56 on request |
| OVH VPS | /128 (shared /64); dedicated → /64, High-Grade → /56 |

So the common case is a single /64, sometimes less. Following the rule "if it's just a /64 → 1a+2b", the tunnel takes a statically configured routed prefix (1a) and assigns each client a single /128 out of it (2b). The scheme also scales up for free: a prefix shorter than /64 (Linode /56, dedicated, BuyVM /48) delegates a whole /64 per client (2a, real PD), so a StartOS box can sub-address its containers.
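A minimal sketch of that decision, assuming the prefix length alone picks the mode. `V6Allocation`, `allocation_for`, `shared64_client` and `per_client_64` are hypothetical names, not the wg6.rs API, and `index` stands in for whatever per-client counter a real allocator would use. The narrow (/124) case is only classified here, not allocated.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// How a client's IPv6 is carved from the configured prefix (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum V6Allocation {
    /// Prefix shorter than /64: each client gets a whole /64 (real PD).
    PerClient64,
    /// Exactly /64: each client gets one /128 whose host bits are its tunnel IPv4.
    Shared64,
    /// Longer than /64 (e.g. a /124): each client gets an indexed /128.
    Narrow,
}

/// Picks the allocation mode from the prefix length alone.
pub fn allocation_for(prefix_len: u8) -> Option<V6Allocation> {
    match prefix_len {
        0..=63 => Some(V6Allocation::PerClient64),
        64 => Some(V6Allocation::Shared64),
        65..=127 => Some(V6Allocation::Narrow),
        _ => None, // a /128 leaves nothing to hand out; > 128 is invalid
    }
}

/// Shared /64: keep the upper 64 bits and put the tunnel IPv4 in the low 32.
pub fn shared64_client(prefix: Ipv6Addr, tunnel_v4: Ipv4Addr) -> Ipv6Addr {
    let network = u128::from(prefix) & !u128::from(u64::MAX);
    Ipv6Addr::from(network | u128::from(u32::from(tunnel_v4)))
}

/// Short prefix: the `index`-th /64 inside it, or None if there is no room.
pub fn per_client_64(prefix: Ipv6Addr, prefix_len: u8, index: u64) -> Option<Ipv6Addr> {
    if prefix_len >= 64 {
        return None;
    }
    let subnet_bits = 64 - u32::from(prefix_len); // 1..=64 bits of /64 index
    if subnet_bits < 64 && (index >> subnet_bits) != 0 {
        return None; // index exceeds the number of /64s in the prefix
    }
    // checked_shl yields None for a /0 (shift by 128), i.e. an empty mask.
    let network_mask = u128::MAX.checked_shl(128 - u32::from(prefix_len)).unwrap_or(0);
    let network = u128::from(prefix) & network_mask;
    Some(Ipv6Addr::from(network | (u128::from(index) << 64)))
}
```

For example, a Linode-style /56 could give client 5 the /64 at `<prefix>:5::/64`, while a Hetzner-style /64 keeps every client inside one /64 keyed by its tunnel IPv4.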

What's in this PR

  • tunnel/wg6.rs — the allocation module (server_addr, client_v6), unit-tested across shared-/64, short-prefix (per-client /64), and narrow (/124) cases. For a shared /64 the client's /128 host bits are its tunnel IPv4 (stable, allocation-free); ::1 is reserved for the tunnel.
  • WgServer.ipv6 prefix field + WireGuard config generation: the server interface carries <prefix>::1, each peer's AllowedIPs routes its delegated /64 or /128 back over wg, and client configs get their v6 Address, v6 DNS (= the tunnel), and ::/0 for v6.
    • Routing note: IPv6 is full-tunnel (AllowedIPs = …, ::/0). Replies sourced from a VPS-delegated GUA must return through the tunnel, and a plain WireGuard peer can't source-route otherwise. IPv4 stays split.
  • set-ipv6 API + CLI to configure/clear the prefix (+ i18n across 5 locales, TS bindings export).
  • forward chain now accepts client-originated IPv6 (nft_rule_v6); inbound-to-client is deliberately a pinhole (see PR2).
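A rough illustration of what a rendered client config could look like under these rules: v4 and v6 `Address`, split-tunnel IPv4 `AllowedIPs`, and `::/0` appended only when the client has an IPv6 address. `ClientConf` and `render_client_conf` are hypothetical; the keys, endpoint and subnet are placeholders, and DNS is IPv4-only here, matching the later fix that dropped the v6 DNS entry.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// Hypothetical inputs for one client's wg-quick style config.
pub struct ClientConf<'a> {
    pub private_key: &'a str,
    pub client_v4: Ipv4Addr,
    /// None when the tunnel has no IPv6 prefix configured.
    pub client_v6: Option<Ipv6Addr>,
    /// The tunnel's DNS proxy (IPv4 only).
    pub dns_v4: Ipv4Addr,
    pub server_public_key: &'a str,
    pub endpoint: &'a str,
    /// IPv4 stays split-tunnel: only these ranges go over wg.
    pub split_v4: &'a [&'a str],
}

pub fn render_client_conf(c: &ClientConf<'_>) -> String {
    let mut addresses = vec![format!("{}/32", c.client_v4)];
    let mut allowed: Vec<String> = c.split_v4.iter().map(|r| r.to_string()).collect();
    if let Some(v6) = c.client_v6 {
        addresses.push(format!("{v6}/128"));
        // IPv6 is full-tunnel so replies sourced from the delegated GUA
        // return through the tunnel rather than the local uplink.
        allowed.push("::/0".to_string());
    }
    let mut out = String::new();
    out.push_str("[Interface]\n");
    out.push_str(&format!("PrivateKey = {}\n", c.private_key));
    out.push_str(&format!("Address = {}\n", addresses.join(", ")));
    out.push_str(&format!("DNS = {}\n\n", c.dns_v4));
    out.push_str("[Peer]\n");
    out.push_str(&format!("PublicKey = {}\n", c.server_public_key));
    out.push_str(&format!("Endpoint = {}\n", c.endpoint));
    out.push_str(&format!("AllowedIPs = {}\n", allowed.join(", ")));
    out
}
```

The asymmetry is the design choice under review: IPv4 only routes the tunnel's ranges, while IPv6 takes the whole `::/0` because a plain WireGuard peer cannot source-route replies otherwise.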

Tested: cargo test -p start-core --features test tunnel::wg → 9 passing (allocation math + rendered configs).

PR1 checklist

  • Dataplane: proxy-NDP for on-link /64s (Hetzner/Vultr) — the VM answers ND for client /128s on the WAN; routed prefixes / per-client /64s need none (6fea6f3, resync_v6). Guarded by a v6_lock (d21dace).
  • AAAA DNS — already served by the existing RFC 2136 injection path (parse_rdata is record-type-generic), so a device can publish an AAAA for its GUA today. Documented; no new code needed.
  • Angular UI: prefix config + Disable on the settings page (6201a62). Each device's v6 appears in its generated config (device show-config); I left it out of the devices list rather than duplicate the allocation math in TS — happy to expose it via the API for the list if you'd prefer a column.
  • Docs (ipv6.md + cli-reference.md + generated manpage), CHANGELOG.md, version 1.1.0 → 1.2.0 (c060ce1).
  • Self-review: adversarial review of the full diff → 7 findings fixed (2 major) in d21dace (see comment).
  • End-to-end verification on a real VPS with a routed prefix — needs real global IPv6, so it can't run on a local VM. Worth a manual pass before release.

PR2 (follow-up, 3b)

  • Extend the tunnel's PCP server for IPv6 GUA pinholes (ip6 nft accept, no NAT) — what StartOS's PCP client already requests.
  • Manual pinhole management in the UI/CLI.

Feedback welcome on the allocation scheme and the full-tunnel-v6 AllowedIPs choice.

@helix-nine

PR1 is now feature-complete (addressing foundation). Four commits on the branch:

  1. aa0603a — IPv6 address-delegation foundation (wg6 allocation module + WG config generation) — unit-tested.
  2. 6fea6f3 — proxy-NDP dataplane for on-link /64s (Hetzner/Vultr).
  3. c060ce1 — docs (ipv6.md + CLI reference + manpage), CHANGELOG, version 1.1.0 → 1.2.0.
  4. 6201a62 — IPv6 settings UI card (api service live+mock, mock db seed, regenerated TS bindings).

Verification (all pass): cargo test tunnel::wg (9), cargo check -p start-core, check:tunnel, build:tunnel (full Angular template type-check), prettier.

Not automated: end-to-end reachability on a real VPS with a routed IPv6 prefix — needs real global IPv6, which a local libvirt VM can't provide. Worth a manual pass before release.

Deferred to PR2 (per the 3b decision): extend the tunnel's PCP server for IPv6 GUA pinholes (ip6 nft accept, no NAT) + a manual pinhole path.

Still a draft — happy to adjust the allocation scheme or the full-tunnel-v6 AllowedIPs choice on review before I start PR2.

@helix-nine helix-nine marked this pull request as ready for review July 3, 2026 02:07
@helix-nine

Before marking this ready I ran an adversarial self-review over the whole PR1 diff (6 dimensions — allocation math, proxy-NDP dataplane, WG config generation, API/CLI, UI, docs — each finding verified against the code). It surfaced 7 real issues, all now fixed in d21dace:

Major

  • resync_v6 did the same non-atomic read→diff→apply→overwrite as resync_egress but without a lock — two concurrent config-change RPCs could leave the tracked proxy-NDP map out of sync with the kernel neighbor table (leaked entries). Added a dedicated v6_lock mirroring egress_lock.
  • The client config advertised an IPv6 DNS server (<prefix>::1) that nothing binds — the DNS proxy is IPv4-only. Dropped the v6 DNS entry (AAAA resolves fine over the v4 resolver); v6 DNS can come with the proxy binding v6 later.
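The race in the first finding can be illustrated with a small model, assuming the tracked proxy-NDP entries live in a set and the kernel is driven through a callback. `V6State` and `resync` are hypothetical; the point is that the mutex is held across the whole read → diff → apply → overwrite sequence, so two concurrent resyncs serialize instead of interleaving.

```rust
use std::collections::BTreeSet;
use std::net::Ipv6Addr;
use std::sync::Mutex;

/// Hypothetical tracked proxy-NDP state, guarded by its own lock (the role
/// the PR gives `v6_lock`).
pub struct V6State {
    installed: Mutex<BTreeSet<Ipv6Addr>>,
}

impl V6State {
    pub fn new() -> Self {
        Self { installed: Mutex::new(BTreeSet::new()) }
    }

    /// Brings the kernel (via `apply(add, addr)`) from the tracked set to `desired`.
    /// The guard spans read, diff, apply and overwrite, so a concurrent resync
    /// can never diff against a set that is about to be overwritten.
    pub fn resync(&self, desired: &BTreeSet<Ipv6Addr>, mut apply: impl FnMut(bool, Ipv6Addr)) {
        let mut installed = self.installed.lock().unwrap();
        for addr in installed.difference(desired) {
            apply(false, *addr); // withdraw a stale entry
        }
        for addr in desired.difference(&*installed) {
            apply(true, *addr); // install a missing entry
        }
        *installed = desired.clone();
    }

    pub fn installed(&self) -> BTreeSet<Ipv6Addr> {
        self.installed.lock().unwrap().clone()
    }
}
```

Without the lock, two callers could both read the same old set, each apply only their own diff, and the last writer's tracked set would no longer describe what the kernel holds.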

Minor / nit

  • set-ipv6 now also rejects link-local prefixes (a fat-fingered fe80::/64 would have persisted and broken all client v6).
  • UI validator now requires an explicit /prefix in [0,128] (IpNet.parse doesn't enforce it).
  • Docs: corrected the CLI clearing syntax (omit --prefix, not --prefix null); added a note that inbound IPv6 hosting isn't supported yet (PR2); dropped the v6-DNS claim; moved the changelog entry under a ## [1.2.0] heading.
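A minimal sketch of the prefix checks listed above (explicit `/len`, length in [0,128], no link-local), assuming a plain `addr/len` string. `parse_v6_prefix` is a hypothetical name, not the actual `set-ipv6` handler or the UI validator.

```rust
use std::net::Ipv6Addr;

/// Hypothetical validator for a `set-ipv6` style prefix argument ("addr/len").
pub fn parse_v6_prefix(input: &str) -> Result<(Ipv6Addr, u8), String> {
    // Require an explicit /len: a bare address is rejected up front.
    let (addr_s, len_s) = input
        .split_once('/')
        .ok_or_else(|| format!("{input}: missing /prefix length"))?;
    let addr: Ipv6Addr = addr_s.parse().map_err(|e| format!("{addr_s}: {e}"))?;
    let len: u8 = len_s
        .parse()
        .map_err(|_| format!("/{len_s}: not a prefix length"))?;
    if len > 128 {
        return Err(format!("/{len}: prefix length must be in [0,128]"));
    }
    // fe80::/10: the top 10 bits of the first group are 1111 1110 10.
    if (addr.segments()[0] & 0xffc0) == 0xfe80 {
        return Err(format!("{input}: link-local prefixes cannot be delegated"));
    }
    Ok((addr, len))
}
```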

Re-verified after the fixes: cargo test tunnel::wg (9 pass), cargo check, check:tunnel, full build:tunnel, prettier — all green.

@dr-bonez dr-bonez force-pushed the helix/tunnel-ipv6 branch from d21dace to b4a3e4b July 3, 2026 19:09
@dr-bonez dr-bonez closed this Jul 3, 2026
@dr-bonez dr-bonez force-pushed the helix/tunnel-ipv6 branch from 7a16ed2 to 77a97e1 July 3, 2026 23:14
@dr-bonez dr-bonez reopened this Jul 3, 2026
@helix-nine

Implemented the per-subnet IPv6 redesign (handoff §6) — 14a07a1. Global WgServer.ipv6 → per-subnet WgSubnetConfig.ipv6; allocation collapsed to one /128 per host (host_v6 = prefix.network() | tunnel_ipv4, server + clients alike); subnet <net> set-ipv6 replaces the global command (carrying the egress validation, keyed on the subnet); UI moved from the settings card to the subnet dialog + a computed IPv6 column in the subnet/device tables.

Ran an adversarial review over the diff before pushing; it caught 4 real issues, all fixed in fd650b2:

  • Library bug (root-caused, not worked around). @start9labs/start-core's IpAddress rendered IPv6 by joining raw decimal octets ("32:1:13:184:…" not "2001:db8::…"), and fromOctets's 16-octet path spun forever on a no-op unshift() whenever the 9th octet was 0 — so zero()/fromOctets/.address on any computed v6 hung the browser or produced garbage. Fixed both the .address getter and fromOctets with one correct renderIpv6 (8 hex groups, longest zero-run → ::). Verified against RFC 5952 edge cases and across all web projects (npm run check) — no consumer regressed.
  • Out-of-prefix addresses. host_v6 only stays in-prefix for prefixes ≤ /96; a /124 escaped the block silently. set_subnet_ipv6 now rejects > /96, the web validator mirrors it, and a wg6 boundary test covers /48–/96.
  • Stale comments from the old /64-delegation model removed.

Verified: cargo test (backend), npm run check (all projects), check:tunnel, full build:tunnel, prettier, and a runtime check of the render fix.

helix-nine and others added 12 commits July 4, 2026 06:25
Add an operator-configured routed IPv6 prefix to the tunnel and carve
per-client global addresses out of it, adapting to the prefix size: a /64
per client when the prefix is shorter than /64 (real prefix delegation),
a single /128 per client for a shared /64 (the common budget-VPS case),
or an indexed /128 for a longer-than-/64 prefix (e.g. DigitalOcean /124).

- wg6: address-allocation module (server_addr + client_v6), unit-tested.
- WgServer.ipv6 prefix field + server/peer/client WireGuard config
  generation carrying v6 addresses, gateway/DNS, and AllowedIPs (v6 is
  full-tunnel so replies from the delegated GUA return through the tunnel).
- set-ipv6 API + CLI to configure/clear the prefix (+ i18n, bindings).
- forward chain: accept client-originated IPv6.

Foundation for the tunnel IPv6 work; runtime dataplane (proxy-NDP, AAAA
DNS), UI, docs, and PCP v6 pinholes follow.

When the configured prefix is a /64 held on-link by a WAN interface
(Hetzner, Vultr), the VPS gateway resolves each client's global address
via Neighbor Discovery on the WAN link. The tunnel host now answers for
those addresses (net.ipv6.conf.all.proxy_ndp + `ip -6 neigh … proxy`) and
forwards to the client over WireGuard. Routed prefixes and per-client /64s
are delivered without ND, so they get no proxy entry.

Reconciled on every network sync and at startup; installed entries are
tracked so stale ones (client removed, prefix cleared) are withdrawn.
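The reconciliation step can be sketched as a set difference between the entries installed last time and the ones now wanted, emitted as `ip -6 neigh add/del proxy` commands. `desired_proxy_set` and `proxy_ndp_commands` are hypothetical helpers; the sketch assumes the on-link prefix is a /64 and that `net.ipv6.conf.all.proxy_ndp` is already enabled.

```rust
use std::collections::BTreeSet;
use std::net::Ipv6Addr;

/// Client /128s that need a proxy entry: only those inside the /64 held
/// on-link on the WAN. A routed prefix (`None`) needs no proxy-NDP at all.
pub fn desired_proxy_set(clients: &[Ipv6Addr], onlink_64: Option<Ipv6Addr>) -> BTreeSet<Ipv6Addr> {
    let Some(net) = onlink_64 else {
        return BTreeSet::new();
    };
    let upper64 = |a: Ipv6Addr| u128::from(a) >> 64;
    clients.iter().copied().filter(|c| upper64(*c) == upper64(net)).collect()
}

/// iproute2 commands moving the WAN's proxy entries from `installed` to
/// `desired`: withdraw stale ones first, then add missing ones.
pub fn proxy_ndp_commands(wan: &str, installed: &BTreeSet<Ipv6Addr>, desired: &BTreeSet<Ipv6Addr>) -> Vec<String> {
    let withdraw = installed
        .difference(desired)
        .map(|a| format!("ip -6 neigh del proxy {a} dev {wan}"));
    let add = desired
        .difference(installed)
        .map(|a| format!("ip -6 neigh add proxy {a} dev {wan}"));
    withdraw.chain(add).collect()
}
```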
Document the new IPv6 delegation: a docs/src/ipv6.md page (linked in
SUMMARY), the `set-ipv6` command in the CLI reference, its generated
manpage, and a CHANGELOG entry. Bump start-tunnel 1.1.0 -> 1.2.0.

Add an IPv6 card to the tunnel settings page: shows the current routed
prefix, validates and saves a new one (`set-ipv6`), and can disable it.
Wires `setIpv6` through the api service (live + mock) and seeds the mock
db with `wg.ipv6`. Regenerates the tunnel TS bindings (SetIpv6Params,
WgServer.ipv6). Also syncs Cargo.lock for the 1.2.0 bump.

- resync_v6: guard with a dedicated v6_lock, mirroring egress_lock, so
  concurrent config changes can't leave the tracked proxy-NDP map out of
  sync with the kernel neighbor table (major).
- client config: stop advertising an IPv6 DNS server (<prefix>::1) — the
  DNS proxy binds IPv4 only, and AAAA resolves fine over it; a dead v6 DNS
  entry caused latency/failures for v6-preferring stub resolvers (major).
- set-ipv6: also reject link-local prefixes (fe80::/10); a fat-fingered
  fe80::/64 would otherwise persist and break all client IPv6.
- UI validator: require an explicit /prefix in [0,128] (IpNet.parse does
  not enforce it, so a bare address slipped through to a backend error).
- docs: correct the CLI clearing syntax (omit --prefix, not "--prefix
  null"); note inbound IPv6 hosting is not yet supported; drop the v6 DNS
  claim; move the changelog entry under a [1.2.0] heading.

…r-PSK persistence

StartOS applied policy routing only to IPv4, so NetworkManager's forced
full-tunnel `::/0` captured the host's entire IPv6 default route into any
imported WireGuard gateway. A tunnel that carried an IPv6 address (e.g. a
StartTunnel with a delegated prefix) but couldn't route IPv6 blackholed
all of the box's IPv6, and a v4-only commercial VPN selected as the
default outbound leaked IPv6 straight out the ISP link.

Mirror the IPv4 policy-routing layer for IPv6 (NAT/reply-routing omitted —
IPv6 has no NAT here):
- wifi.rs: `ip -6 rule` pref 1000/1100 (main/default) above NM's per-tunnel
  `::/0` rules, plus a pref-1200 terminal blackhole so v6 with no usable
  route is dropped instead of falling through to NM's capture.
- apply_policy_routing_v6: populate each managed interface's v6 table
  (`1000 + ifindex`) with main's non-default routes plus a default — a real
  route when the interface can carry v6, else `blackhole default` so a
  non-v6 gateway selected as the default outbound drops v6 (leak guard).
- apply_default_outbound: install the v6 priority-74/75 rules (the desired
  set is family-agnostic, reconciled per family via new snapshot/reconcile
  helpers).
- gc_policy_routing: flush the v6 table for removed interfaces.
A gateway carries the box's IPv6 only when selected as the outbound
gateway, exactly like IPv4 — no hijack, no leak.
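The per-interface table population can be sketched as command generation, assuming iproute2 `ip -6 route replace` and the `1000 + ifindex` table numbering described above. `v6_table_commands` is hypothetical, and `v6_default` stands for the interface's usable default route (e.g. `dev wg0` or `via fe80::1 dev eth0`), or `None` when it cannot carry IPv6.

```rust
/// Hypothetical: commands that fill one managed interface's IPv6 policy table.
/// `main_non_default` are main-table routes other than the default.
pub fn v6_table_commands(ifindex: u32, main_non_default: &[&str], v6_default: Option<&str>) -> Vec<String> {
    let table = 1000 + ifindex;
    let mut cmds: Vec<String> = main_non_default
        .iter()
        .map(|r| format!("ip -6 route replace {r} table {table}"))
        .collect();
    cmds.push(match v6_default {
        Some(route) => format!("ip -6 route replace default {route} table {table}"),
        // Leak guard: if this interface is chosen as the default outbound but
        // has no IPv6, its table drops v6 instead of leaving it to another link.
        None => format!("ip -6 route replace blackhole default table {table}"),
    });
    cmds
}
```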

Also fix the in-place WireGuard update path (`Update2` + `Reapply`), which
persisted the interface private-key but silently dropped each peer's
preshared-key, so a re-issued PSK-using tunnel failed its handshake and
went dead (taking tunnel-routed DNS with it). Flag the peer PSK
system-owned (`preshared-key-flags = 0`) so Update2 persists it, matching
AddAndActivateConnection on the add path.

A device with an IPv6 assignment routes all its IPv6 full-tunnel
(`AllowedIPs = ::/0`), so a prefix delegated on a server that can't
actually route IPv6 just blackholes the device's IPv6. `set-ipv6` now
hard-errors (leaving the config unchanged) when the server has no IPv6
default route, and logs an actionable warning when the prefix is neither
on-link on a WAN interface nor otherwise verifiable — catching a
misconfigured VPS at set-time instead of on the device. Adds a
`has_ipv6_default_route` helper.
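One way such a check could work is to scan the text of `/proc/net/ipv6_route` for an all-zero `/0` destination that is up and not a reject (unreachable/blackhole) entry. This is a hypothetical sketch, not the helper added in this commit; the real code may query routes another way, and the procfs view may also list routes from non-main tables.

```rust
/// Hypothetical `has_ipv6_default_route` over the text of /proc/net/ipv6_route.
/// Each line: dest, dest_len, src, src_len, next_hop, metric, refcnt, use,
/// flags, device (hex fields, whitespace-separated).
pub fn has_ipv6_default_route(proc_ipv6_route: &str) -> bool {
    const RTF_UP: u32 = 0x0001;
    const RTF_REJECT: u32 = 0x0200; // unreachable / blackhole / prohibit entries
    proc_ipv6_route.lines().any(|line| {
        let f: Vec<&str> = line.split_whitespace().collect();
        if f.len() < 10 {
            return false;
        }
        let is_default = f[1] == "00" && f[0].bytes().all(|b| b == b'0');
        let flags = u32::from_str_radix(f[8], 16).unwrap_or(0);
        is_default && (flags & RTF_UP) != 0 && (flags & RTF_REJECT) == 0 && f[9] != "lo"
    })
}
```

Usage would be along the lines of `has_ipv6_default_route(&std::fs::read_to_string("/proc/net/ipv6_route")?)`, called before persisting the prefix.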
Rename LAN IP/WAN IP -> LAN IPv4/WAN IPv4 and IP Range -> IPv4 Range so the tables read unambiguously once per-subnet IPv6 columns are added.

Drop the single global `WgServer.ipv6` in favor of an optional per-subnet
`WgSubnetConfig.ipv6`, so a server with multiple disjoint IPv6 allocations
can point different subnets at different prefixes. Allocation simplifies to
one `/128` per host with the tunnel IPv4 embedded (`prefix-network | v4`) —
uniform for the server and every client, stable, allocation-free, and
UI-computable. No per-device /64 delegation (StartOS containers use NAT6).

Backend:
- wg6: replace the 3-case client_v6/server_addr/ClientV6 with one host_v6.
- wg/db: remove WgServer.ipv6; add WgSubnetConfig.ipv6 (serde default, no
  migration). Server/peer/client configs source v6 per subnet.
- api: replace the top-level `set-ipv6` with `subnet <net> set-ipv6`,
  carrying the egress + deliverability validation keyed on that subnet's
  prefix. show_config derives the client /128 from its subnet.
- context: resync_v6 iterates per subnet (drops the global running index).
- i18n: about.set-tunnel-ipv6 -> about.set-subnet-ipv6.

Frontend:
- Remove the Settings IPv6 card; add the prefix to the subnet Add/Edit
  dialog + a subnets-table column; setSubnetIpv6 in the api services.
- Devices tables gain an IPv6 column computed in the UI from the subnet's
  prefix + the device's v4 (mirrors host_v6).

Docs/bindings/manpage regenerated for the per-subnet surface.

…ixes

Adversarial review of the per-subnet IPv6 diff surfaced three real issues:

- **Library bug (root cause).** `IpAddress` rendered IPv6 by joining raw
  decimal octets (e.g. "32:1:13:184:…" instead of "2001:db8::…"), and
  `fromOctets`'s 16-octet path spun forever on a no-op `unshift()` when the
  9th octet was 0 — so `zero()`/`fromOctets`/`.address` on any computed v6
  hung the browser or produced garbage. Replace both the `.address` getter
  and `fromOctets` v6 paths with one correct `renderIpv6` (eight hex groups,
  longest zero-run collapsed to `::`). The tunnel devices IPv6 column now
  uses the library directly. Verified across all web projects (npm run
  check) — no consumer regressed.
- **Out-of-prefix addresses.** `host_v6` ORs the 32-bit IPv4 into the low
  bits, which only stays in-prefix for prefixes /96 or shorter; a /124 (or
  any >/96) escaped the delegated block silently. `set_subnet_ipv6` now
  rejects prefixes longer than /96, and the web validator mirrors the bound.
  Added a wg6 boundary test.
- **Stale comments.** Drop the "/64 delegation" / "per-client /64" asides
  left over from the old model in wg.rs and context.rs.
- Clamp the tunnel IPv4 to the prefix's host space in `host_v6` instead of
  rejecting prefixes longer than /96. A /64 keeps the whole IPv4; a smaller
  block (e.g. a /124) keeps only its low host bits, so the address stays
  in-prefix. Drop the >/96 rejection in `set_subnet_ipv6` and the matching
  cap in the web validator; mirror the clamp in the UI's device-IPv6
  computation. A /124 now validates and works.
- Replace the real host/prefix that had crept into tests and docs with
  documentation-range values (RFC 3849 `2001:db8::`, RFC 2606 example.com).
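The clamp reduces to bit masks on a 128-bit integer: keep the prefix's network bits and OR in only as many low bits of the tunnel IPv4 as the host space holds. A minimal sketch, assuming the prefix is given as address plus length; this `host_v6` illustrates the described behavior and is not the wg6.rs source, and `in_prefix` is a hypothetical helper for the boundary check.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// Mask of the low `128 - prefix_len` bits (all ones for a /0, zero for a /128).
fn host_mask(prefix_len: u8) -> u128 {
    assert!(prefix_len <= 128, "prefix length must be in [0,128]");
    // checked_shl returns None for a 128-bit shift (the /0 case).
    1u128
        .checked_shl(128 - u32::from(prefix_len))
        .map_or(u128::MAX, |bit| bit - 1)
}

/// Prefix network | (tunnel IPv4 clamped to the host space). A /96 or shorter
/// keeps the whole IPv4; a longer prefix keeps only its low host bits.
pub fn host_v6(prefix: Ipv6Addr, prefix_len: u8, tunnel_v4: Ipv4Addr) -> Ipv6Addr {
    let mask = host_mask(prefix_len);
    let network = u128::from(prefix) & !mask;
    let v4 = u128::from(u32::from(tunnel_v4));
    Ipv6Addr::from(network | (v4 & mask))
}

/// True when `addr` shares `prefix`'s network bits.
pub fn in_prefix(addr: Ipv6Addr, prefix: Ipv6Addr, prefix_len: u8) -> bool {
    let mask = host_mask(prefix_len);
    (u128::from(addr) & !mask) == (u128::from(prefix) & !mask)
}
```

With a /64 the 64 host bits hold the whole IPv4, so a hypothetical tunnel address `10.59.0.2` lands at `2001:db8::a3b:2`; with a /124 only the low 4 bits survive, which is why two hosts can collide on small blocks.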

Verified: cargo test (22 tunnel tests, incl. a /124 case and every prefix
length staying in-prefix), UI computation matches host_v6 at runtime,
check:tunnel, build:tunnel, prettier.

Every host on a subnet gets a /128 out of the subnet's prefix with its
tunnel IPv4 clamped to the host space; on a block smaller than the IPv4
that can leave two devices sharing an address. Two devices must never
share one, so enforce uniqueness:

- wg6: `v6_conflict` / `first_v6_collision` helpers (+ unit test).
- add_device: auto-assign skips any IP whose IPv6 collides with the
  server (.1) or an existing device; an explicit colliding IP is rejected
  with a message naming the conflict. All inside the mutate, so atomic.
- set_subnet_ipv6: reject a prefix that can't give every existing host a
  distinct address, checked inside the mutate so a concurrent add can't
  slip a colliding device in between check and write.
- UI getIp: the suggested IP is IPv6-aware, so it never proposes a
  colliding address; a hand-typed one surfaces the backend error.
- docs: note the uniqueness requirement.

A /64 never collides (full IPv4 fits); this only bites on small blocks.
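Uniqueness can be enforced with a map from each computed IPv6 to the IPv4 that produced it. A minimal sketch that repeats the clamp so the block stands alone; `find_v6_collision` and `next_free_ip` are hypothetical stand-ins for `first_v6_collision` and the auto-assign loop, not their actual signatures, and the sketch leaves out the atomic mutate.

```rust
use std::collections::{HashMap, HashSet};
use std::net::{Ipv4Addr, Ipv6Addr};

/// Clamped `prefix network | IPv4` mapping (prefix_len must be <= 128).
fn host_v6(prefix: Ipv6Addr, prefix_len: u8, v4: Ipv4Addr) -> Ipv6Addr {
    let host_mask = 1u128
        .checked_shl(128 - u32::from(prefix_len))
        .map_or(u128::MAX, |bit| bit - 1);
    Ipv6Addr::from((u128::from(prefix) & !host_mask) | (u128::from(u32::from(v4)) & host_mask))
}

/// First pair of hosts (in order) that would share an IPv6 address, if any.
pub fn find_v6_collision(prefix: Ipv6Addr, prefix_len: u8, hosts: &[Ipv4Addr]) -> Option<(Ipv4Addr, Ipv4Addr)> {
    let mut seen: HashMap<Ipv6Addr, Ipv4Addr> = HashMap::new();
    for &host in hosts {
        // `insert` returns the earlier host already mapped to this address.
        if let Some(earlier) = seen.insert(host_v6(prefix, prefix_len, host), host) {
            return Some((earlier, host));
        }
    }
    None
}

/// Auto-assign: the first candidate IPv4 that is unused and whose IPv6 clashes
/// with neither the server nor an existing device (all listed in `taken`).
pub fn next_free_ip(
    prefix: Ipv6Addr,
    prefix_len: u8,
    taken: &[Ipv4Addr],
    candidates: impl IntoIterator<Item = Ipv4Addr>,
) -> Option<Ipv4Addr> {
    let used_v4: HashSet<Ipv4Addr> = taken.iter().copied().collect();
    let used_v6: HashSet<Ipv6Addr> = taken.iter().map(|&h| host_v6(prefix, prefix_len, h)).collect();
    candidates
        .into_iter()
        .find(|c| !used_v4.contains(c) && !used_v6.contains(&host_v6(prefix, prefix_len, *c)))
}
```

On a /124, `.1` and `.17` both end in the nibble `1`, so they collide; on a /64 they never do.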
@helix-nine helix-nine force-pushed the helix/tunnel-ipv6 branch from d3535be to 4a3fa57 July 4, 2026 06:28
dr-bonez added 2 commits July 4, 2026 09:17
Mark inbound v6 connections by ingress interface (nft mangle in table ip6 startos), restore the mark on replies, and route via a priority-50 fwmark rule — so a reply to an inbound IPv6 connection that arrived over a tunnel (terminated on the host or DNAT'd to a service container) routes back out that interface. The v6 reply-routing layer was previously omitted, so those replies had no route back and were blackholed: inbound IPv6 over a tunnel was dead.
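The mark/restore mechanism can be modeled in a few lines, assuming one mark per ingress interface whose value equals that interface's table (`1000 + ifindex`). `Flow`, `ReplyRouting`, `mark_for` and `fwmark_rule` are hypothetical; real conntrack matches both directions of a connection, which this sketch approximates by keying replies on the same tuple.

```rust
use std::collections::HashMap;
use std::net::Ipv6Addr;

/// A connection as conntrack would key it (simplified: one tuple for both directions).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Flow {
    pub remote: Ipv6Addr,
    pub remote_port: u16,
    pub local: Ipv6Addr,
    pub local_port: u16,
}

/// Hypothetical mark value: the same number as the interface's table.
fn mark_for(ifindex: u32) -> u32 {
    1000 + ifindex
}

/// The policy rule that sends marked packets to their own table.
pub fn fwmark_rule(ifindex: u32) -> String {
    let m = mark_for(ifindex);
    format!("ip -6 rule add pref 50 fwmark {m} lookup {m}")
}

#[derive(Default)]
pub struct ReplyRouting {
    ct_mark: HashMap<Flow, u32>,
}

impl ReplyRouting {
    /// Ingress: a new inbound connection is marked with its arrival interface;
    /// later packets of the same connection keep the original mark.
    pub fn on_inbound(&mut self, flow: Flow, ifindex: u32) {
        self.ct_mark.entry(flow).or_insert(mark_for(ifindex));
    }

    /// Reply: restore the connection mark; the pref-50 rule then selects that
    /// table before any higher-numbered rule. Unmarked traffic uses main.
    pub fn reply_table(&self, flow: &Flow) -> String {
        match self.ct_mark.get(flow) {
            Some(mark) => format!("table {mark}"),
            None => "main".to_string(),
        }
    }
}
```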

Remove the terminal pref-1200 v6 blackhole; the v6 default is chosen by metric like v4, and leak prevention stays per-gateway — a v6-incapable gateway selected as the default outbound gets a blackhole default in its own table, reached via the pref-75 catch-all. gc_policy_routing now cleans the pref-50 rule and per-interface table in both families.

Validated live device<->tunnel: host-terminated and DNAT'd-container inbound replies route back; a marked packet routes to its own table authoritatively over NetworkManager's ::/0 capture.
1.1.0 has not shipped, so the per-subnet IPv6 work tagged 1.2.0 belongs in 1.1.0. Collapse the version (Cargo.toml + Cargo.lock) and merge the changelog section. Update the IPv6 entry: device-side inbound hosting over IPv6 now works with StartOS 0.4.0-beta.10.
@dr-bonez dr-bonez merged commit 48c4b0c into master Jul 4, 2026
19 checks passed
@dr-bonez dr-bonez deleted the helix/tunnel-ipv6 branch July 4, 2026 15:37