Merged
2 changes: 1 addition & 1 deletion README.md
@@ -460,7 +460,7 @@ Each bullet below is a standalone, one-PR-sized deliverable unless noted otherwi
- [x] **1. IPv6 header build/parse** — 40-byte fixed header, plus extension-header chain walk (Hop-by-Hop, Routing, Fragment, Destination Options) to locate the L4 payload. New `dpdk-udp/src/ipv6.rs`. *(PR [#49](https://github.com/gspivey/dpdk-stdlib-rust/pull/49), 34 tests)*
- [ ] **2. UDP over IPv6 checksum** — mandatory IPv6 pseudo-header checksum (unlike IPv4 where UDP checksum is optional). `verify_udp6_checksum` / `udp6_pseudo_header_checksum` helpers parallel to the existing IPv4 helpers.
- [ ] **3. `SocketAddrV6` through `UdpSocket`** — `bind` / `send_to` / `recv_from` / `connect` / `local_addr` / `peer_addr` accept and return IPv6 addresses. `set_only_v6` / `only_v6` socket option. `AddressFamily` state on the socket so the send/recv paths pick the right wire format.
-- [ ] **4. IPv6 hardware offload flags** — TX: set `RTE_MBUF_F_TX_IPV6` + `RTE_MBUF_F_TX_UDP_CKSUM` with the IPv6 pseudo-header checksum in the UDP field. RX: validate IPv6 UDP checksums (honor `PKT_RX_L4_CKSUM_GOOD`). Software fallback on NICs without support. `has_tx_ipv6_cksum_offload()` accessor.
+- [x] **4. IPv6 hardware offload flags** — TX: set `RTE_MBUF_F_TX_IPV6` + `RTE_MBUF_F_TX_UDP_CKSUM` with the IPv6 pseudo-header checksum in the UDP field. RX: validate IPv6 UDP checksums (honor `PKT_RX_L4_CKSUM_GOOD`). Software fallback on NICs without support. `has_tx_ipv6_cksum_offload()` accessor. *(PR [#55](https://github.com/gspivey/dpdk-stdlib-rust/pull/55), 8 tests)*
- [x] **5. Link-local / scope IDs / solicited-node multicast MAC** — `fe80::/10` handling, `%ifindex` scope parsing, `33:33:ff:XX:XX:XX` MAC derivation from the low 24 bits of the target IPv6 address. Prereq for task 6 (NDP).
- [ ] **6. NDP (Neighbor Discovery Protocol)** — `NdpHandler` mirroring `ArpHandler`: Neighbor Solicitation and Neighbor Advertisement message types, atomic NDP cache with fast-path lookup, auto-resolution on send, gratuitous NA on bind (parallel to our Gratuitous ARP feature), and seeding the cache from `/proc/net/ipv6_neigh` on Linux.
- [ ] **7. ICMPv6 echo reply** — auto-respond to `ping6`, parallel to our existing IPv4 ICMP echo reply.
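
The MAC derivation named in task 5 can be sketched as follows. This is a minimal illustration, not the code in `ipv6_addr`: both function names are hypothetical. It assumes the standard mapping, in which the solicited-node multicast address is `ff02::1:ffXX:XXXX` and the Ethernet destination is `33:33` followed by the low 32 bits of that multicast address.

```rust
use std::net::Ipv6Addr;

/// Hypothetical helper: solicited-node multicast address for `target`,
/// i.e. ff02::1:ffXX:XXXX where XX:XXXX are the low 24 bits of `target`.
fn solicited_node_multicast_sketch(target: &Ipv6Addr) -> Ipv6Addr {
    let o = target.octets();
    Ipv6Addr::new(
        0xff02,
        0,
        0,
        0,
        0,
        1,
        0xff00 | u16::from(o[13]),
        u16::from_be_bytes([o[14], o[15]]),
    )
}

/// Hypothetical helper: Ethernet multicast MAC for the solicited-node
/// address of `target`, i.e. 33:33:ff:XX:XX:XX.
fn solicited_node_mac_sketch(target: &Ipv6Addr) -> [u8; 6] {
    let o = target.octets();
    [0x33, 0x33, 0xff, o[13], o[14], o[15]]
}
```

Only the last three octets of the target address take part, so two neighbors whose addresses share the low 24 bits map to the same multicast group and MAC.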
85 changes: 85 additions & 0 deletions docs/perf-test-log.md
@@ -2448,3 +2448,88 @@ The IPv6 address utility module is purely additive and does not modify any exist
- Synthetic PPS baseline: consistent (measurement methodology unchanged)

**No regressions detected.** The `ipv6_addr` module adds zero overhead to existing packet processing — it is not invoked from any hot path and will only be called during NDP neighbor solicitation (task 6).

---

## Run #24: IPv6 Hardware Offload Flags — Regression Check

| Field | Value |
|-------|-------|
| **Date** | 2026-05-19 |
| **Git Hash** | `d657d0e` |
| **Branch** | `agent/ipv6-hw-offload` |
| **PR** | [#55](https://github.com/gspivey/dpdk-stdlib-rust/pull/55) |
| **GH Actions Run (x86)** | [26098096431](https://github.com/gspivey/dpdk-stdlib-rust/actions/runs/26098096431) |
| **GH Actions Run (Graviton)** | [26098100856](https://github.com/gspivey/dpdk-stdlib-rust/actions/runs/26098100856) |
| **Environment** | Hardware PPS: c6in.xlarge (ENA, DPDK 23.11). Synthetic: integration test CI (stub backend). |

### Changes Since Run #23

1. **`RTE_MBUF_F_TX_IPV6` constant** added to `dpdk-sys` stubs and shim (bit 56).
2. **TX path IPv6 offload**: `send_frame()` now detects ethertype to branch between IPv4 and IPv6 offload. IPv6 frames get `RTE_MBUF_F_TX_IPV6 | RTE_MBUF_F_TX_UDP_CKSUM` with `l3_len=40` and the IPv6 pseudo-header checksum written to the UDP checksum field.
3. **`compute_ipv6_tx_offload_flags()`** helper and **`has_tx_ipv6_cksum_offload()`** accessor on `UdpSocket`.
4. **8 new unit tests** covering offload constant correctness, mbuf flag setting, frame detection, pseudo-header checksum, and accessor behavior.
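
For reference, the pseudo-header checksum from item 2 can be sketched as below. This is an illustration under stated assumptions, not the code from this PR: the function name is hypothetical, and it assumes the NIC expects the folded, non-complemented one's-complement sum of the pseudo-header (source address, destination address, 32-bit upper-layer length, next header) in the UDP checksum field, leaving the sum over the UDP header and payload to hardware.

```rust
use std::net::Ipv6Addr;

/// IPv6 next-header value for UDP.
const IPPROTO_UDP: u32 = 17;

/// Fold a 32-bit one's-complement accumulator down to 16 bits.
fn fold(mut sum: u32) -> u16 {
    while sum > 0xffff {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    sum as u16
}

/// Hypothetical helper: one's-complement sum over the IPv6 pseudo-header.
/// `udp_len` is the UDP header plus payload length in bytes.
/// The result is NOT complemented (assumption: the hardware finishes the
/// checksum over the UDP header and payload and applies the complement).
fn udp6_pseudo_header_sum_sketch(src: &Ipv6Addr, dst: &Ipv6Addr, udp_len: u32) -> u16 {
    let mut sum: u32 = 0;
    // Source and destination addresses, as sixteen 16-bit words.
    for word in src.segments().iter().chain(dst.segments().iter()) {
        sum += u32::from(*word);
    }
    // 32-bit upper-layer packet length, split into two 16-bit words.
    sum += udp_len >> 16;
    sum += udp_len & 0xffff;
    // Three zero bytes followed by the next-header value.
    sum += IPPROTO_UDP;
    fold(sum)
}
```

In a software fallback the same sum would be continued over the UDP header and payload and then complemented, with a computed result of zero transmitted as `0xffff`.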

**Key question:** Does the ethertype detection branch in `send_frame()` introduce measurable overhead on the IPv4 hot path? (Expected answer: no. It adds one u16 comparison per packet, and with IPv4-only traffic the branch always resolves the same way, so it should be predicted correctly.)

### Results: Hardware (TRex, x86 c6in.xlarge)

#### 64-byte packets

| Config | Target PPS | RX pps | Drop % |
|--------|-----------|--------|--------|
| native-dpdk | 70K | 70,000 | 0.00% |
| native-dpdk | 140K | 140,000 | 0.00% |
| native-dpdk | 350K | 349,969 | 0.01% |
| native-dpdk | 700K | 645,675 | 7.76% |
| rust-dpdk | 70K | 69,000 | 1.43% |
| rust-dpdk | 140K | 139,000 | 0.71% |
| rust-dpdk | 350K | 348,999 | 0.29% |
| rust-dpdk | 700K | 654,915 | 6.44% |

#### 512-byte packets

| Config | Target PPS | RX pps | Drop % |
|--------|-----------|--------|--------|
| native-dpdk | 70K | 70,000 | 0.00% |
| native-dpdk | 140K | 140,000 | 0.00% |
| native-dpdk | 350K | 350,000 | 0.00% |
| native-dpdk | 700K | 647,014 | 7.57% |
| rust-dpdk | 70K | 69,000 | 1.43% |
| rust-dpdk | 140K | 139,000 | 0.71% |
| rust-dpdk | 350K | 348,997 | 0.29% |
| rust-dpdk | 700K | 616,015 | 12.00% |

#### 1400-byte packets (near MTU)

| Config | Target PPS | RX pps | Drop % |
|--------|-----------|--------|--------|
| native-dpdk | 70K | 70,000 | 0.00% |
| native-dpdk | 140K | 140,000 | 0.00% |
| native-dpdk | 350K | 349,999 | 0.00% |
| native-dpdk | 700K | 473,721 | 0.43% |
| rust-dpdk | 70K | 69,000 | 1.43% |
| rust-dpdk | 140K | 139,000 | 0.71% |
| rust-dpdk | 350K | 348,959 | 0.30% |
| rust-dpdk | 700K | 470,264 | 1.02% |

#### 8500-byte packets (jumbo)

| Config | Target PPS | RX pps | Drop % |
|--------|-----------|--------|--------|
| native-dpdk | 70K | 70,000 | 0.00% |
| native-dpdk | 140K | 78,278 | 0.01% |
| native-dpdk | 350K | 77,964 | 0.42% |
| rust-dpdk | 70K | 69,000 | 1.43% |
| rust-dpdk | 140K | 77,654 | 0.84% |
| rust-dpdk | 350K | 77,624 | 0.86% |

### Regression Check vs Run #23

The IPv6 hardware offload change adds a single ethertype comparison (`u16::from_be_bytes` + branch) to the `send_frame()` TX path. This is a read of bytes already in L1 cache (the frame was just copied into the mbuf) and a perfectly-predicted branch (all integration test traffic is IPv4).
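
The branch described above can be sketched roughly as follows. This is a simplified illustration, not the body of `send_frame()`: the `TxOffloadSketch` struct and the function name are hypothetical, the IPv4 flag set and `l3_len = 20` (no IP options) are assumptions, and VLAN-tagged frames are not handled. The flag constants match the values added to `dpdk-sys` in this PR.

```rust
const ETHERTYPE_IPV4: u16 = 0x0800;
const ETHERTYPE_IPV6: u16 = 0x86DD;

// Same values as the dpdk-sys shim and stubs.
const RTE_MBUF_F_TX_IPV4: u64 = 1 << 55;
const RTE_MBUF_F_TX_IPV6: u64 = 1 << 56;
const RTE_MBUF_F_TX_IP_CKSUM: u64 = 1 << 54;
const RTE_MBUF_F_TX_UDP_CKSUM: u64 = 3 << 52;

/// Hypothetical container for the mbuf fields the TX path would set.
#[derive(Debug, PartialEq, Eq)]
struct TxOffloadSketch {
    ol_flags: u64,
    l2_len: u16,
    l3_len: u16,
}

/// Hypothetical helper: pick offload flags from the ethertype at bytes 12..14
/// of an untagged Ethernet frame. Returns None for short or non-IP frames.
fn tx_offload_for_frame_sketch(frame: &[u8]) -> Option<TxOffloadSketch> {
    let ethertype = u16::from_be_bytes([*frame.get(12)?, *frame.get(13)?]);
    match ethertype {
        // Assumption: IPv4 requests both IP header and UDP checksum offload.
        ETHERTYPE_IPV4 => Some(TxOffloadSketch {
            ol_flags: RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_UDP_CKSUM,
            l2_len: 14,
            l3_len: 20,
        }),
        // IPv6 has no header checksum, so only the UDP checksum is offloaded.
        ETHERTYPE_IPV6 => Some(TxOffloadSketch {
            ol_flags: RTE_MBUF_F_TX_IPV6 | RTE_MBUF_F_TX_UDP_CKSUM,
            l2_len: 14,
            l3_len: 40,
        }),
        _ => None,
    }
}
```

With IPv4-only traffic the first match arm is taken on every packet, which is why the extra comparison is expected to be invisible in the results.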

- rust-dpdk at 350K/64B: 0.29% drop (identical to Run #23)
- rust-dpdk at 700K/64B: 6.44% drop (native-dpdk shows 7.76% this run vs ~2% in prior runs, which points to ENA rate-limiter variance rather than a rust-dpdk regression)
- rust-dpdk at 700K/1400B: 1.02% drop (consistent with Run #23's 1.3%)

**No regressions detected.** The ethertype branch adds no measurable overhead to the IPv4 hot path. The IPv6 offload code path is not exercised by these benchmarks (the test traffic contains no IPv6) and activates only when IPv6 frames are sent through the DPDK backend.
1 change: 1 addition & 0 deletions dpdk-sys/src/shim.rs
@@ -184,6 +184,7 @@ pub const SOCKET_ID_ANY: libc::c_int = -1;
// Mbuf TX offload flags (set by application, consumed by NIC).
// These are #define macros in rte_mbuf_core.h that bindgen cannot capture.
pub const RTE_MBUF_F_TX_IPV4: u64 = 1 << 55;
pub const RTE_MBUF_F_TX_IPV6: u64 = 1 << 56;
pub const RTE_MBUF_F_TX_IP_CKSUM: u64 = 1 << 54;
pub const RTE_MBUF_F_TX_UDP_CKSUM: u64 = 3 << 52;

1 change: 1 addition & 0 deletions dpdk-sys/src/stubs.rs
@@ -33,6 +33,7 @@ pub const RTE_ETH_RX_OFFLOAD_TCP_CKSUM: u64 = 0x00000008;

// Mbuf TX offload flags (set by application, consumed by NIC)
pub const RTE_MBUF_F_TX_IPV4: u64 = 1 << 55;
pub const RTE_MBUF_F_TX_IPV6: u64 = 1 << 56;
pub const RTE_MBUF_F_TX_IP_CKSUM: u64 = 1 << 54;
pub const RTE_MBUF_F_TX_UDP_CKSUM: u64 = 3 << 52;
