Summary
The IS-IS adjacency between nyc-dz001 and lax-dz001 on Switch1/1/1.1002 fails to establish on testnet. The root cause is two interacting bugs in the controller's config rendering: (1) the parent interface Switch1/1/1 is rendered multiple times when a device has multiple subinterfaces, and (2) the Go SDK's V1 interface deserializer does not default the MTU field, causing different parent entries to render with different MTU values depending on the onchain interface version.
Observed behavior
The controller generates configs for nyc-dz001 and lax-dz001 with duplicate interface Switch1/1/1 blocks, each with a different MTU. On Arista EOS, the last block wins, and because the subinterfaces sort in different order on each device, the effective MTU differs between the two sides.
NYC (subinterfaces .1000 and .1002; the last parent block wins with mtu 1500):

```
interface Switch1/1/1
   mtu 2048
   no switchport
!
interface Switch1/1/1
   mtu 1500
   no switchport
```

LAX (subinterfaces .1002 and .1003; the last parent block wins with mtu 2048):

```
interface Switch1/1/1
   mtu 1500
   no switchport
!
interface Switch1/1/1
   mtu 2048
   no switchport
```
Both sides have isis hello padding enabled on the subinterfaces, which pads IS-IS hellos to the interface MTU. The parent interface MTU caps what can actually transit the wire, so mismatched parent MTUs between the two sides can prevent IS-IS hello exchange and block adjacency formation.
Bug 1: Duplicate parent interface rendering
Location: controlplane/controller/internal/controller/server.go:174-180
```go
if intf.IsSubInterface {
    parent, err := intf.GetParent()
    if err != nil {
        c.log.Error(...)
        continue
    }
    // Appended once per subinterface, with no check for an existing parent entry.
    d.Interfaces = append(d.Interfaces, parent)
}
```

processDeviceInterfacesAndPeers iterates all onchain interfaces for a device. For every subinterface, it calls GetParent() and appends a new parent interface to the device's interface list. When a device has multiple subinterfaces on the same physical port (e.g., Switch1/1/1.1000 and Switch1/1/1.1002), the parent Switch1/1/1 is appended multiple times.
The template (templates/tunnel.tmpl:87-128) iterates Device.Interfaces and renders each entry, producing duplicate interface Switch1/1/1 blocks in the config output.
Suggested fix: Deduplicate parent interfaces — only append a parent if one hasn't already been created for that interface name. Other approaches (e.g., a static MTU of 2048 for all parents) could also work.
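A minimal sketch of the deduplication fix, assuming a simplified interface model. Interface, parentOf, and withParents are hypothetical names used only for illustration; parentOf approximates GetParent() as described in this issue (strip the VLAN suffix, copy the subinterface's MTU), and the real controller types differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Interface is a hypothetical, trimmed-down stand-in for the controller's
// interface model, with only the fields this sketch needs.
type Interface struct {
	Name           string
	Mtu            uint16
	IsSubInterface bool
}

// parentOf is a hypothetical stand-in for GetParent(): it strips the
// ".<vlan>" suffix and copies the subinterface's MTU.
func parentOf(sub Interface) (Interface, error) {
	idx := strings.LastIndex(sub.Name, ".")
	if idx <= 0 {
		return Interface{}, fmt.Errorf("%q has no parent", sub.Name)
	}
	return Interface{Name: sub.Name[:idx], Mtu: sub.Mtu}, nil
}

// withParents returns intfs plus at most one parent entry per physical port.
func withParents(intfs []Interface) []Interface {
	out := append([]Interface(nil), intfs...)
	added := make(map[string]bool) // parent names already appended
	for _, intf := range intfs {
		if !intf.IsSubInterface {
			continue
		}
		parent, err := parentOf(intf)
		if err != nil {
			continue // the controller logs the error at this point
		}
		if added[parent.Name] {
			continue // this port already has a parent entry
		}
		added[parent.Name] = true
		out = append(out, parent)
	}
	return out
}
```

Deduplication alone still lets the first subinterface processed decide the parent's MTU, so the rendered value stays sensitive to V1/V2 storage until Bug 2 is also fixed (or a static parent MTU is used).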
Bug 2: Go SDK V1 interface deserializer does not default MTU
Location: smartcontract/sdk/go/serviceability/deserialize.go:95-104
```go
func DeserializeInterfaceV1(reader *ByteReader, iface *Interface) {
    iface.Status = InterfaceStatus(reader.ReadU8())
    iface.Name = reader.ReadString()
    iface.InterfaceType = InterfaceType(reader.ReadU8())
    iface.LoopbackType = LoopbackType(reader.ReadU8())
    iface.VlanId = reader.ReadU16()
    iface.IpNet = reader.ReadNetworkV4()
    iface.NodeSegmentIdx = reader.ReadU16()
    iface.UserTunnelEndpoint = (reader.ReadU8() != 0)
    // iface.Mtu is never assigned, so it stays at Go's zero value (0).
}
```

The V1 interface schema did not include an MTU field, and the V1 deserializer does not set iface.Mtu, so it remains at Go's zero value (0). The Rust SDK, by contrast, migrates V1 interfaces to V2 with mtu: 1500 (in smartcontract/programs/doublezero-serviceability/src/state/interface.rs:362), so the doublezero CLI shows 1500 for all interfaces, masking the fact that the Go SDK sees 0 for V1 interfaces.
This only affects parent interfaces, because:
- Subinterface MTU is overridden by the link MTU (9000) during link processing (server.go:395-397), which runs after parent creation.
- Parent interfaces are created by GetParent() (models.go:146-162), which copies the subinterface's MTU at the time of creation, before the link MTU override.
- Parents don't match any onchain link (links reference subinterface names like Switch1/1/1.1002, not Switch1/1/1), so they never receive the link MTU override.
The result: a parent created from a V1 subinterface gets Mtu: 0, which the template renders as mtu 2048 (the default). A parent created from a V2 subinterface gets the MTU from the onchain device interface definition (e.g., 1500).
Suggested fix: Set iface.Mtu = 1500 in DeserializeInterfaceV1 to match the Rust SDK's V1→V2 migration behavior. Alternatively, the controller could use a static MTU for parent interfaces regardless of the onchain value.
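In the SDK the fix would be a single assignment at the end of DeserializeInterfaceV1. The sketch below expresses the same idea as an explicit V1-to-current conversion that mirrors the Rust SDK's TryFrom<&InterfaceV1> for InterfaceV2 default. InterfaceV1, Interface, and interfaceFromV1 are hypothetical, trimmed-down shapes, not the SDK's actual types.

```go
package main

// defaultV1Mtu mirrors the mtu: 1500 default the Rust SDK applies when
// migrating InterfaceV1 to InterfaceV2.
const defaultV1Mtu uint16 = 1500

// InterfaceV1 is a hypothetical decoded V1 record (no MTU field).
type InterfaceV1 struct {
	Name   string
	VlanId uint16
}

// Interface is a hypothetical current-schema record.
type Interface struct {
	Name   string
	VlanId uint16
	Mtu    uint16
}

// interfaceFromV1 copies the fields V1 carries and fills MTU with an explicit
// default instead of leaving Go's zero value. Only the MTU default is known
// from the Rust SDK; other V2-only fields are left at zero in this sketch.
func interfaceFromV1(v1 InterfaceV1) Interface {
	return Interface{
		Name:   v1.Name,
		VlanId: v1.VlanId,
		Mtu:    defaultV1Mtu,
	}
}
```

With this in place, a parent derived from a V1 subinterface would carry 1500, the same value the doublezero CLI already reports.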
How V1 to V2 promotion works
Interfaces are stored onchain with a version discriminant. When an interface is first created, it may be stored as V1 (which lacks fields like MTU, bandwidth, CIR, routing mode, CYOA, and DIA). When the interface is later updated via an onchain instruction, it gets re-serialized as V2 with the new fields populated.
The Rust SDK handles this transparently — its TryFrom<&InterfaceV1> for InterfaceV2 impl defaults mtu: 1500. But the Go SDK's DeserializeInterfaceV1 simply skips the fields that don't exist in V1, leaving them at zero-values. This means updating any interface onchain (promoting it from V1 to V2) changes its MTU in the Go SDK from 0 to whatever value is set — causing the rendered parent interface MTU to change from 2048 (template default for 0) to the actual onchain value.
Why the effective MTU flips between devices
After parent creation, the interfaces are sorted alphabetically (server.go:184-186). Multiple parents for the same name sort adjacently, but their relative order depends on which subinterface was processed first. On the device, the last interface Switch1/1/1 block wins.
An assignment consistent with the observed configs above (assuming subinterfaces are processed in ascending order):
- NYC has .1000 (V1, Mtu=0) and .1002 (V2, Mtu=1500). Parents sort as: Switch1/1/1 (Mtu=0), Switch1/1/1 (Mtu=1500). Last wins → effective mtu 1500.
- LAX has .1002 (V2, Mtu=1500) and .1003 (V1, Mtu=0). Parents sort as: Switch1/1/1 (Mtu=1500), Switch1/1/1 (Mtu=0). Last wins → effective mtu 2048.
(The exact V1/V2 assignment per interface may vary; the point is that mixed versions produce different parent MTUs, and sort order determines which one wins on each device.)
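The ordering argument can be checked with a small simulation. It assumes the controller's sort at server.go:184-186 is stable (Go's sort.SliceStable is; sort.Slice is not) and that a zero MTU renders as the 2048 template default. Intf and effectiveMtu are hypothetical names, and the MTU values are chosen to reproduce the observed configs rather than read from chain.

```go
package main

import "sort"

// Intf is a hypothetical stand-in for an entry in Device.Interfaces.
type Intf struct {
	Name string
	Mtu  uint16
}

// effectiveMtu sorts intfs by name and returns the MTU of the last block for
// name, modeling EOS applying duplicate blocks in order ("last block wins").
// A zero MTU is mapped to the template's 2048 fallback.
func effectiveMtu(intfs []Intf, name string) (uint16, bool) {
	sorted := append([]Intf(nil), intfs...)
	// SliceStable keeps same-named entries in append order; whether the
	// controller's own sort is stable is an assumption of this sketch.
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Name < sorted[j].Name
	})
	var mtu uint16
	found := false
	for _, intf := range sorted {
		if intf.Name != name {
			continue
		}
		mtu = intf.Mtu
		if mtu == 0 {
			mtu = 2048
		}
		found = true
	}
	return mtu, found
}
```

With NYC's parents appended as (0, 1500) and LAX's as (1500, 0), the effective parent MTU comes out as 1500 on NYC and 2048 on LAX, matching the rendered configs.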
Impact
- IS-IS adjacency between nyc-dz001 and lax-dz001 on Switch1/1/1.1002 fails to establish.
- Any device with multiple WAN subinterfaces on the same physical port is affected.
- The problem is non-obvious because the doublezero CLI (Rust SDK) shows mtu 1500 for all interfaces, hiding the V1/V2 distinction.
Affected links
| Link | Side A | Side Z | Status |
|---|---|---|---|
| lax-dz001:nyc-dz001 | Switch1/1/1.1002 | Switch1/1/1.1002 | IS-IS down |
Reproduction
Any device with two or more subinterfaces on the same physical port, where at least one subinterface is still stored as V1 onchain and at least one as V2, will produce duplicate parent interface blocks with mismatched MTUs. Even with all V2 interfaces, the duplication bug still renders the parent once per subinterface instead of once, and if the onchain interface MTU is not 2048 (the template default), every copy carries that non-default value.
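To check a rendered config for this duplication, a small scanner like the one below could be run over the controller's output. It is a hypothetical helper, not part of the controller, and it assumes EOS-style configs in which top-level blocks start at column 0 and nested commands are indented.

```go
package main

import (
	"bufio"
	"strings"
)

// duplicateInterfaces returns the names of interfaces that have more than one
// top-level "interface <name>" block in a rendered config. Only unindented
// lines are treated as block headers.
func duplicateInterfaces(config string) []string {
	counts := make(map[string]int)
	var dups []string
	sc := bufio.NewScanner(strings.NewReader(config))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "interface ") {
			continue
		}
		name := strings.TrimSpace(strings.TrimPrefix(line, "interface "))
		counts[name]++
		if counts[name] == 2 { // report each duplicated name once
			dups = append(dups, name)
		}
	}
	return dups
}
```

Run against the NYC or LAX config above, it would report Switch1/1/1 and nothing else.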