controller: IS-IS adjacency failure caused by duplicate parent interface rendering with inconsistent MTU #3386

@elitegreg

Summary

The IS-IS adjacency between nyc-dz001 and lax-dz001 on Switch1/1/1.1002 fails to establish on testnet. The root cause is two interacting bugs in the controller's config rendering: (1) the parent interface Switch1/1/1 is rendered multiple times when a device has multiple subinterfaces, and (2) the Go SDK's V1 interface deserializer does not default the MTU field, causing different parent entries to render with different MTU values depending on the onchain interface version.

Observed behavior

The controller generates configs for nyc-dz001 and lax-dz001 with duplicate interface Switch1/1/1 blocks, each with a different MTU. On Arista EOS, the last block wins, and because the subinterfaces sort in different order on each device, the effective MTU differs between the two sides.

NYC (subinterfaces: .1000, .1002 — last parent wins with mtu 1500):

interface Switch1/1/1
   mtu 2048
   no switchport
!
interface Switch1/1/1
   mtu 1500
   no switchport

LAX (subinterfaces: .1002, .1003 — last parent wins with mtu 2048):

interface Switch1/1/1
   mtu 1500
   no switchport
!
interface Switch1/1/1
   mtu 2048
   no switchport

Both sides have isis hello padding enabled on the subinterfaces, which pads IS-IS hellos to the interface MTU. The parent interface MTU caps what can actually transit the wire, so mismatched parent MTUs between the two sides can prevent IS-IS hello exchange and block adjacency formation.

Bug 1: Duplicate parent interface rendering

Location: controlplane/controller/internal/controller/server.go:174-180

if intf.IsSubInterface {
    parent, err := intf.GetParent()
    if err != nil {
        c.log.Error(...)
        continue
    }
    d.Interfaces = append(d.Interfaces, parent)
}

processDeviceInterfacesAndPeers iterates all onchain interfaces for a device. For every subinterface, it calls GetParent() and appends a new parent interface to the device's interface list. When a device has multiple subinterfaces on the same physical port (e.g., Switch1/1/1.1000 and Switch1/1/1.1002), the parent Switch1/1/1 is appended multiple times.

The template (templates/tunnel.tmpl:87-128) iterates Device.Interfaces and renders each entry, producing duplicate interface Switch1/1/1 blocks in the config output.

Suggested fix: Deduplicate parent interfaces — only append a parent if one hasn't already been created for that interface name. Other approaches (e.g., a static MTU of 2048 for all parents) could also work.
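A minimal sketch of the deduplication approach, assuming a trimmed-down `Interface` type and a hypothetical helper `appendParentOnce` (neither is the controller's actual code). It tracks parent names already appended and skips repeats:

```go
package main

import "fmt"

// Interface is a hypothetical, trimmed-down stand-in for the controller's
// interface model; only the fields needed for this sketch are included.
type Interface struct {
	Name string
	Mtu  uint16
}

// appendParentOnce appends parent to ifaces only if no parent with the same
// name has been appended before. seen records the names already added.
func appendParentOnce(ifaces []Interface, seen map[string]bool, parent Interface) []Interface {
	if seen[parent.Name] {
		return ifaces
	}
	seen[parent.Name] = true
	return append(ifaces, parent)
}

func main() {
	seen := make(map[string]bool)
	var ifaces []Interface

	// Two subinterfaces on the same port each yield a parent candidate.
	ifaces = appendParentOnce(ifaces, seen, Interface{Name: "Switch1/1/1", Mtu: 1500})
	ifaces = appendParentOnce(ifaces, seen, Interface{Name: "Switch1/1/1", Mtu: 0})

	fmt.Println(len(ifaces)) // only one Switch1/1/1 entry remains
}
```

Deduplication on its own keeps whichever parent was created first, so the surviving MTU would still depend on processing order until the MTU source is made consistent (see Bug 2).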

Bug 2: Go SDK V1 interface deserializer does not default MTU

Location: smartcontract/sdk/go/serviceability/deserialize.go:95-104

func DeserializeInterfaceV1(reader *ByteReader, iface *Interface) {
    iface.Status = InterfaceStatus(reader.ReadU8())
    iface.Name = reader.ReadString()
    iface.InterfaceType = InterfaceType(reader.ReadU8())
    iface.LoopbackType = LoopbackType(reader.ReadU8())
    iface.VlanId = reader.ReadU16()
    iface.IpNet = reader.ReadNetworkV4()
    iface.NodeSegmentIdx = reader.ReadU16()
    iface.UserTunnelEndpoint = (reader.ReadU8() != 0)
}

The V1 interface schema did not include an MTU field. The V1 deserializer does not set iface.Mtu, so it remains at Go's zero-value (0). The Rust SDK, by contrast, migrates V1 interfaces to V2 with mtu: 1500 (in smartcontract/programs/doublezero-serviceability/src/state/interface.rs:362), so the doublezero CLI shows 1500 for all interfaces — masking the fact that the Go SDK sees 0 for V1 interfaces.

This only affects parent interfaces, because:

  • Subinterface MTU is overridden by the link MTU (9000) during link processing (server.go:395-397), which runs after parent creation
  • Parent interfaces are created by GetParent() (models.go:146-162), which copies the subinterface's MTU at the time of creation — before link MTU override
  • Parents don't match any onchain link (links reference subinterface names like Switch1/1/1.1002, not Switch1/1/1), so they never receive the link MTU override

The result: a parent created from a V1 subinterface gets Mtu: 0, which the template renders as mtu 2048 (the default). A parent created from a V2 subinterface gets the MTU from the onchain device interface definition (e.g., 1500).
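The ordering described above can be illustrated with a small sketch. The `Interface` type and `parentOf` helper are hypothetical simplifications of the behavior attributed to GetParent(); the point is only that the parent holds a copy of the MTU taken before the link override runs:

```go
package main

import "fmt"

// Interface is a hypothetical, simplified model used only for this sketch.
type Interface struct {
	Name string
	Mtu  uint16
}

// parentOf mimics the described behavior of GetParent(): it copies the
// subinterface's MTU at call time. Name derivation is simplified to a
// caller-supplied parentName.
func parentOf(sub Interface, parentName string) Interface {
	return Interface{Name: parentName, Mtu: sub.Mtu}
}

func main() {
	sub := Interface{Name: "Switch1/1/1.1002", Mtu: 0} // V1: MTU never set
	parent := parentOf(sub, "Switch1/1/1")             // snapshot taken here
	sub.Mtu = 9000                                     // link MTU override, applied later

	fmt.Println(sub.Mtu, parent.Mtu) // prints "9000 0"
}
```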

Suggested fix: Set iface.Mtu = 1500 in DeserializeInterfaceV1 to match the Rust SDK's V1→V2 migration behavior. Alternatively, the controller could use a static MTU for parent interfaces regardless of the onchain value.
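As a rough sketch of the first option, the default could be applied in a small helper called at the end of DeserializeInterfaceV1. The `applyV1Defaults` name, the `defaultV1Mtu` constant, and the `uint16` field type are assumptions for illustration; the real `Interface` struct in the Go SDK may differ:

```go
package main

import "fmt"

// defaultV1Mtu mirrors the value this issue reports the Rust SDK using in
// its V1 to V2 migration. The constant name is hypothetical.
const defaultV1Mtu = 1500

// Interface is a hypothetical, trimmed-down stand-in for the SDK's type.
type Interface struct {
	Name string
	Mtu  uint16
}

// applyV1Defaults fills in fields that the V1 onchain schema does not carry,
// so V1 and V2 interfaces look the same to downstream consumers.
func applyV1Defaults(iface *Interface) {
	iface.Mtu = defaultV1Mtu
}

func main() {
	iface := Interface{Name: "Switch1/1/1.1002"} // as left by a V1 decode
	applyV1Defaults(&iface)
	fmt.Println(iface.Mtu) // prints "1500"
}
```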

How V1 to V2 promotion works

Interfaces are stored onchain with a version discriminant. When an interface is first created, it may be stored as V1 (which lacks fields like MTU, bandwidth, CIR, routing mode, CYOA, and DIA). When the interface is later updated via an onchain instruction, it gets re-serialized as V2 with the new fields populated.

The Rust SDK handles this transparently — its TryFrom<&InterfaceV1> for InterfaceV2 impl defaults mtu: 1500. But the Go SDK's DeserializeInterfaceV1 simply skips the fields that don't exist in V1, leaving them at zero-values. This means updating any interface onchain (promoting it from V1 to V2) changes its MTU in the Go SDK from 0 to whatever value is set — causing the rendered parent interface MTU to change from 2048 (template default for 0) to the actual onchain value.

Why the effective MTU flips between devices

After parent creation, the interfaces are sorted alphabetically (server.go:184-186). Multiple parents for the same name sort adjacently, but their relative order depends on which subinterface was processed first. On the device, the last interface Switch1/1/1 block wins.

  • NYC has .1000 (V2, Mtu=1500) and .1002 (V1, Mtu=0). Parents sort as: Switch1/1/1 (Mtu=1500), Switch1/1/1 (Mtu=0). Last wins → effective mtu 2048.
  • LAX has .1002 (V1, Mtu=0) and .1003 (V2, Mtu=1500). Parents sort as: Switch1/1/1 (Mtu=0), Switch1/1/1 (Mtu=1500). Last wins → effective mtu 1500.

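(The V1/V2 assignment shown here is illustrative and may not match the actual onchain state; as written, it yields the opposite effective MTUs from the configs under Observed behavior, where NYC ends on mtu 1500 and LAX on mtu 2048. The point is that mixed versions produce different parent MTUs, and sort order determines which one wins on each device.)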
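The sort-then-last-wins effect can be reproduced with a short simulation. This is a sketch under stated assumptions: the `Interface` type and `effectiveMtu` function are hypothetical, a stable sort stands in for whatever server.go actually uses, and the 2048 fallback for a zero MTU follows the template default described in this issue. The inputs use the illustrative V1/V2 assignment from the bullets above:

```go
package main

import (
	"fmt"
	"sort"
)

// Interface is a hypothetical, simplified model used only for this sketch.
type Interface struct {
	Name string
	Mtu  uint16
}

// templateDefaultMtu is what the template renders when Mtu is 0 (per the issue).
const templateDefaultMtu = 2048

// effectiveMtu sorts a copy of ifaces by name (stable, so equal names keep
// their processing order), then walks the list as the device would: every
// block for name overwrites the previous one. Returns 0 if name is absent.
func effectiveMtu(ifaces []Interface, name string) uint16 {
	sorted := append([]Interface(nil), ifaces...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Name < sorted[j].Name
	})

	var mtu uint16
	for _, in := range sorted {
		if in.Name != name {
			continue
		}
		mtu = in.Mtu
		if mtu == 0 {
			mtu = templateDefaultMtu
		}
	}
	return mtu
}

func main() {
	// Parents in processing order, one per subinterface.
	nyc := []Interface{{Name: "Switch1/1/1", Mtu: 1500}, {Name: "Switch1/1/1", Mtu: 0}}
	lax := []Interface{{Name: "Switch1/1/1", Mtu: 0}, {Name: "Switch1/1/1", Mtu: 1500}}

	fmt.Println(effectiveMtu(nyc, "Switch1/1/1"), effectiveMtu(lax, "Switch1/1/1")) // prints "2048 1500"
}
```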

Impact

  • IS-IS adjacency between nyc-dz001 and lax-dz001 on Switch1/1/1.1002 fails to establish
  • Any device with multiple WAN subinterfaces on the same physical port is affected
  • The problem is non-obvious because the doublezero CLI (Rust SDK) shows mtu 1500 for all interfaces, hiding the V1/V2 distinction

Affected links

| Link | Side A | Side Z | Status |
|---|---|---|---|
| lax-dz001:nyc-dz001 | Switch1/1/1.1002 | Switch1/1/1.1002 | IS-IS down |

Reproduction

Any device with 2+ subinterfaces on the same physical port where at least one subinterface is still stored as V1 onchain will produce duplicate parent interface blocks with mismatched MTUs. Even with all V2 interfaces, the duplication bug still produces incorrect configs if the onchain interface MTU is not 2048 (the template default), since the parent would be rendered twice with the same non-default value instead of once.
