
[EVPN-MH] Integrate EVPN-MH with existing modules and add tests#4615

Open
tahmed-dev wants to merge 14 commits into
sonic-net:master from
tahmed-dev:tahmed/evpn-mh-2-integrate-20260527


@tahmed-dev
Contributor

@tahmed-dev tahmed-dev commented May 27, 2026

Why I did it

This is part 2/2 of splitting the original EVPN-MH PR (#4262) into reviewable pieces.
Part 1 (#4608) landed the standalone EVPN-MH code (new orchs and headers added without touching existing files).
This PR integrates EVPN-MH with the existing modules (NeighOrch, FdbOrch, VxlanOrch, RouteOrch, MuxOrch, p4orch, fpmsyncd, etc.) and adds the mock/VS tests that exercise both the standalone and integration paths.

The test-infra prerequisites required to land these tests were merged separately as #4599.

How I did it

  • Commit 1 (Integrate EVPN-MH with existing modules): wires the standalone EVPN-MH orchs (added in [EVPN-MH] Add standalone EVPN-MH code and tests #4608) into the existing data plane (FDB / neighbor / next-hop / VXLAN / mux / p4orch / fpmsyncd).
  • Commit 2 (Add EVPN-MH tests and stabilization updates): adds new mock_tests (neighorch, vxlanorch, p4orch fake/mock helpers), VS tests (test_sag.py, EVPN FDB/L3 VXLAN updates), and a few small stabilization tweaks needed for the new tests.

How to verify it

  • Build the swss debs and run mock_tests; the new EVPN-MH cases under tests/mock_tests/{neighorch,vxlanorch}_ut.cpp should pass.
  • Run the VS tests (including tests/test_sag.py) and the full VS suite, locally and in CI.

Which release branch to backport (provide reason below if selected)

  • 202012
  • 202205
  • 202211
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Description for the changelog

[EVPN-MH] Integrate EVPN-MH with existing modules and add unit/VS tests.

Link to config_db schema for YANG module changes

N/A (YANG changes are tracked separately in sonic-buildimage PR #27543.)

A picture of a cute animal (not mandatory but encouraged)

Copilot AI review requested due to automatic review settings May 27, 2026 21:35
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR expands EVPN/VXLAN and MUX test robustness, introduces Static Anycast Gateway (SAG) validation, and wires additional EVPN MH (DF/SHL/backup NHG) plumbing across orchagent, cfgmgr, and sync daemons.

Changes:

  • Added SAG functional tests and SAG handling across IntfMgr/IntfsOrch, including link-local route programming tied to SAG MAC.
  • Improved test stability by replacing fixed sleeps with state-based waits and adding ASIC state drain fixtures.
  • Added EVPN MH / split-horizon / backup NHG support across fpmsyncd, portsorch, route/VRF orchestration, and mock/unit-test infrastructure.

Reviewed changes

Copilot reviewed 73 out of 73 changed files in this pull request and generated 3 comments.

File Description
tests/test_sag.py New functional tests covering SAG MAC set/unset, VLAN enable/disable, VRF behavior, and ASIC/kernel verification.
tests/test_mux_prefixroute.py Reduces flakiness via wait-based FDB sync, neighbor drain waits, and ASIC cleanup fixture.
tests/test_inband_intf_mgmt_vrf.py Simplifies VRF table empty wait helper to a direct implementation.
tests/test_evpn_l3_vxlan_p2mp.py Refreshes baseline entries prior to VLAN interface creation to avoid cross-test pollution.
tests/test_evpn_l3_vxlan.py Same baseline refresh prior to VLAN interface creation for determinism.
tests/test_evpn_fdb_p2mp.py Adjusts synthetic FDB notification event type to MOVE.
tests/test_evpn_fdb.py Adjusts synthetic FDB notification event type to MOVE.
tests/mock_tests/vxlanorch_ut.cpp Expands VXLAN orch UT scaffolding and adds multiple new VXLAN unit tests.
tests/mock_tests/routeorch_ut.cpp Adds EvpnMhOrch + IntfsOrch table vector wiring for mock route tests.
tests/mock_tests/qosorch_ut.cpp Updates IntfsOrch construction for multi-table input (INTF + SAG).
tests/mock_tests/portsorch_ut.cpp Adds EvpnMhOrch construction and updates IntfsOrch table wiring.
tests/mock_tests/neighorch_ut.cpp Adds unit tests validating NeighOrch FDB add/delete/resolve behaviors.
tests/mock_tests/mux_rollback_ut.cpp Adds missing gmock matcher import used by tests.
tests/mock_tests/mock_sai_tunnel.h Converts mock SAI tunnel wrappers to extern declarations (implementation moved to .cpp).
tests/mock_tests/mock_orchagent_main.h Adds EvpnMhOrch and L2NhgOrch globals for mock orchagent.
tests/mock_tests/mock_orchagent_main.cpp Adds global SAG MAC variable initialization for mock orchagent.
tests/mock_tests/mock_orch_test.cpp Ensures EvpnMhOrch is created early; updates IntfsOrch ctor signature usage.
tests/mock_tests/mock_hiredis.cpp Implements redisFree() to close fd/free allocations in mock hiredis context.
tests/mock_tests/intfsorch_ut.cpp Updates IntfsOrch construction to accept table vector (INTF + SAG).
tests/mock_tests/fpmsyncd/ut_helpers_fpmsyncd.cpp Extends mocked ifindex→name mapping for EVPN MH tests.
tests/mock_tests/flowcounterrouteorch_ut.cpp Updates IntfsOrch construction to accept table vector (INTF + SAG).
tests/mock_tests/flexcounter_ut.cpp Updates IntfsOrch construction to accept table vector (INTF + SAG).
tests/mock_tests/fdborch/flush_syncd_notif_ut.cpp Adds missing orch dependencies; updates remote FDB data model expectations.
tests/mock_tests/check.h Fixes signed/unsigned compare by casting act_len/exp_len to size_t.
tests/mock_tests/bufferorch_ut.cpp Updates IntfsOrch construction to accept table vector (INTF + SAG).
tests/mock_tests/buffermgrdyn_ut.cpp Ensures WarmStart flag is reset in TearDown to avoid cross-test impact.
tests/mock_tests/aclorch_ut.cpp Updates IntfsOrch construction to accept table vector (INTF + SAG).
tests/mock_tests/Makefile.am Adds new test binaries/sources (fdbsyncd UTs, EVPN MH, SHL, mock SAI files, helpers).
run-gtest-suite.py Changes RLIMIT_NOFILE adjustment logic (now unconditionally imports resource).
orchagent/vxlanorch.h Adds DIP cleanup APIs (cleanupDynamicDIPTunnel/eraseRemoteEndPoint).
orchagent/vxlanorch.cpp Adjusts DIP endpoint lifecycle, improves logging, and fixes VNI L2/L3 mapping ordering.
orchagent/vrforch.h Adds VRF ID enumeration helper and markVniAsL3 helper.
orchagent/vrforch.cpp Moves link-local route management to RouteOrch via Directory lookup; adds per-VRF EUI64 /128 CPU route.
orchagent/routeorch.h Changes getLinkLocalEui64Addr() to accept a MAC parameter.
orchagent/routeorch.cpp Updates link-local route creation to use the passed MAC rather than global-only.
orchagent/request_parser.cpp Broadens colon-joined key parsing to support REQ_T_STRING final key item types.
orchagent/portsorch.h Adds VLAN member lookup and L2 NHG add/remove APIs.
orchagent/portsorch.cpp Adds VLAN member lookup, L2 NHG bridge-port creation, EVPN DF attributes, and tunnel oper status updates.
orchagent/port.h Adds Port::NEXTHOP_GROUP type and storage for next-hop group OID.
orchagent/p4orch/tests/test_main.cpp Switches to mock RouteOrch, wires global gMacAddress and creates a RouteOrch instance for tests.
orchagent/p4orch/tests/mock_sai_tunnel.h Adds extern mock_sai_tunnel pointer and formatting fixes.
orchagent/p4orch/tests/mock_sai_tunnel.cpp Formatting + style fixes; ensures mock_sai_tunnel global is defined.
orchagent/p4orch/tests/fake_routeorch.cpp Replaces stub RouteOrch with a more complete mock implementation for newer interfaces.
orchagent/p4orch/tests/Makefile.am Reorders mock_sai_tunnel.cpp in sources list (no functional change).
orchagent/orchdaemon.h Adds EvpnMhOrch/L2NhgOrch/ShlOrch includes.
orchagent/orchdaemon.cpp Creates EvpnMhOrch earlier, introduces L2NhgOrch/ShlOrch, updates orch init ordering.
orchagent/neighorch.h Adds deletion_pending flag and FDB event handlers + VRF-aware getNeighborEntry overload.
orchagent/neighorch.cpp Implements FDB add/delete/resolve handlers; adjusts neighbor programming logic and deletion notify ordering.
orchagent/muxorch.cpp Makes route creation accept const prefix; adds item-not-found create fallback and improves prefix-route deletion logic.
orchagent/intfsorch.h Changes ctor to accept multi-table list; adds SAG state fields and SAG route helpers.
orchagent/intfsorch.cpp Adds SAG table handling, per-interface MAC tracking, and link-local route programming tied to SAG enablement.
orchagent/fdborch.h Extends remote FDB destination model + adds flushAllFDBEntries API and per-port cache structures.
orchagent/Makefile.am Adds new orchagent sources: evpnmhorch.cpp, l2nhgorch.cpp, shlorch.cpp.
neighsyncd/neighsyncd.cpp Updates NeighSync ctor to include APP_DB connector.
neighsyncd/neighsync.h Adds route table + netlink cache/socket members and interface-name helper.
neighsyncd/neighsync.cpp Adds netlink cache/socket lifecycle, VRF master lookup, and host route deletion before neighbor add.
fpmsyncd/routesync.h Adds EVPN MH PS tables + raw handlers (SHL/DF/backup NHG/TC filter) and tuple wrapper includeEmptyFields.
fpmsyncd/routesync.cpp Adds EVPN MH parsing (via/tc filter/FPM private types), adjusts VXLAN encap parsing, adds RouteSync destructor.
fpmsyncd/fpmsyncd.cpp Registers RTM_NEWTFILTER/RTM_DELTFILTER with NetDispatcher.
fpmsyncd/fpmlink.cpp Treats TFILTER + RTM_FPM private types as raw processing; null-guards RouteSync callbacks.
fpmsyncd/fpm/fpm.h Defines FPM private RTM_FPM_* message types and EVPN MH message structs/attributes.
fdbsyncd/fdbsyncd.cpp Moves neigh/nexthop handling to raw netlink handlers; adds RTNLGRP_NEXTHOP dump/subscribe.
fdbsyncd/fdbsync.h Extends FdbSync interface for raw messages, L2 NHG cache, and richer destination typing.
cfgmgr/vxlanmgr.h Splits ProducerStateTable vs Table for APP_VXLAN_TUNNEL updates.
cfgmgr/vxlanmgr.cpp Uses ProducerStateTable for tunnel updates; adds udp6zerocsumrx behavior for IPv6 VTEPs; delays NVO until tunnel exists.
cfgmgr/teammgr.h Adds system MAC setter and kernel update helper for LAG.
cfgmgr/teammgr.cpp Implements kernel MAC update via libnl and propagates system_mac to APP/STATE DB.
cfgmgr/intfmgrd.cpp Adds CFG_SAG_TABLE_NAME subscription and initializes global MAC variables from DEVICE_METADATA.
cfgmgr/intfmgr.h Adds SAG producers/tables and helpers for SAG FDB + interface state changes.
cfgmgr/intfmgr.cpp Implements SAG global MAC + VLAN per-interface SAG MAC switching, including FDB programming and APP_DB SAG updates.
Comments suppressed due to low confidence (4)

run-gtest-suite.py:1

  • import resource is now unconditional at module import time, which will raise ImportError on platforms without the POSIX resource module (the comment explicitly mentions Windows). To keep the script portable, remove the unconditional import and keep the guarded try/except ImportError import (or import inside the try block only), then gate the RLIMIT_NOFILE logic on whether the import succeeded.

orchagent/portsorch.cpp:1

  • This change introduces a call to FdbOrch::flushAllFDBEntries(...). In this PR diff set, only the declaration is added (in orchagent/fdborch.h), but there is no corresponding implementation shown (e.g., in orchagent/fdborch.cpp). If the implementation is missing, this will fail to link. Add/verify the concrete implementation and ensure it correctly flushes both dynamic and static entries for the given bridge port.

orchagent/request_parser.cpp:1

  • The logic that follows this condition is described (in the nearby comment) as assembling removed items into an IPv6 address. After extending the condition to REQ_T_STRING, that comment is no longer accurate. Update the comment to reflect that the code is now also used to re-join arbitrary string key components that may include ':' (not just IPv6 parsing).

orchagent/vxlanorch.h:1

  • The newly added APIs (and a couple of existing ones) take std::string by value, which causes unnecessary copies on hot paths and is inconsistent with other APIs in this area. Prefer const std::string& remote_vtep (and similarly for dip) unless ownership transfer is intended.
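The second suppressed comment asks that flushAllFDBEntries() remove both dynamic and static entries for a given bridge port. As a minimal sketch of that contract only (the `FdbEntryModel` type and `flushAllOnBridgePort` helper are hypothetical; the real FdbOrch signature and data structures are not shown in this thread), the key point is that the filter matches on the bridge port alone and never on the static/dynamic flag:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical FDB model: key = "vlan|mac", value = owning bridge port + static flag.
struct FdbEntryModel
{
    uint64_t bridgePortId;
    bool isStatic;
};

// Remove every entry on the given bridge port, static and dynamic alike.
// Returns the number of entries removed.
inline size_t flushAllOnBridgePort(std::map<std::string, FdbEntryModel> &fdb, uint64_t bridgePortId)
{
    size_t removed = 0;
    for (auto it = fdb.begin(); it != fdb.end();)
    {
        if (it->second.bridgePortId == bridgePortId)
        {
            it = fdb.erase(it);  // erase() returns the next valid iterator
            ++removed;
        }
        else
        {
            ++it;
        }
    }
    return removed;
}
```

Erasing through the iterator returned by `std::map::erase` keeps the loop valid while entries are removed.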

Comment thread cfgmgr/teammgr.cpp Outdated
Comment thread orchagent/fdborch.h Outdated
Comment thread fpmsyncd/routesync.cpp
tahmed-dev added a commit to tahmed-dev/sonic-swss that referenced this pull request May 27, 2026
- teammgr: pass 0 (flags bitmask) to rtnl_link_change() instead of
  ifindex. The 4th arg is netlink flags, not the interface index;
  the target link is already identified by orig_link.
- fdborch.h, fdbsyncd/fdbsync.h: convert unscoped enum NEXT_HOP_VALUE_TYPE
  to scoped 'enum class FdbDest : uint8_t' to avoid global namespace
  pollution and enforce explicit qualification at call sites.
- fpmsyncd/routesync: guard ~RouteSync() against null m_nl_sock and
  free m_link_cache (previously leaked).

Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
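The enum change in this commit can be illustrated with a short sketch. Only `FdbDest`, its `uint8_t` underlying type, and the `IFNAME` enumerator appear in this thread; `REMOTE_IP`, `L2_NHG`, and `destName()` are hypothetical placeholders. The static_asserts show the two properties the commit message relies on: one-byte storage, and no implicit conversion to an integer, which forces call sites to write `FdbDest::IFNAME` explicitly.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Scoped enum with explicit 1-byte storage. Only IFNAME is taken from the
// thread; REMOTE_IP and L2_NHG are hypothetical placeholders.
enum class FdbDest : uint8_t
{
    IFNAME,
    REMOTE_IP,
    L2_NHG
};

static_assert(std::is_same<std::underlying_type<FdbDest>::type, uint8_t>::value, "uint8_t storage");
static_assert(sizeof(FdbDest) == 1, "fits in one byte");
// Unlike an unscoped enum, a scoped enum never converts implicitly to int.
static_assert(!std::is_convertible<FdbDest, int>::value, "no implicit int conversion");

// Hypothetical helper: every use must be qualified as FdbDest::...
inline const char *destName(FdbDest d)
{
    switch (d)
    {
    case FdbDest::IFNAME:
        return "ifname";
    case FdbDest::REMOTE_IP:
        return "remote_ip";
    case FdbDest::L2_NHG:
        return "l2_nhg";
    }
    return "unknown";
}
```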
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Contributor Author

@tahmed-dev tahmed-dev left a comment

@prsunny @pbrisset — flagging reviewer focus areas where this PR alters existing code paths (vs. purely additive code). Per-file/line comments below.

Comment thread orchagent/orchdaemon.cpp
MuxOrch *gMuxOrch;
IcmpOrch *gIcmpOrch;
HFTelOrch *gHFTOrch;
ShlOrch *gShlOrch;
Contributor Author

Init order changed: EvpnMhOrch constructed before FdbOrch and inserted between gPortsOrch and gBufferOrch in m_orchList; L2NhgOrch appended at end; gShlOrch appended after VXLAN orchs; IntfsOrch ctor signature changed to add APP_SAG_TABLE_NAME. Please verify warm-boot replay order and that ES/DF state is available before FdbOrch/PortsOrch consume it.

Contributor Author

The code below addresses this concern: EvpnMhOrch is constructed before any FDB processing and placed immediately after PortsOrch in m_orchList, which sets the warm-restart order, so ES/DF state is available before FdbOrch consumes FDB work. L2NhgOrch remains in the orch list, and the SAG table is registered at the same IntfsOrch priority as INTF_TABLE.

// Create EvpnMhOrch early so its ES/DF state is available when PortsOrch
// processes bridge ports and VLAN members (fixes warm boot ordering)
gEvpnMhOrch = new EvpnMhOrch(evpn_df_es_table_connectors);

m_orchList = { gSwitchOrch, gCrmOrch, gPortsOrch, gEvpnMhOrch,
               gBufferOrch, gFlowCounterRouteOrch, gIntfsOrch,
               gNeighOrch, /* ... */, gStpOrch, gL2NhgOrch };

vector<table_name_with_pri_t> intf_tables = {
    { APP_INTF_TABLE_NAME,  IntfsOrch::intfsorch_pri},
    { APP_SAG_TABLE_NAME,   IntfsOrch::intfsorch_pri}
};
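To make the ordering argument concrete, here is a minimal, self-contained sketch. It assumes, as a simplification and not a description of the real OrchDaemon loop, that orchs are drained strictly in list order; `FakeOrch`, `drainInOrder`, and `EsState` are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for an orch: a name plus one unit of pending work.
struct FakeOrch
{
    std::string name;
    std::function<void()> doTask;
};

// Simplified daemon pass: run each orch's pending work once, in list order.
inline void drainInOrder(const std::vector<FakeOrch *> &orchList)
{
    for (FakeOrch *orch : orchList)
    {
        orch->doTask();
    }
}

// Shared state: esReady is produced by the ES/DF orch; fdbSawEs records
// whether the FDB consumer observed it when it ran.
struct EsState
{
    bool esReady = false;
    bool fdbSawEs = false;
};
```

Placing the ES/DF producer ahead of the FDB consumer lets the consumer see the state on the same pass; reversing the two leaves `fdbSawEs` false, which is the warm-boot hazard the review comment asks about.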

Comment thread orchagent/fdborch.cpp
#include "sai_serialize.h"
#include "mlagorch.h"
#include "vxlanorch.h"
#include "l2nhgorch.h"
Contributor Author

FDB state machine changes: on a MAC move, storeFdbEntryState() now explicitly calls m_entries.erase(entry) before re-inserting it; remote_ip is replaced by dest_type/dest_value (IFNAME). The clearFdbEntry() signature changed to (entry, fdbData), and the per-port FDB counter decrement is now conditional on getPortByBridgePortId() succeeding (previously unconditional). handleSyncdFlushNotif() has three hunks changing bv_id/bridge_port_id resolution during flush. Please double-check counter-decrement parity in every pre-existing path and that flush behavior is unchanged for non-EVPN FDB.

Contributor Author

The code below keeps the legacy FDB paths idempotent while adding the EVPN-MH destination metadata. Tracing the branches confirms that the existing non-EVPN flush path still requires a matching bv_id and bridge_port_id before clearing an entry, and that the removal path ties counter updates to resolved bridge ports instead of touching unknown ports.

if (curr->first.bv_id == bv_id && curr->second.bridge_port_id == bridge_port_id)
{
    if (curr->second.sai_fdb_type == sai_fdb_type &&
        (curr->first.mac == mac || mac == flush_mac) && curr->second.is_flush_pending)
    {
        clearFdbEntry(curr->first, curr->second);
    }
}

if (bridge_port_id &&
    !m_portsOrch->getPortByBridgePortId(bridge_port_id, update.port))
{
    if (type == SAI_FDB_EVENT_FLUSHED)
    {
        return;
    }
}
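The counter-parity question comes down to one rule: decrement a per-port FDB counter only when the bridge port resolves to a known port. A minimal sketch of that rule, using a hypothetical `FdbCounterModel` in which plain maps stand in for the PortsOrch bridge-port lookup and the per-port counters (the underflow guard is the sketch's own addition, not something shown in the PR):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical model: bridge-port OID -> port alias, plus per-port FDB counters.
struct FdbCounterModel
{
    std::map<uint64_t, std::string> bridgePorts;
    std::map<std::string, uint32_t> fdbCount;

    // Returns true only if a counter was actually decremented.
    bool onEntryRemoved(uint64_t bridgePortId)
    {
        auto bp = bridgePorts.find(bridgePortId);
        if (bp == bridgePorts.end())
        {
            return false;  // unresolved bridge port: leave every counter untouched
        }
        uint32_t &count = fdbCount[bp->second];
        if (count == 0)
        {
            return false;  // sketch-only guard against unsigned underflow
        }
        --count;
        return true;
    }
};
```

An unknown bridge-port ID leaves every counter unchanged, which is the behavior the reviewer is asked to confirm for each pre-existing removal path.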

Comment thread orchagent/neighorch.cpp
return;
}

/**
Contributor Author

Neighbor add/remove altered for ES/DF: addNeighbor() integrates an ES/DF check (3 hunks) and may now refuse to program a neighbor that would previously have been programmed; removeNeighbor() gains a bool disable parameter (all call sites updated); both getNeighborEntry(NextHopKey) and getNeighborEntry(IpAddress) are modified. Please check legacy callers' expectations when a neighbor is 'disabled but not removed', and that isHwConfigured() is cleared correctly.

Contributor Author

The code below keeps the disabled-neighbor state explicit instead of deleting the cache entry. The disable path returns early for neighbors that are not programmed in hardware, and removeNeighbor(..., true) clears hw_configured while preserving the cache entry for a later re-enable.

if (!isHwConfigured(neighborEntry))
{
    SWSS_LOG_INFO("Neighbor %s is not programmed to HW", neighborEntry.ip_address.to_string().c_str());
    return true;
}

return removeNeighbor(ctx, true);

/* Do not delete entry from cache if its disable request */
if (disable)
{
    m_syncdNeighbors[neighborEntry].hw_configured = false;
    return true;
}
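The disable-versus-remove semantics can be modeled with a plain map. In this sketch, `CachedNeighbor`, `NeighborCacheModel`, and the IP-string key are hypothetical simplifications of `m_syncdNeighbors`, and the SAI calls are omitted:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified cache entry (not the real NeighborEntry/NeighborData).
struct CachedNeighbor
{
    std::string mac;
    bool hw_configured;
};

struct NeighborCacheModel
{
    std::map<std::string, CachedNeighbor> syncd;  // keyed by IP string

    // disable == true: keep the entry and mark it absent from HW.
    // disable == false: legacy behavior, drop the entry entirely.
    bool removeNeighbor(const std::string &ip, bool disable)
    {
        auto it = syncd.find(ip);
        if (it == syncd.end())
        {
            return true;  // nothing cached, nothing to do
        }
        // (the real code removes the SAI neighbor entry here)
        if (disable)
        {
            it->second.hw_configured = false;
            return true;
        }
        syncd.erase(it);
        return true;
    }

    // Early return when the neighbor is not programmed, as in the excerpt above.
    bool disableNeighbor(const std::string &ip)
    {
        auto it = syncd.find(ip);
        if (it == syncd.end() || !it->second.hw_configured)
        {
            return true;
        }
        return removeNeighbor(ip, true);
    }
};
```

After `disableNeighbor()`, the entry is still cached with `hw_configured == false`, so a later re-enable can find it; only `removeNeighbor(ip, false)` deletes it.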

Comment thread orchagent/muxorch.cpp Outdated
}

static sai_status_t create_route(IpPrefix &pfx, sai_object_id_t nh)
static sai_status_t create_route(const IpPrefix &pfx, sai_object_id_t nh)
Contributor Author

MUX neighbor update touched by EVPN-MH: set_route() adds an extra branch; MuxNbrHandler::update() and MuxPrefixBasedNbrHandler::update() now factor EVPN-MH into SAI next-hop / tunnel_id during active/standby transitions. Please confirm standby→active and prefix-route fast-path are bit-identical when no ES is configured.

Contributor Author

The code below keeps the non-EVPN MUX fast path explicit. The set-route path falls back to creating the route only when the prefix-route update returns SAI_STATUS_ITEM_NOT_FOUND; any other set failure is handled exactly as before. On delete, the prefix route is removed only for a tracked neighbor or in standby state, which preserves the prior standby cleanup path.

sai_status_t status = sai_route_api->set_route_entry_attribute(&route_entry, &route_attr);
if (status == SAI_STATUS_ITEM_NOT_FOUND && mux_prefix_route)
{
    return create_route(pfx, next_hop_id);
}

if (neighbors_.find(nh.ip_address) != neighbors_.end() || state == MuxState::MUX_STATE_STANDBY)
{
    remove_route(pfx);
}

if (state == MuxState::MUX_STATE_STANDBY)
{
    updateTunnelRoute(nh, false);
}
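The update-then-create fallback can be illustrated with an in-memory table. `Status`, `FakeRouteTable`, and `setRoute` are hypothetical stand-ins for `sai_status_t`, the SAI route API, and `set_route()`:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical status codes standing in for sai_status_t values.
enum class Status { Success, ItemNotFound, Failure };

// Hypothetical in-memory route table keyed by prefix string.
struct FakeRouteTable
{
    std::map<std::string, uint64_t> routes;

    Status create(const std::string &pfx, uint64_t nh)
    {
        if (routes.count(pfx))
        {
            return Status::Failure;  // route already exists
        }
        routes[pfx] = nh;
        return Status::Success;
    }

    Status setNextHop(const std::string &pfx, uint64_t nh)
    {
        auto it = routes.find(pfx);
        if (it == routes.end())
        {
            return Status::ItemNotFound;
        }
        it->second = nh;
        return Status::Success;
    }
};

// Try an in-place update first; create only on ItemNotFound, and only on
// the prefix-route path. Every other result is returned unchanged.
inline Status setRoute(FakeRouteTable &t, const std::string &pfx, uint64_t nh, bool muxPrefixRoute)
{
    Status s = t.setNextHop(pfx, nh);
    if (s == Status::ItemNotFound && muxPrefixRoute)
    {
        return t.create(pfx, nh);
    }
    return s;
}
```

With `muxPrefixRoute == false`, a missing route still surfaces as `ItemNotFound`, so the legacy error handling is untouched.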

Comment thread fpmsyncd/routesync.cpp
#include "producerstatetable.h"
#include "fpmsyncd/fpmlink.h"
#include "fpmsyncd/routesync.h"
#include "fpmsyncd/fpm/fpm.h"
Contributor Author

Netlink decode rewritten: getEvpnNextHop() rewritten; new rtattr parsers added; onMsg() decode path gains branches for L3 VNI / EVPN-MH attributes. Please verify non-EVPN route decoding is bit-for-bit identical to previous master.

Contributor Author

The code below keeps legacy route decoding separate from the EVPN/LWT encap additions. The existing RTA_GATEWAY path still determines IPv4 vs. IPv6 from the payload size; the new RTA_VIA branch is used only when RTA_GATEWAY is absent. Encapsulation parsing returns false if the VNI, interface, or router MAC (RMAC) is missing.

if (gate)
{
    if (RTA_PAYLOAD(tb[RTA_GATEWAY]) <= IPV4_MAX_BYTE)
    {
        memcpy(gateaddr, gate, IPV4_MAX_BYTE);
        gw_af = AF_INET;
    }
    else
    {
        memcpy(ipv6_address.s6_addr, gate, IPV6_MAX_BYTE);
        gw_af = AF_INET6;
    }
}
else if (tb[RTA_VIA])
{
    via = (struct rtvia *)RTA_DATA(tb[RTA_VIA]);
    /* parse VIA only when RTA_GATEWAY is not present */
}

if (encap_value == 0 || !(vlan.compare(ifname_unknown)) || MacAddress(rmac) == MacAddress("00:00:00:00:00:00"))
{
    return false;
}
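The decode rules above can be restated as two small pure functions. This is a sketch under assumptions: `GwKind`, `classifyGateway`, and `encapValid` are hypothetical; the gateway is passed as a raw byte vector instead of a netlink `rtattr`; `kIpv4MaxByte` is assumed to equal `IPV4_MAX_BYTE` (4); and the `"unknown"` default stands in for `ifname_unknown`, whose real value is not shown here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical result of gateway decoding.
enum class GwKind { None, Ipv4, Ipv6, Via };

constexpr std::size_t kIpv4MaxByte = 4;  // assumed value of IPV4_MAX_BYTE

// gateway: raw RTA_GATEWAY payload (empty if absent); hasVia: RTA_VIA present.
inline GwKind classifyGateway(const std::vector<uint8_t> &gateway, bool hasVia)
{
    if (!gateway.empty())
    {
        // Legacy rule: payload length alone decides the address family.
        return gateway.size() <= kIpv4MaxByte ? GwKind::Ipv4 : GwKind::Ipv6;
    }
    // VIA is consulted only when no gateway attribute is present.
    return hasVia ? GwKind::Via : GwKind::None;
}

// Encap is usable only if VNI, interface name, and router MAC are all known.
inline bool encapValid(uint32_t vni, const std::string &vlan, const std::array<uint8_t, 6> &rmac,
                       const std::string &ifnameUnknown = "unknown")
{
    bool rmacZero = true;
    for (uint8_t b : rmac)
    {
        rmacZero = rmacZero && (b == 0);
    }
    return vni != 0 && vlan != ifnameUnknown && !rmacZero;
}
```

Because the gateway branch is checked first, a route carrying both attributes decodes exactly as it did before the `RTA_VIA` branch was added.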

Comment thread cfgmgr/intfmgr.cpp

#define LOOPBACK_DEFAULT_MTU_STR "65536"
#define DEFAULT_MTU_STR 9100
extern MacAddress gMacAddress;
Contributor Author

Interface bring-up changes: setIntfIp() adds a SAG path, setIntfMpls() adds an early-return branch, doIntfGeneralTask() has 4 new hunks. Please check idempotence on re-apply.

Contributor Author

The code below keeps interface re-apply idempotent. SAG logic runs only when the static_anycast_gateway field is present; otherwise the legacy default mac_addr tuple is pushed as before. SAG enable/disable uses "replace"/"del" FDB operations, so re-applying the same configuration lands in the same state.

if (!sag.empty())
{
    if (!alias.compare(0, strlen(VLAN_PREFIX), VLAN_PREFIX))
    {
        if (sag == "true")
        {
            m_sagIntfList[alias] = true;
            setSagFdbEntry("replace", alias, gwmac);
            data.push_back(FieldValueTuple("mac_addr", gwmac));
        }
        else if (sag == "false")
        {
            m_sagIntfList[alias] = false;
            setSagFdbEntry("del", alias, gSagMacAddress.to_string());
            data.push_back(FieldValueTuple("mac_addr", MacAddress().to_string()));
        }
    }
}
else
{
    data.push_back(FieldValueTuple("mac_addr", MacAddress().to_string()));
}
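The idempotence argument rests on "replace" acting as an upsert and "del" being a no-op when the entry is absent. A minimal model of that, with hypothetical names (`SagModel`, `apply`), an assumed `"Vlan"` value for `VLAN_PREFIX`, and a single MAC used for both enable and delete (the excerpt uses `gwmac` for enable and `gSagMacAddress` for delete):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical model: per-VLAN SAG flag plus a set of (vlan, mac) FDB entries.
struct SagModel
{
    std::map<std::string, bool> sagIntf;
    std::set<std::pair<std::string, std::string>> fdb;

    // "replace" is an upsert and "del" is a no-op when absent: both idempotent.
    void setSagFdbEntry(const std::string &op, const std::string &vlan, const std::string &mac)
    {
        if (op == "replace")
        {
            fdb.insert(std::make_pair(vlan, mac));
        }
        else if (op == "del")
        {
            fdb.erase(std::make_pair(vlan, mac));
        }
    }

    // Mirrors the branches in the excerpt; returns the mac_addr value pushed,
    // or "" when nothing is pushed.
    std::string apply(const std::string &alias, const std::string &sag, const std::string &sagMac)
    {
        const std::string zeroMac = "00:00:00:00:00:00";  // MacAddress().to_string()
        if (sag.empty())
        {
            return zeroMac;  // legacy path: default mac_addr tuple
        }
        if (alias.rfind("Vlan", 0) != 0)
        {
            return "";  // SAG applies only to VLAN interfaces
        }
        if (sag == "true")
        {
            sagIntf[alias] = true;
            setSagFdbEntry("replace", alias, sagMac);
            return sagMac;
        }
        if (sag == "false")
        {
            sagIntf[alias] = false;
            setSagFdbEntry("del", alias, sagMac);
            return zeroMac;
        }
        return "";
    }
};
```

Applying "true" twice leaves exactly one FDB entry, and applying "false" twice leaves none.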

Comment thread cfgmgr/teammgr.cpp
#include <sys/wait.h>
#include <sys/types.h>
#include <signal.h>
#include <netlink/route/link.h>
Contributor Author

LAG task + learn-mode: doLagTask() modifications and a new setLagLearnMode() block (~+105 lines). Please confirm legacy non-EVPN LAG bring-up sequence is unchanged.

Contributor Author

The code below keeps the legacy LAG bring-up sequence intact. The existing create/admin/MTU/learn-mode/TPID steps still run first, and the new system MAC update runs only when system_mac is present. The kernel update passes a valid flags value (0) to rtnl_link_change(), and APP_DB/STATE_DB are written only after the kernel update succeeds.

setLagAdminStatus(alias, admin_status);
setLagMtu(alias, mtu);
if (!learn_mode.empty())
{
    setLagLearnMode(alias, learn_mode);
}
if (!tpid.empty())
{
    setLagTpid(alias, tpid);
}
if (!sys_mac.empty())
{
    setLagSysmac(alias, sys_mac);
}

if (rtnl_link_change(sockk, orig_link, link, 0) < 0)
{
    return -1;
}

m_appLagTable.set(alias, fvs);
m_stateLagTable.set(alias, fvs);
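The ordering guarantee (kernel first, databases only on success) can be sketched without libnl. Here a hypothetical `kernelApply` callback stands in for the `rtnl_link_change(..., 0)` call, and plain maps stand in for `m_appLagTable` and `m_stateLagTable`; `setLagSysmacSketch` is illustrative only and is not the PR's `setLagSysmac()`:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

using FakeTable = std::map<std::string, std::string>;  // alias -> system_mac
using KernelApply = std::function<int(const std::string &, const std::string &)>;

// kernelApply returns < 0 on failure, mirroring the rtnl_link_change() check.
inline int setLagSysmacSketch(const std::string &alias, const std::string &sysMac,
                              const KernelApply &kernelApply, FakeTable &appLag, FakeTable &stateLag)
{
    if (kernelApply(alias, sysMac) < 0)
    {
        return -1;  // kernel update failed: APP_DB/STATE_DB stay untouched
    }
    appLag[alias] = sysMac;  // reached only after the kernel update succeeded
    stateLag[alias] = sysMac;
    return 0;
}
```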

Comment thread cfgmgr/vxlanmgr.cpp
#include <sstream>
#include <string>
#include <net/if.h>
#include <arpa/inet.h>
Contributor Author

VXLAN create / tunnel-create flow modified: please re-verify idempotency on warm boot.

Contributor Author

The code below keeps VXLAN tunnel creation and NVO replay idempotent. Tunnel creation writes the same APP_DB payload through a ProducerStateTable and refreshes the cache deterministically; NVO creation is deferred until the referenced tunnel is active and present in APP_DB; and IPv6-only options are appended only when the source address is IPv6.

m_appVxlanTunnelTableProducer.set(vxlanTunnelName, kfvFieldsValues(t));
m_vxlanTunnelCache[vxlanTunnelName] = tuncache;

if (!m_appVxlanTunnelTable.get(value, fv))
{
    SWSS_LOG_WARN("NVO %s creation delayed. VTEP %s not found", EvpnNvoName.c_str(), value.c_str());
    return false;
}

if (inet_pton(AF_INET6, src_ip.c_str(), &addr6) == 1)
{
    link_add_cmd += " udp6zerocsumrx";
}
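The IPv6 check relies on `inet_pton()`, which returns 1 only when the string is a valid address of the requested family. A small sketch of that gate, where `withIpv6Options` and the sample command string are hypothetical:

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <cassert>
#include <string>

// Append the IPv6-only option when the VTEP source address parses as IPv6.
inline std::string withIpv6Options(std::string linkAddCmd, const std::string &srcIp)
{
    struct in6_addr addr6;
    // inet_pton returns 1 on success, 0 if srcIp is not a valid IPv6 string.
    if (inet_pton(AF_INET6, srcIp.c_str(), &addr6) == 1)
    {
        linkAddCmd += " udp6zerocsumrx";
    }
    return linkAddCmd;
}
```

An IPv4 source, or a malformed string, leaves the command unchanged.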

Comment thread orchagent/port.h
TUNNEL,
SUBPORT,
SYSTEM,
NEXTHOP_GROUP,
Contributor Author

Small struct addition — safe, but this is a widely-included header; please confirm no downstream #include cycle or size-sensitive consumer is affected.

Contributor Author

The Port extension is minimal: one enum value and one object-ID field, both added to the existing Port structure, with the new field zero-initialized in the existing style. It adds no new include dependency and therefore cannot introduce a downstream include cycle.

enum Type
{
    PHY,
    MGMT,
    LOOPBACK,
    VLAN,
    LAG,
    TUNNEL,
    SUBPORT,
    SYSTEM,
    NEXTHOP_GROUP,
    UNKNOWN
};

sai_object_id_t m_nexthop_group_id = 0;
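One detail that a "size-sensitive consumer" check could also cover: inserting `NEXTHOP_GROUP` ahead of `UNKNOWN` leaves the earlier enumerators unchanged but shifts `UNKNOWN`'s integer value by one. The sketch below mirrors only the enumerators shown in the excerpt (the real header may list others, so the absolute values may differ); anything that persists or compares raw `Port::Type` integers would observe the shift:

```cpp
#include <cassert>

// Enumerators as shown in the excerpt, before and after the change.
enum class TypeBefore { PHY, MGMT, LOOPBACK, VLAN, LAG, TUNNEL, SUBPORT, SYSTEM, UNKNOWN };
enum class TypeAfter { PHY, MGMT, LOOPBACK, VLAN, LAG, TUNNEL, SUBPORT, SYSTEM, NEXTHOP_GROUP, UNKNOWN };

// Enumerators before the insertion point keep their values...
static_assert(static_cast<int>(TypeBefore::SYSTEM) == static_cast<int>(TypeAfter::SYSTEM),
              "SYSTEM is unchanged");
// ...but UNKNOWN moves from 8 to 9.
static_assert(static_cast<int>(TypeBefore::UNKNOWN) == 8, "old UNKNOWN value");
static_assert(static_cast<int>(TypeAfter::UNKNOWN) == 9, "new UNKNOWN value");
```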

Comment thread orchagent/Makefile.am
mplsrouteorch.cpp \
neighorch.cpp \
intfsorch.cpp \
evpnmhorch.cpp \
Contributor Author

New sources added — build-only; please sanity-check the build rules / link order.

Contributor Author

The new sources are added to the existing orchagent source list, so they are compiled and linked into the same binary as the code that uses them. The Trixie swss package build completed with these objects in the standard build path.

            routeorch.cpp \
            mplsrouteorch.cpp \
            neighorch.cpp \
            intfsorch.cpp \
            evpnmhorch.cpp \
            l2nhgorch.cpp \
            port/port_capabilities.cpp \

            high_frequency_telemetry/hftelgroup.cpp \
            shlorch.cpp

Validation: Trixie swss build completed with PASS: 770/0 fail, PASS: 863/0 fail, and PASS: 70/0 fail.

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@tahmed-dev
Contributor Author

/azpw run

@mssonicbld
Collaborator

⚠️ Notice: /azpw run only runs failed jobs now. If you want to trigger a whole pipeline run, please rebase your branch or close and reopen the PR.
💡 Tip: You can also use /azpw retry to retry failed jobs directly.

Retrying failed (or canceled) jobs...

@mssonicbld
Collaborator

Retrying failed (or canceled) stages in build 1124559:

✅Stage BuildTrixie:

  • Job amd64: retried.
  • Job arm64: retried.

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Comment thread tests/p4rt/test_viplb.py Fixed
Comment thread tests/p4rt/test_viplb.py Fixed
Comment thread tests/test_inband_intf_mgmt_vrf.py Fixed
Comment thread tests/test_mux.py Fixed
Comment thread tests/p4rt/test_viplb.py Fixed
Comment thread tests/test_acl_mark.py Fixed
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

tahmed-dev added a commit to tahmed-dev/sonic-swss that referenced this pull request May 28, 2026
- teammgr: pass 0 (flags bitmask) to rtnl_link_change() instead of
  ifindex. The 4th arg is netlink flags, not the interface index;
  the target link is already identified by orig_link.
- fdborch.h, fdbsyncd/fdbsync.h: convert unscoped enum NEXT_HOP_VALUE_TYPE
  to scoped 'enum class FdbDest : uint8_t' to avoid global namespace
  pollution and enforce explicit qualification at call sites.
- fpmsyncd/routesync: guard ~RouteSync() against null m_nl_sock and
  free m_link_cache (previously leaked).

Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
@tahmed-dev tahmed-dev force-pushed the tahmed/evpn-mh-2-integrate-20260527 branch from de4b8a7 to bd137f7 May 28, 2026 20:23
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@tahmed-dev
Contributor Author

/azpw run

@mssonicbld
Collaborator

⚠️ Notice: /azpw run only runs failed jobs now. If you want to trigger a whole pipeline run, please rebase your branch or close and reopen the PR.
💡 Tip: You can also use /azpw retry to retry failed jobs directly.

Retrying failed (or canceled) jobs...

@mssonicbld
Collaborator

Retrying failed (or canceled) stages in build 1125152:

✅Stage Test:

  • Job vstest: retried.

✅Stage BuildAsan:

  • Job amd64: retried.

Comment thread orchagent/routeorch.cpp
}

std::string RouteOrch::getLinkLocalEui64Addr(void)
std::string RouteOrch::getLinkLocalEui64Addr(const MacAddress &mac)
Collaborator

Why change the signature here? As much as possible, could you limit the changes?

Contributor Author

This parameterization is needed for SAG (Static Anycast Gateway). The SAG VLAN interface paths in intfsorch.cpp now call getLinkLocalEui64Addr(m_sagMac), so the link-local EUI-64 address is derived from the SAG MAC instead of the global switch MAC. The existing callers in routeorch.cpp and vrforch.cpp still pass gMacAddress, so the non-SAG path is unchanged; the only delta is the added mac parameter.
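This sketch shows how a link-local address can be derived from a MAC using the modified EUI-64 procedure of RFC 4291 (Appendix A). The procedure flips the universal/local bit of the first octet, inserts `ff:fe` between the third and fourth octets, and prepends `fe80::/64`. The helper name `linkLocalEui64FromMac` and the `Mac` alias are hypothetical. This is not the RouteOrch implementation. It only illustrates why the result depends on which MAC is passed in.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

using Mac = std::array<uint8_t, 6>;

// Hypothetical helper (not the RouteOrch code): fe80::/64 + modified EUI-64
// interface identifier built from a 48-bit MAC, per RFC 4291 Appendix A.
std::string linkLocalEui64FromMac(const Mac &mac)
{
    in6_addr addr{};                  // all 16 bytes start at zero
    addr.s6_addr[0] = 0xfe;           // fe80::/64 prefix; bytes 2..7 stay zero
    addr.s6_addr[1] = 0x80;

    addr.s6_addr[8]  = static_cast<uint8_t>(mac[0] ^ 0x02);  // flip U/L bit
    addr.s6_addr[9]  = mac[1];
    addr.s6_addr[10] = mac[2];
    addr.s6_addr[11] = 0xff;          // ff:fe inserted in the middle of the MAC
    addr.s6_addr[12] = 0xfe;
    addr.s6_addr[13] = mac[3];
    addr.s6_addr[14] = mac[4];
    addr.s6_addr[15] = mac[5];

    char buf[INET6_ADDRSTRLEN];
    if (inet_ntop(AF_INET6, &addr, buf, sizeof(buf)) == nullptr)
    {
        return std::string();         // conversion failed
    }
    return std::string(buf);          // compressed textual form
}
```

Passing the SAG MAC instead of the switch MAC changes only the interface identifier (the low 64 bits). The `fe80::/64` prefix stays the same in both cases.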

Comment thread orchagent/neighorch.cpp Outdated

NeighborUpdate update = { neighborEntry, MacAddress(), false };
notify(SUBJECT_TYPE_NEIGH_CHANGE, static_cast<void *>(&update));
/* Notify observers after the neighbor is removed from the syncd table.
Collaborator

What is this change?

Contributor Author

This was unintended churn: it changed the log message to use a local neighbor_mac instead of m_syncdNeighbors[neighborEntry].mac. I have reverted it, so this line now matches master.

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

tahmed-dev added 13 commits May 29, 2026 12:47
Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
- teammgr: pass 0 (flags bitmask) to rtnl_link_change() instead of
  ifindex. The 4th arg is netlink flags, not the interface index;
  the target link is already identified by orig_link.
- fdborch.h, fdbsyncd/fdbsync.h: convert unscoped enum NEXT_HOP_VALUE_TYPE
  to scoped 'enum class FdbDest : uint8_t' to avoid global namespace
  pollution and enforce explicit qualification at call sites.
- fpmsyncd/routesync: guard ~RouteSync() against null m_nl_sock and
  free m_link_cache (previously leaked).

Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
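Below is a minimal sketch of the ~RouteSync() fix. Hypothetical `FakeSock`/`FakeCache` types and free functions stand in for the libnl handles, so the example builds without libnl. It shows the two changes the message describes. The destructor no longer assumes the socket was opened, and it now releases the link cache. Counters make the frees observable.

```cpp
#include <cassert>

// Hypothetical stand-ins for libnl handles; the real code uses nl_sock / nl_cache.
struct FakeSock { };
struct FakeCache { };

static int g_socksFreed = 0;   // counters so the sketch's behavior is observable
static int g_cachesFreed = 0;

static void freeSock(FakeSock *s)   { delete s; ++g_socksFreed; }
static void freeCache(FakeCache *c) { delete c; ++g_cachesFreed; }

// Hypothetical class with the same shape as the fix: the destructor tolerates
// a socket that was never opened and also frees the link cache.
class RouteSyncSketch
{
public:
    RouteSyncSketch(bool openSocket, bool allocCache)
        : m_nl_sock(openSocket ? new FakeSock() : nullptr),
          m_link_cache(allocCache ? new FakeCache() : nullptr) {}

    ~RouteSyncSketch()
    {
        if (m_nl_sock != nullptr)      // guard against a null socket
        {
            freeSock(m_nl_sock);
        }
        if (m_link_cache != nullptr)   // previously leaked; now released
        {
            freeCache(m_link_cache);
        }
    }

    // Owning raw pointers: forbid copies so nothing is freed twice.
    RouteSyncSketch(const RouteSyncSketch &) = delete;
    RouteSyncSketch &operator=(const RouteSyncSketch &) = delete;

private:
    FakeSock *m_nl_sock;
    FakeCache *m_link_cache;
};
```

Holding both handles in `std::unique_ptr` with custom deleters would make the null checks implicit. The explicit checks shown here stay closer to the minimal change the commit describes.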
After converting FdbDest to 'enum class : uint8_t', three SWSS_LOG_*
sites passing FdbDest to '%d' triggered -Werror=format=. Cast to int.

Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
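This short sketch shows why the cast is needed. A scoped enumeration is not promoted to `int` when passed through a variadic (`...`) argument. A printf-style `%d` with a raw `FdbDest` is therefore a type mismatch that format checking reports, and `-Werror` turns it into a build failure. `describeDest` and the enumerator names are hypothetical, and `snprintf` stands in for the `SWSS_LOG_*` macros.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical enumerators; only the 'enum class ... : uint8_t' shape is from the commit.
enum class FdbDest : uint8_t
{
    Ifname,
    RemoteVtep,
    NexthopGroup,
};

// Passing 'd' directly to "%d" would be flagged by -Wformat; cast to int first.
std::string describeDest(FdbDest d)
{
    char buf[32];
    std::snprintf(buf, sizeof(buf), "dest type %d", static_cast<int>(d));
    return std::string(buf);
}
```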
- Add static_cast<int>(...) at remaining FdbDest %d log sites in
  fdborch.cpp and qualify FdbDest:: in mock_tests/fdborch/* and
  mock_tests/fdbsyncd/* (required after enum class conversion).
- Restore orchagent/p4orch/tests/mock_routeorch.h (lost during rebase)
  and update its RouteOrch ctor declaration to match the current
  prototype (adds NeighOrch* and ZmqServer*).

Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
The teardown in test_InbandIntf called self.wait_for_table_empty('INTF_TABLE'),
which did not exist on the class; only the VRF-specific
wait_for_vrf_table_empty was defined. All four parametrizations (Ethernet4,
Vlan100, PortChannel5, Loopback1) therefore raised AttributeError in the
finally block:

    AttributeError: 'TestInbandInterface' object has no attribute
    'wait_for_table_empty'

Add a generic wait_for_table_empty(table_name) that polls the application
DB until the named table is empty, mirroring the existing
wait_for_vrf_table_empty pattern.

Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
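The helper itself is Python in the VS test suite. The same poll-until-empty pattern is sketched here in C++ to keep one language across the examples. `countEntries` is a hypothetical callable that stands in for "number of keys in the named APP_DB table". The timeout and interval are illustrative parameters, not values taken from the test.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>

// Hypothetical poll-until-empty helper; countEntries is not a real swss API.
bool waitForTableEmpty(const std::function<std::size_t()> &countEntries,
                       std::chrono::milliseconds timeout,
                       std::chrono::milliseconds interval)
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true)
    {
        if (countEntries() == 0)
        {
            return true;                              // table drained
        }
        if (std::chrono::steady_clock::now() >= deadline)
        {
            return false;                             // gave up waiting
        }
        std::this_thread::sleep_for(interval);        // back off before re-polling
    }
}
```

Returning `false` on timeout, rather than throwing, lets the caller (here, a teardown) decide whether a non-empty table should fail the test.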
Remove the pytest/vstest file changes from this PR while preserving branch history.

The C++ and mock test changes remain in the PR; this commit only restores Python test files under tests/ back to master to avoid vstest flakiness.

Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
Branch tracing showed that two common paths could still mutate state when EVPN-MH was not configured: fdbsyncd accepted L2 nexthop-group netlink events without an EVPN NVO, and neighsyncd treated NUD_NOARP without NTF_EXT_LEARNED as a remote-move delete without checking whether an EVPN NVO exists.

Gate both behaviors on EVPN NVO configuration so the feature-inactive path remains behaviorally identical to master. fdbsyncd now ignores L2 NHG messages until an NVO exists and clears L2 NHG APP_DB/cache state when the NVO is removed. neighsyncd now preserves the existing NOARP ignore behavior unless an NVO exists. The new SAG/VXLAN cached state is also default-initialized, and the EVPN split-horizon snprintf format is made type-safe.

Add fdbsyncd coverage for the inactive L2 NHG path and make EVPN L2 NHG tests explicitly opt in to NVO before expecting feature-specific APP_DB writes.

Validation: focused cppcheck on touched files found no new inactive-path issue. Standard Trixie swss package build passed with PASS 770, PASS 863, PASS 70 and FAIL/ERROR 0.

Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
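Below is a minimal model of the fdbsyncd gating described above. The names (`L2NhgGateSketch`, `onL2NhgEvent`, the string-keyed cache) are hypothetical, and APP_DB writes are reduced to a map for brevity. It shows the three behaviors the commit lists. L2 NHG events are ignored while no EVPN NVO exists. They are accepted once one does. Removing the NVO clears the cached L2 NHG state.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical model; not fdbsyncd's actual members or APP_DB schema.
class L2NhgGateSketch
{
public:
    void onNvoAdded() { m_isEvpnNvoExist = true; }

    void onNvoRemoved()
    {
        m_isEvpnNvoExist = false;
        m_l2NhgCache.clear();        // drop L2 NHG state so nothing stale survives
    }

    // Returns true if the netlink L2 NHG event was applied, false if ignored.
    bool onL2NhgEvent(const std::string &nhgId, const std::string &members)
    {
        if (!m_isEvpnNvoExist)
        {
            return false;            // feature inactive: no state is touched
        }
        m_l2NhgCache[nhgId] = members;
        return true;
    }

    std::size_t cachedGroups() const { return m_l2NhgCache.size(); }

private:
    bool m_isEvpnNvoExist = false;   // default-initialized: feature starts inactive
    std::map<std::string, std::string> m_l2NhgCache;
};
```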
The integration (pytest) tests under tests/ are restored to their
upstream baseline so this PR contains no Python test changes. EVPN-MH
test coverage is provided by the C++ mock_tests (gtest) instead.

A prior partial revert left ~9.6k lines of unrelated pytest deletions
and edits (gearbox.py, macsec.py, test_mux_prefixroute.py,
test_p4rt_l3_multicast.py, test_vnet.py, conftest.py, p4rt/*, etc.).
All tests/*.py and tests/**/*.py files now match the merge-base.

Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
@tahmed-dev force-pushed the tahmed/evpn-mh-2-integrate-20260527 branch from 5ef4e93 to d9ea5a1 (May 29, 2026 19:51)
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

…ghorch churn

Gate all EVPN-MH neighsync behavior behind m_isEvpnNvoExist so that the
common (feature-disabled) code path is byte-for-byte unchanged from master.

Revert the net-new muxorch dualtor route changes that were absent in the
pre-rebase baseline; they belong to a separate test-fix effort and are
orthogonal to EVPN-MH.

Revert unintended cosmetic churn in neighorch.

Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
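This sketch models the neighsync decision that this commit gates. The `classifyNoarp` helper and `NeighAction` enum are hypothetical. The `NUD_NOARP` and `NTF_EXT_LEARNED` values are copied from `<linux/neighbour.h>` so the example builds without kernel headers. Only the NOARP-without-`NTF_EXT_LEARNED` case is modeled; every other case simply returns `Process`.

```cpp
#include <cassert>
#include <cstdint>

// Values as defined in <linux/neighbour.h>.
constexpr uint16_t kNudNoarp = 0x40;        // NUD_NOARP (ndm_state bit)
constexpr uint8_t kNtfExtLearned = 0x10;    // NTF_EXT_LEARNED (ndm_flags bit)

enum class NeighAction { Process, Ignore, RemoteMoveDelete };

// Hypothetical decision helper: only the gating on EVPN NVO existence is from
// the commit; the names are illustrative.
NeighAction classifyNoarp(uint16_t ndmState, uint8_t ndmFlags, bool isEvpnNvoExist)
{
    const bool noarp = (ndmState & kNudNoarp) != 0;
    const bool extLearned = (ndmFlags & kNtfExtLearned) != 0;

    if (!noarp || extLearned)
    {
        return NeighAction::Process;        // outside the case modeled here
    }
    if (!isEvpnNvoExist)
    {
        return NeighAction::Ignore;         // feature off: keep the existing NOARP ignore
    }
    return NeighAction::RemoteMoveDelete;   // EVPN-MH: treat as a remote-move delete
}
```

With `isEvpnNvoExist == false`, the function reduces to the pre-existing NOARP ignore. That is the property the commit means when it says the feature-disabled path stays unchanged from master.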
@tahmed-dev force-pushed the tahmed/evpn-mh-2-integrate-20260527 branch from d9ea5a1 to c2aaa78 (May 29, 2026 22:44)
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@tahmed-dev
Contributor Author

/azpw run

@mssonicbld
Collaborator

⚠️ Notice: /azpw run only runs failed jobs now. If you want to trigger a whole pipeline run, please rebase your branch or close and reopen the PR.
💡 Tip: You can also use /azpw retry to retry failed jobs directly.

Retrying failed (or canceled) jobs...
