
[code sync] Merge code from sonic-net/sonic-mgmt:202511 to 202603#1199

Merged
lizhijianrd merged 37 commits into Azure:202603 from lizhijianrd:code-sync-202511-to-202603-20260519
May 19, 2026

Conversation


@lizhijianrd lizhijianrd commented May 19, 2026

* 58d253ea7 - [202511] Add testbed-specific delay for acl and everflow (#24490) (#24635) (2026-05-18) [Tejaswini Chadaga]
* c358f071a - [202511] [conditional_mark] Skip decap/test_decap.py on Arista-720DT and remove stale parametrize-keyed entries (#24673) (2026-05-19) [Xichen96]
* 29805a876 - [action] [PR:24559] Increase config_reload_timeout for Nokia-7215 platform tests (#24690) (2026-05-18) [mssonicbld]
* a0b9bc7ce - [action] [PR:24552] [memory_utilization] retry sudo monit status when stale (#24691) (2026-05-18) [mssonicbld]
* 2f4277f7f - [action] [PR:23241] Fix IPv6-only topology support in generic_patch BGP convergence check (#24682) (2026-05-17) [mssonicbld]
* cedefe37b - [action] [PR:24385] Bump expected mgmt VRF table ID from 5000 to 6000 (#24604) (2026-05-17) [mssonicbld]
* e7fb48eb3 - [action] [PR:24273] [loganalyzer] Ignore ctrmgrd Docker/k8s version warnings in global LogAnalyzer ignore list (#24562) (2026-05-16) [mssonicbld]
* a14d472e6 - 202511: Fix fib/test_fib.py failures on v6-only (#24104) (2026-05-14) [wrideout-arista]
* a6392c1a7 - [202511] Adjust Q200 buffer pool watermark margin (#24608) (#24616) (2026-05-15) [ShiyanWangMS]
* 142a6a90f - [action] [PR:20946] Fix for parse_show_copp_configuration for multi-asic dut (#24632) (2026-05-15) [mssonicbld]
* 2919c4c7b - [202511][Backport #23939] Remove BUFFER_SIZE override in test_dhcp_counter_stress (#24592) (2026-05-15) [Xichen96]
* 0b817ed7e - [action] [PR:24600] [conditional_mark] xfail test_reload_configuration on Cisco-8102 due to timemaster.service (#24614) (2026-05-14) [mssonicbld]
* 6ce184b30 - [action] [PR:24594] [utilities] Raise tcpdump_buffer_size default in capture_and_check_packet_on_dut (#24621) (2026-05-14) [mssonicbld]
* fc42e65e8 - [action] [PR:23803] [202511][sflow] Parse Agent ID strings into ipaddress objects (#24624) (2026-05-14) [mssonicbld]
* acc07dd1c - [action] [PR:24479] Add loganalyzer ignore patterns for Mellanox SDK serdes/FEC errors (#24611) (2026-05-14) [mssonicbld]
* 2638cac51 - Add wait_until for trim counters to avoid timing related failures (#24345) (2026-05-14) [Ryan Garofano]
* ea7e49956 - [cherry-pick](#21863)(#22801) (#24549) (2026-05-14) [rajkumar1-arista]
* fa8091f7a - [action] [PR:21155] Support running acl tests without any IPv4 management configuration (#22006) (2026-05-14) [mssonicbld]
* 7888f6b7c - [202511][BGP] Fix rollback failure in test_bgp_dual_asn by cleaning up test p… (#24449) (2026-05-13) [Priyansh]
* ec2d7c740 - [action] [PR:24551] [platform_tests] Re-enable DAC skip for SFP in test_tx_disable_channel (#24579) (2026-05-13) [mssonicbld]
* 5bb8b1f00 - [action] [PR:24413] Add config template files for t2 min topology (#24576) (2026-05-13) [mssonicbld]
* 7438dbacc - [202511][cpu_shaper] Add gport API support for Broadcom TH5+ platforms (#24065) (#24450) (2026-05-12) [Priyansh]
* 26661727b - [action] [PR:23544] Fix `test_everflow_fwd_recircle_port_queue_check` for multi-asic (#24542) (2026-05-12) [mssonicbld]
* a344776ee - [action] [PR:21665] Skip MMU dynamic threshold test on t2 topology (#24540) (2026-05-12) [mssonicbld]
* e52c479c0 - [action] [PR:23504] Fix incorrectly passing namespace in FRR CLI shell calls (#24541) (2026-05-12) [mssonicbld]
* 15335738c - [ARP][test_arp_update] Fix subnet mismatch between PTF and DUT VLAN IP (#24419) (#24536) (2026-05-12) [Janet Cui]
* cb1057ae3 - [action] [PR:24480] Fix get_port_indexes_with_flat_memory crash when xcvr_api is None (#24528) (2026-05-12) [mssonicbld]
* 9203f21db - [action] [PR:24378] [Arista] Increment QSFP-DD port names by 4 instead of 8 (#24531) (2026-05-12) [mssonicbld]
* cc3b071a4 - [202511] Cherry-pick gnmi/gnxi testsuite & fixtures updates (#24392) (2026-05-11) [Dawei Huang]
* fa0b3eaba - [202511]Reapply PR23348 diff. broadcom-dnx check needs to support single asic (#24511) (2026-05-11) [saravanan sellappa]
* fe76f4702 - Fix test_crm_nexthop_group: split neighbor/route into two phases (#24454) (2026-05-11) [Sourabh Kumar]
* 51b4b8490 - [action] [PR:24506] drop_counters: ignore transient syncd flex counter ERR during config_reload (#24522) (2026-05-12) [mssonicbld]
* 6b3400c80 - [action] [PR:24384] Fix daemon kill_and_start_status tests to use wait_until instead of time.sleep(10) (#24523) (2026-05-12) [mssonicbld]
* 29482cb93 - [action] [PR:22509] Fix run_tests.sh "Prepare DUT failed, skip testing" failure on pytest 9.0.2  (#24515) (2026-05-12) [mssonicbld]
* cfe85483e - [action] [PR:23204] Fix PTF/DUT subnet mismatch when VLAN has multiple IPv4 addresses (#24519) (2026-05-12) [mssonicbld]
* a061c1159 - [202511] Increase default GCU timeout from 600s to 900s (#24501) (2026-05-11) [Zhaohui Sun]

ZhaohuiS and others added 30 commits May 12, 2026 00:28
#### Why I did it

Cherry-pick of #24427 into 202511 (auto cherry-pick had conflicts).

The default GCU (Generic Config Updater) apply-patch timeout of 600
seconds is insufficient for some platforms, causing test failures. This
increases the default to 900 seconds.

#### How I did it

Applied the same change from #24427 to the 202511 branch version of
`tests/common/gu_utils.py`:
- Added docstring to `get_gcu_timeout()`
- Changed default timeout from 600s to 900s

#### How to verify it

Run any GCU test on a platform not listed in `GCUTIMEOUT_MAP` and verify
the timeout is 900s.
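
The lookup-with-default behavior described above can be sketched as follows. This is a minimal illustration, not the code in `tests/common/gu_utils.py`: the map entry, the `DEFAULT_GCU_TIMEOUT` name, and the string argument are assumptions made for the example.

```python
# Hypothetical per-platform overrides; the real GCUTIMEOUT_MAP contents
# are not shown in this PR.
GCUTIMEOUT_MAP = {"example-slow-hwsku": 1200}

# Default in seconds; this change raises it from 600 to 900.
DEFAULT_GCU_TIMEOUT = 900


def get_gcu_timeout(hwsku):
    """Return the GCU apply-patch timeout in seconds for a hwsku.

    Platforms listed in GCUTIMEOUT_MAP use their own value; every other
    platform falls back to the default. Taking a plain string here is an
    assumption made to keep the sketch self-contained.
    """
    return GCUTIMEOUT_MAP.get(hwsku, DEFAULT_GCU_TIMEOUT)
```

A platform that is absent from the map gets the new default, which is the case the verification step exercises.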

#### Conflict resolution

On master, `get_gcu_timeout()` already had a multi-line docstring added
by a prior PR. On 202511, the function was a one-liner without a
docstring. Resolved by applying both the docstring addition and the
timeout value change to the 202511 version.

Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…e IPv4 addresses (#24519)

### Description of PR

Summary: 

Fix PTF/DUT subnet mismatch in ARP tests when a VLAN interface has
multiple IPv4 addresses.


#### Root Cause

After the community added a second VLAN IP (`192.169.0.1`), two APIs
used by the ARP tests diverged in which IPv4 they selected:

| API | Behavior | Selected IP |
|-----|----------|-------------|
| `get_first_vlan_ipv4` | Returns the **first** IPv4 in the VLAN | `192.168.0.1` (DUT) |
| `ip_and_intf_info` (conftest) | Uses the **last** IPv4 network and assigns PTF an IP in that subnet | `192.169.x.x` (PTF) |

This put PTF and DUT in **different subnets**, causing ping failures and
breaking the MAC learning tests.

#### Fix

Rename `get_first_vlan_ipv4` to `get_vlan_last_ipv4` so it returns the
**last** IPv4 in the VLAN, matching the behavior of `ip_and_intf_info`.
Both APIs now consistently use `192.169.0.1`, keeping PTF and DUT in the
same subnet.

Additionally, improve robustness:
- Use `ip_network`/`IPv4Network` type checking for IPv4 detection
instead of the `":" in addr` heuristic
- Add `try`/`except` for `ValueError` on malformed addresses
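
A minimal sketch of the selection logic described above, assuming the VLAN addresses arrive as a list of `address/prefix` strings. The function name `last_ipv4_in_vlan` and the prefix lengths in the usage note are hypothetical; the real helper in `tests/arp/arp_utils.py` may take different inputs.

```python
from ipaddress import IPv4Network, ip_network


def last_ipv4_in_vlan(addresses):
    """Return the last well-formed IPv4 entry in `addresses`, or None."""
    last = None
    for addr in addresses:
        try:
            # strict=False accepts host bits, e.g. "192.168.0.1/21".
            net = ip_network(addr, strict=False)
        except ValueError:
            continue  # malformed address: skip it instead of failing
        # Type check replaces the `":" in addr` heuristic.
        if isinstance(net, IPv4Network):
            last = addr
    return last
```

For example, `last_ipv4_in_vlan(["192.168.0.1/21", "fc02:1000::1/64", "192.169.0.1/22"])` skips the IPv6 entry and selects the `192.169.0.1` entry.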

#### Files Changed

- `tests/arp/arp_utils.py` — renamed function, updated import, new
selection logic
- `tests/arp/test_arp_update.py` — updated import and call site


### Type of change


- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [ ] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [ ] 202511

### Approach
#### What is the motivation for this PR?

#### How did you do it?

#### How did you verify/test it?

#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation

Signed-off-by: Shivashankar CR <shivashankar.c.r@gmail.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: Shivashankar C R <58802632+cshivashgit@users.noreply.github.com>
…g" failure on pytest 9.0.2 (#24515)

### Description of PR

Summary:
Add the explicit test path `${SCRIPT_PATH}` to the pytest commands that
do not have it (pretest, posttest, bsl).
With pytest 9.0.2, a command without an explicit test path fails to
determine rootdir and load conftest.py, which causes an unrecognized
arguments error.
Known issue in the pytest repo:
pytest-dev/pytest#13913
Fixes #22508

### Type of change


- [ ] Bug fix
- [x] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [ ] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?

#### How did you do it?

#### How did you verify/test it?

#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation

Signed-off-by: markxiao <markxiao@arista.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: Mark Xiao <markxiao@arista.com>
…t_until instead of time.sleep(10) (#24523)

### Description of PR

Summary:
Replace bare `time.sleep(10)` with `wait_until(120, 10, 0, ...)` in
`test_pmon_syseepromd_kill_and_start_status` and
`test_pmon_pcied_kill_and_start_status`. After a SIGKILL the supervisor
restarts the daemon asynchronously, so a fixed 10-second sleep is
insufficient on slower platforms and the daemon may still be in
`STARTING` state when the status check runs, causing:
```
Failed: syseepromd expected restarted status is RUNNING but is STARTING
```
Also adds the missing `check_expected_daemon_status` helper to
`test_pcied.py`, consistent with the pattern already used in
`test_syseepromd.py` and `test_psud.py`.

### Type of change

- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [ ] Test case improvement

### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?
`time.sleep(10)` is a fixed delay that is not reliable. On some
platforms the daemon takes longer than 10s to restart after SIGKILL,
leaving it in `STARTING` state when the assert runs.

#### How did you do it?
Replace `time.sleep(10)` with `wait_until(120, 10, 0,
check_expected_daemon_status, duthost, expected_running_status)` which
polls up to 120s until the daemon reaches `RUNNING` state. This is
consistent with how `test_psud.py` handles the same scenario.
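
The difference between a fixed sleep and polling can be sketched as below. The `wait_until` here is a simplified stand-in written for this example, not the sonic-mgmt helper, and `FakeDaemon` is a hypothetical object that replaces `duthost` so the sketch runs without a testbed.

```python
import time


def wait_until(timeout, interval, delay, condition, *args, **kwargs):
    """Poll `condition` every `interval` seconds until it returns True
    or `timeout` seconds have elapsed. Simplified stand-in only."""
    if delay > 0:
        time.sleep(delay)
    deadline = time.monotonic() + timeout
    while True:
        if condition(*args, **kwargs):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeDaemon:
    """Hypothetical daemon that reports a scripted sequence of states."""

    def __init__(self, states):
        self._states = iter(states)
        self._last = None

    def status(self):
        # Once the script is exhausted, keep reporting the last state.
        self._last = next(self._states, self._last)
        return self._last


def check_expected_daemon_status(daemon, expected_running_status):
    """Return True once the daemon reports the expected status."""
    return daemon.status() == expected_running_status
```

With this shape the test returns as soon as the daemon reaches `RUNNING`, and only fails if the state is still wrong after the whole timeout.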

#### How did you verify/test it?
Verified on Arista-7060X6-64PE platform — both
`test_pmon_syseepromd_kill_and_start_status` and
`test_pmon_pcied_kill_and_start_status` pass.

#### Any platform specific information?
Reproduced on Arista-7060X6-64PE.

#### Supported testbed topology if it's a new test case?
N/A

### Documentation
N/A

Signed-off-by: Bing Wang <bingwang@microsoft.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: bingwang-ms <66248323+bingwang-ms@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…r ERR during config_reload (#24522)

### Description of PR
Summary:
`test_drop_counters.py::test_ip_pkt_with_expired_ttl` fails because
loganalyzer catches a transient `ERR` from syncd during test teardown.

**Root cause:**
The `configure_copp_drop_for_ttl_error` fixture (`drop_packets.py` line
271) calls `config_reload(duthost, safe_reload=True)` in its teardown
after modifying the COPP trap configuration. During reload, orchagent
restarts and immediately sends `FLEX_COUNTER_TABLE` SET commands via the
ASIC channel carrying the newly allocated port VIDs. At this point syncd
may not yet have finished creating all port SAI objects and populating
its VID→RID translation map. For each unresolved VID, `Syncd.cpp
processFlexCounterEvent` logs:
```
ERR syncd#syncd: :- processFlexCounterEvent: port VID <oid> was not found (probably port was removed/splitted) and will remove from counters now
```
syncd then self-heals by issuing a DEL for the stale counter entry. The
race resolves once orchagent finishes re-programming all ports.

**Fix:**
Add the pattern to the `ignore_expected_loganalyzer_exceptions` autouse
fixture so loganalyzer does not fail the test on this benign transient
noise.

### Type of change
- [x] Bug fix

### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?
`test_ip_pkt_with_expired_ttl` was failing on Arista-7060X6-64PE-P32O64
due to a loganalyzer false positive.

#### How did you do it?
Added `FlexCounterPortNotFoundRegex` to the
`ignore_expected_loganalyzer_exceptions` autouse fixture in
`test_drop_counters.py`. The regex matches exactly the transient syncd
message that fires during `config_reload` port re-initialization.
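
As an illustration, an ignore pattern for this message could look like the sketch below. The regex text, the `is_ignored` helper, and the timestamp and VID in the sample line are assumptions; the actual `FlexCounterPortNotFoundRegex` value is not shown in this PR.

```python
import re

# Hypothetical pattern; the real FlexCounterPortNotFoundRegex may differ.
FLEX_COUNTER_PORT_NOT_FOUND_REGEX = (
    r".*ERR syncd#syncd.*processFlexCounterEvent: port VID \S+ was not found "
    r"\(probably port was removed/splitted\) and will remove from counters now.*"
)


def is_ignored(log_line, ignore_patterns):
    """Return True if any ignore pattern matches the log line."""
    return any(re.match(pattern, log_line) for pattern in ignore_patterns)


# Sample line with a made-up timestamp, hostname and VID.
sample = (
    "May 12 00:00:00.000000 dut ERR syncd#syncd: :- processFlexCounterEvent: "
    "port VID oid:0x1000000000012 was not found (probably port was "
    "removed/splitted) and will remove from counters now"
)
```

Escaping the parentheses keeps the match tied to this one message, so other syncd `ERR` lines still fail the test.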

#### How did you verify/test it?
- Traced the error to `config_reload` in
`configure_copp_drop_for_ttl_error` teardown (drop_packets.py line 271).
- Confirmed in syncd source (`Syncd.cpp processFlexCounterEvent`) that
this is a known race: when `fromAsicChannel=true` and VID lookup fails,
the code logs ERR and cleans up by issuing DEL for the stale counter. No
functional impact.

#### Any platform specific information?
Observed on Arista-7060X6-64PE-P32O64 (broadcom ASIC). The error is not
platform-specific — it can occur on any platform after `config_reload`
when port flex counters are enabled.

#### Supported testbed topology if it's a new test case?
N/A — bug fix only.

### Documentation

Signed-off-by: Bing Wang <bingwang@microsoft.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: bingwang-ms <66248323+bingwang-ms@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…454)

### Description of PR
Find all the details here:
sonic-net/sonic-mgmt#23129

Summary:
This fixes the race conditions observed on Nvidia switches and should
also address sonic-net/sonic-mgmt#20563.

The `configure_nexthop_groups()` function had two problems:

- Chunk batching bug: `ip_batch[1:]` was intended to skip only the first
IP (2.0.0.1, the base neighbor), but when batching with `chunk_size=200`
it skipped the first element of every batch, silently losing ~9
neighbors and their routes.
- Race condition: neighbor and route creation were interleaved in the
same for-loop, so a route could reference a nexthop before its neighbor
was fully programmed in HW.

The fix separates the work into two phases and removes the chunk
batching mechanism, which is no longer needed with the two-phase
approach:

- Phase 1: add all neighbors in one shot, then poll the CRM
ipv4_neighbor counter to confirm they are programmed in HW.
- Phase 2: add all routes in one shot after the neighbors are confirmed.
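
The batching bug can be reproduced in isolation. The sketch below uses hypothetical helper names and an arbitrary list of 1800 addresses; it is not the code from `configure_nexthop_groups()`, only a demonstration of how slicing inside the batch loop drops one entry per batch.

```python
from ipaddress import IPv4Address


def neighbors_with_chunk_bug(ips, chunk_size):
    """Slice [1:] inside the batch loop: drops the first IP of EVERY batch."""
    programmed = []
    for start in range(0, len(ips), chunk_size):
        ip_batch = ips[start:start + chunk_size]
        programmed.extend(ip_batch[1:])
    return programmed


def neighbors_without_chunks(ips):
    """Slice [1:] once: drops only the base neighbor, as intended."""
    return ips[1:]


# 1800 consecutive addresses starting at the base neighbor 2.0.0.1
# (the count is an arbitrary value chosen for the example).
ips = [str(IPv4Address("2.0.0.1") + i) for i in range(1800)]

buggy = neighbors_with_chunk_bug(ips, chunk_size=200)
fixed = neighbors_without_chunks(ips)
missing = sorted(set(fixed) - set(buggy))
```

With these numbers the list splits into 9 batches, so the buggy version skips 9 addresses where only the base neighbor should be skipped, and `missing` holds the 8 neighbors that were lost unintentionally.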

The cherry-pick to 202511 had a conflict, so a separate PR was created.

### Type of change


- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
    - [ ] Skipped for non-supported platforms
- [x] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?
test_crm_nexthop_group[group_member=False] fails intermittently on
msn4600 and msn4700 platforms with:
CRM counter did not reach expected value within 60 seconds.
Expected: used >= 1891, Actual: used=1807

#### How did you do it?

#### How did you verify/test it?

#### Any platform specific information?
Observed on Mellanox SN4700 and SN4600C, platforms with large NHG
resource pools (~180K+) that cause the test to create ~1800+ nexthop
groups, widening the race window.

#### Supported testbed topology if it's a new test case?

### Documentation

Signed-off-by: sourabh kumar <kumarsourabh@microsoft.com>
…gle asic (#24511)

Reapply the PR23348 diff: the broadcom-dnx check needs to support single asic.
sonic-net/sonic-mgmt#23851 seems to have
reverted the fix from sonic-net/sonic-mgmt#23348.

### Description of PR

Summary:
Fixes # (issue)

### Type of change


- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
    - [ ] Skipped for non-supported platforms
- [ ] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [ ] 202511

### Approach
#### What is the motivation for this PR?

#### How did you do it?

#### How did you verify/test it?

#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation

Signed-off-by: Saravanan Sellappa <saravanan@nexthop.ai>
[202511] Cherry-pick gnmi/gnxi testsuite & fixtures updates from master

Backport of recent gnmi/gnxi test infrastructure work that has
accumulated on master since the 202511 branch cut. All cherry-picks were
applied in chronological merge order on top of `202511`.

### Included PRs (in cherry-pick order)

1. #22111 — Fix GNXI testsuite topology markers *(already present on
202511 via earlier backport — empty cherry-pick, skipped)*
2. #22412 — Remove autouse from gnmi test fixtures
3. #22354 — Update gnmi container parameters and add gnxi tests for
container upgrade
4. #22553 — Consolidate gnxi tests into gnmi directory
5. #22877 — Refactor gNMI fixtures to couple server config with clients
6. #23755 — Add PtfGnmic client wrapper and gnmic capabilities
integration test
7. #23876 — Add UDS transport support to gNMI/gNOI test fixtures
8. #24329 — [conditional_mark] Skip gnmi/test_gnmic.py

### Conflict resolutions

- **#22412**: `tests/gnmi/test_gnmi_2038.py` — added on master by #15770
which was never backported to 202511. Dropped that file's hunk; the rest
of the autouse removal applied cleanly.
- **#22553**: `tests/gnmi/conftest.py` — took master's
post-consolidation version (legacy proto-compile / `grpc_channel`
fixtures removed). Deleted `tests/gnmi/grpc_utils.py` and
`tests/gnmi/test_gnoi_system_grpc.py` consistent with master. The legacy
`test_gnoi_system_grpc.py` is superseded by the consolidated
`test_gnoi_system.py` which uses the new TLS-managed client framework.
- **#23755**: Top-level `Makefile` was added on master by #22506 (not
backported here). Dropped the Makefile hunk; the actual test code
(`tests/common/ptf_gnmic.py`, `tests/gnmi/test_gnmic.py`) and fixture
wiring were applied unchanged.
- **#24329**:
`tests/common/plugins/conditional_mark/tests_mark_conditions.yaml` —
additive merge, kept both new skip blocks.

### PRs deliberately excluded

- #22481 (cSONiC testbed) and #22506 (testbed Makefile) — not gnmi/gnxi
scope.
- #21529, #22248 — already on 202511 via batch backport #23653.

### Verification

- Python syntax check (`py_compile`) on all changed `.py` files: all OK.
- No 202511-only callers remain for the removed
`setup_and_cleanup_protos` / `compile_protos` / `grpc_channel` fixtures
(verified via `git grep`).
- Pre-commit / Azure pipelines not run locally — relying on the branch
CI for full verification.

---------

Signed-off-by: Dawei Huang <daweihuang@microsoft.com>
…d of 8 (#24531)

SKU: x86_64-arista_7280r4_32qf_32df

This is the sonic-mgmt change that goes along with the sonic-buildimage
change: sonic-net/sonic-buildimage#27149

### Description of PR
For SKU `x86_64-arista_7280r4_32qf_32df` increment the QSFP-DD port
names based on their number of system lanes instead of number of line
lanes.

### Type of change

- [ ] Bug fix
- [x] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [ ] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?
MSFT requested we increment the port names based on the number of
system-side lanes instead of number of line-side lanes.

#### How did you do it?
Changed the `get_port_alias_to_name_map` function in port_utils.py so
that port names increment by 4 for all ports on the
`x86_64-arista_7280r4_32qf_32df` SKU.

#### How did you verify/test it?
Ran sonic-mgmt against this change and saw no related regressions.

#### Any platform specific information?
No

#### Supported testbed topology if it's a new test case?
N/A

### Documentation
N/A

Signed-off-by: Nathan Wolfe <nwolfe@arista.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: arista-nwolfe <94405414+arista-nwolfe@users.noreply.github.com>
…xcvr_api is None (#24528)

### Description of PR

Summary:
The on-DUT script crashes with AttributeError when get_sfp() or
get_xcvr_api() returns None for empty SFP slots or unsupported
transceivers (e.g., passive DAC cables). This causes the
port_list_with_flat_memory fixture to fail at setup, blocking all
dependent tests.

Add None guards to skip ports without a valid xcvr API instead of
crashing. Ports that cannot be queried are simply not added to the
flat_memory list, allowing the actual tests to handle them gracefully.

Fixes test_sfp, test_check_sfp_eeprom, and test_xcvr_info_in_db
failures on Mellanox SN4700.

### Type of change


- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [ ] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?
The get_port_indexes_with_flat_memory() helper (added in PR:
sonic-net/sonic-mgmt#22561) assumes every SFP
slot has a valid
transceiver API. On empty/unpopulated ports, get_xcvr_api() returns
None, crashing the fixture and blocking 3 tests (test_sfp,
test_check_sfp_eeprom, test_xcvr_info_in_db).

#### How did you do it?
Added None checks for both get_sfp() and get_xcvr_api() return values
before calling is_flat_memory(), gracefully skipping
empty/unpopulated SFP slots instead of crashing.
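
The None guards can be sketched as below. The `Fake*` classes are hypothetical stand-ins for the platform objects, and the function signature is an assumption; only the method names `get_sfp()`, `get_xcvr_api()` and `is_flat_memory()` come from the description above.

```python
class FakeXcvrApi:
    """Hypothetical transceiver API object."""

    def __init__(self, flat_memory):
        self._flat_memory = flat_memory

    def is_flat_memory(self):
        return self._flat_memory


class FakeSfp:
    """Hypothetical SFP object; its API may be None (e.g. passive DAC)."""

    def __init__(self, xcvr_api):
        self._xcvr_api = xcvr_api

    def get_xcvr_api(self):
        return self._xcvr_api


class FakeChassis:
    """Hypothetical chassis; get_sfp() returns None for empty slots."""

    def __init__(self, sfps):
        self._sfps = sfps

    def get_sfp(self, index):
        return self._sfps.get(index)


def get_port_indexes_with_flat_memory(chassis, port_indexes):
    """Return the indexes whose transceiver reports flat memory."""
    flat_memory_ports = []
    for index in port_indexes:
        sfp = chassis.get_sfp(index)
        if sfp is None:
            continue  # empty slot: nothing to query
        api = sfp.get_xcvr_api()
        if api is None:
            continue  # unsupported transceiver: skip instead of crashing
        if api.is_flat_memory():
            flat_memory_ports.append(index)
    return flat_memory_ports
```

Ports that cannot be queried are left out of the result, so the fixture completes and the dependent tests decide how to handle them.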

#### How did you verify/test it?
Manual Testing

#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation

Signed-off-by: sourabh kumar <kumarsourabh@microsoft.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: Sourabh Kumar <kumarsourabh@microsoft.com>
…P (#24419) (#24536)

### Description of PR

Summary: Resolve 202511 conflicts in
sonic-net/sonic-mgmt#24419
Fixes # (issue)

### Type of change


- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
    - [ ] Skipped for non-supported platforms
- [ ] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?
The test_ptf_arp_learns_mac and test_dut_ping_learns_mac tests fail on
physical testbeds because:
1. dut_interface_info used get_vlan_last_ipv4() which could return a
VLAN IP not in the same subnet as the PTF interface IP
2. ptf_with_ip_config hardcoded /21 prefix which may not match the
actual VLAN subnet configuration

#### How did you do it?
- Adding get_vlan_ipv4_for_subnet() to find the VLAN interface whose
subnet contains the PTF IP
- Updating dut_interface_info to select the matching VLAN IP and expose
the correct prefix_len
- Updating ptf_with_ip_config to use the dynamic prefix_len instead of
hardcoded /21
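
A minimal sketch of the subnet match, assuming the VLAN addresses are available as `address/prefix` strings. The signature and return shape are assumptions made for the example; the real `get_vlan_ipv4_for_subnet()` may differ.

```python
from ipaddress import IPv4Interface, ip_address, ip_interface


def get_vlan_ipv4_for_subnet(vlan_addresses, ptf_ip):
    """Return (vlan_ip, prefix_len) for the VLAN address whose subnet
    contains `ptf_ip`, or None when no VLAN subnet matches."""
    target = ip_address(ptf_ip)
    for addr in vlan_addresses:
        try:
            iface = ip_interface(addr)
        except ValueError:
            continue  # malformed entry: ignore it
        if isinstance(iface, IPv4Interface) and target in iface.network:
            return str(iface.ip), iface.network.prefixlen
    return None
```

Returning the prefix length together with the address lets the PTF side configure the same mask as the DUT instead of assuming /21.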

#### How did you verify/test it?
```
--------------------------------------------- generated xml file: /data/sonic-mgmt-int/tests/logs/arp/test_arp_update.xml ---------------------------------------------
----------------------------------------------------------------------- live log sessionfinish ------------------------------------------------------------------------
INFO     root:__init__.py:67 Can not get Allure report URL. Please check logs
============================================================= 5 passed, 215 warnings in 377.63s (0:06:17) =============================================================
DEBUG:tests.conftest:[log_custom_msg] item: <Function test_dut_ping_learns_mac[str2-msn4600c-acs-03-None]>
```
#### Any platform specific information?
msn4600c
#### Supported testbed topology if it's a new test case?
t0-64
### Documentation

Signed-off-by: Janet Cui <janet970527@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…l calls (#24541)

### Description of PR

[PR22094](sonic-net/sonic-mgmt#22094) recently
introduced a `-n` namespace argument in various FRR CLI shell calls. The
test passes this namespace argument as `-n asic0` in various `vtysh`
commands.

The FRR CLI shell expects a namespace index, e.g. `vtysh -n 0
...`. The fix is to pass asic_index instead of the dut_namespace string
at these calls.

Summary:
Fixes # (issue)

### Type of change


- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [x] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?

Fix the failing `bgp/test_ipv6_nlri_over_ipv4` test on multi-asic
systems: the `vtysh` commands issued by the test should take a namespace
index, e.g. `vtysh -n 0 ...`, instead of a namespace string.

#### How did you do it?

The fix is to pass asic_index instead of the dut_namespace string at
these calls. A separate setup variable is created and passed when
executing FRR CLI shell calls.
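
A minimal sketch of building the command with an index rather than a namespace name. The `vtysh_cmd` helper is hypothetical and not part of the test code; it only shows the `-n <index>` form described above.

```python
def vtysh_cmd(frr_command, asic_index=None):
    """Build a vtysh shell command.

    On multi-asic systems pass the numeric asic index (e.g. 0), not the
    namespace name (e.g. "asic0"). On single-asic systems leave
    asic_index as None and no -n option is added.
    """
    if asic_index is None:
        return f'vtysh -c "{frr_command}"'
    return f'vtysh -n {asic_index} -c "{frr_command}"'
```

For example, `vtysh_cmd("show bgp summary", asic_index=0)` yields `vtysh -n 0 -c "show bgp summary"`.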

#### How did you verify/test it?

Test `bgp/test_ipv6_nlri_over_ipv4` passes on both single-asic and
multi-asic systems with these changes.

#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation

Signed-off-by: setu <setu@arista.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: Setu Patel <171176331+arista-setu@users.noreply.github.com>
…24540)

Signed-off-by: Anand Mehra (anamehra) <anamehra@cisco.com>
### Description of PR

Summary:
Fixes sonic-net/sonic-buildimage#24802

### Type of change


- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [x] Skipped for non-supported platforms
- [ ] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [x] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?
The test case is not a valid scenario for T2. The dynamic threshold for
pg lossless is not modified via GCU.

#### How did you do it?
Skip the test for T2

#### How did you verify/test it?
Run test on T2 system

#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation

Signed-off-by: Anand Mehra (anamehra) <anamehra@cisco.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: anamehra <54692434+anamehra@users.noreply.github.com>
… for multi-asic (#24542)

### Description of PR

In sonic-net/sonic-mgmt#21942 we fixed
`test_everflow_fwd_recircle_port_queue_check` to return the total number
of packets sent by `send_and_check_mirror_packets` and use that value as
the expected number of packets seen on the `Ethernet-Rec` interface.

The problem is that on multi-asic SKUs `send_and_check_mirror_packets`
might use two src_ports, one on ASIC0 and the other on ASIC1. This
results in half the packets utilizing `Ethernet-Rec0` and the other half
using `Ethernet-Rec1`.

This change will update the return value of
`send_and_check_mirror_packets` to be a dictionary instead of an
integer.
The dictionary will be keyed on `(dut, asic)` and the value will be the
number of packets sent.
With this return value `test_everflow_fwd_recircle_port_queue_check` can
just look at how many packets were sent on the `(dut, asic)` belonging
to the `Ethernet-Rec` port we're checking.
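
A minimal sketch of the per-`(dut, asic)` accounting described above. The helper name `count_sent_packets`, the tuple layout, and the DUT name are hypothetical and only illustrate the idea; the real `send_and_check_mirror_packets` signature is not shown in this description.

```python
from collections import Counter


def count_sent_packets(sent):
    """Sum packets per (dut, asic) key.

    `sent` is an iterable of (dut, asic, packet_count) tuples, one per
    source port used by the test (hypothetical layout).
    """
    totals = Counter()
    for dut, asic, packet_count in sent:
        totals[(dut, asic)] += packet_count
    return dict(totals)


# Two source ports on different ASICs of the same (made-up) DUT.
sent_per_asic = count_sent_packets([
    ("dut1", 0, 50),
    ("dut1", 1, 50),
])

# The recirculation port on ASIC0 is compared against ASIC0 traffic only.
expected_on_rec0 = sent_per_asic[("dut1", 0)]
```

With an integer return value the test would have expected all 100 packets on a single `Ethernet-Rec` port; keying by `(dut, asic)` lets it expect only the 50 sent on that port's ASIC.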

### Type of change

- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [ ] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?
Fix `test_everflow_fwd_recircle_port_queue_check` for multi-asic SKUs

#### How did you do it?
Track packets per `(dut, asic)` to only assert on the packets sent on
our `Ethernet-Rec`'s `(dut, asic)`

#### How did you verify/test it?
Ran `test_everflow_fwd_recircle_port_queue_check` on the following SKUs:
single-asic fixedsystem
multi-asic fixedsystem
single-asic chassis
multi-asic chassis

#### Any platform specific information?
N/A

#### Supported testbed topology if it's a new test case?
N/A

### Documentation
N/A

Signed-off-by: Nathan Wolfe <nwolfe@arista.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: arista-nwolfe <94405414+arista-nwolfe@users.noreply.github.com>
…s (#24065) (#24450)

Cherry-pick of : sonic-net/sonic-mgmt#24065

### Description of PR

Summary:
The `cpu_shaper` test uses a BCM cint script (`get_shaper.c`) that calls
`bcm_cosq_port_bandwidth_get` to read CPU queue shaper PPS values. This
is not supported on TH5+ devices, causing the test to fail on platforms
like Arista 7060X6.

This change updates the cint script to try the modern gport-based API
(`bcm_cosq_gport_bandwidth_get`) first, which works on TH5+ and previous
platforms, and falls back to the legacy port-based API for platforms
where the gport API may not be available.

The success output format is preserved (`cos=N pps_max=M`) so no changes
are needed in `test_cpu_shaper.py`.

### Type of change

- [x] Bug fix

### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?

The test `cpu_shaper/test_cpu_shaper.py` fails on TH5+ devices because
`bcm_cosq_port_bandwidth_get` is not supported there. The BCM SDK
returns rv=-16 (`BCM_E_UNAVAIL`), the cint script prints an error line,
the Python regex finds no matches, and the assertion fails with
`actual_pps = {}`.

#### How did you do it?

Updated `tests/cpu_shaper/scripts/get_shaper.c` to:
1. Try `bcm_cosq_gport_bandwidth_get` (modern gport-based API) first;
it works on TH5+ and previous platforms.
2. If it fails, fall back to `bcm_cosq_port_bandwidth_get` (legacy API)
for older platforms.
3. If both fail, print a single error line with both return codes for
debugging.
4. Reset output parameters (`flags`) between API calls to prevent stale
values from affecting the fallback call.

No changes to `test_cpu_shaper.py` are needed: the success output format
(`cos=%d pps_max=%d`) is identical for both API paths and matches the
existing regex `r'cos=(\d+) pps_max=(\d+)'`.
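
The sketch below shows how the regex quoted above separates success lines from an error line. The helper `parse_shaper_output`, the PPS values, and the wording of the error line are assumptions for illustration; only the regex and the `cos=N pps_max=M` format come from the description.

```python
import re

SHAPER_RE = re.compile(r'cos=(\d+) pps_max=(\d+)')


def parse_shaper_output(output):
    """Return {cos: pps_max} for every success line in the cint output."""
    return {int(cos): int(pps) for cos, pps in SHAPER_RE.findall(output)}


# Success lines follow the `cos=N pps_max=M` format (values are made up).
success_output = "cos=0 pps_max=600\ncos=7 pps_max=1000\n"
# Hypothetical error line; the real wording in get_shaper.c may differ.
error_output = "ERROR: cos 0 gport_rv=-16 port_rv=-16\n"

actual_pps = parse_shaper_output(success_output)
no_pps = parse_shaper_output(error_output)
```

An error line that never contains `cos=<digits> pps_max=<digits>` yields an empty dictionary, which is the `actual_pps = {}` symptom described in the motivation.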

#### How did you verify/test it?

- **DNX platform (7060X6 / J2C+):** gport API succeeds on first attempt,
test passes.
- **XGS platform:** gport API succeeds (modern XGS), or falls back to
legacy API (older XGS). Existing behavior preserved.
- Verified that error output format does not false-match the Python
regex.

#### Any platform specific information?

- TH5+ devices, e.g., Arista 7060X6: require the gport-based API. The
legacy `bcm_cosq_port_bandwidth_get` returns `BCM_E_UNAVAIL`.

#### Supported testbed topology if it's a new test case?

N/A — existing test, topology unchanged (t0, t1).

### Documentation

N/A — no new features or test cases.

---------

Signed-off-by: Priyansh Tratiya <ptratiya@microsoft.com>
…4576)

<!--
Please make sure you've read and understood our contributing guidelines;
https://github.com/sonic-net/SONiC/blob/gh-pages/CONTRIBUTING.md

Please provide following information to help code review process a bit
easier:
-->
### Description of PR
<!--
- Please include a summary of the change and which issue is fixed.
- Please also include relevant motivation and context. Where should
reviewer start? background context?
- List any dependencies that are required for this change.
-->

Summary:
Add configuration profile symlinks for core/leaf switches related to t2
min topology.

### Type of change

<!--
- Fill x for your type of change.
- e.g.
- [x] Bug fix
-->

- [ ] Bug fix
- [x] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [ ] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?
Config templates that will be used to test t2-single-node-min topology.

#### How did you do it?

#### How did you verify/test it?
Validated that deploying the t2-single-node-min topology works with
these files added.

#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation
<!--
(If it's a new feature, new test case)
Did you update documentation/Wiki relevant to your implementation?
Link to the wiki page?
-->

Signed-off-by: Vinay Kaza <vinay@nexthop.ai>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: vinay-nexthop <vinay@nexthop.ai>
…st_tx_disable_channel (#24579)

### Description of PR

`platform_tests.api.test_sfp.TestSfpApi::test_tx_disable_channel` is
supposed to skip ports whose transceiver doesn't support per-channel TX
disable (e.g. DAC cables). On platforms with SFP+ DAC uplinks (e.g.
Nokia 7215) the skip stopped taking effect and the test fails on every
SFP+ DAC port.

**Root cause:** PR #23972 added `"SFP"` to the plain-string compliance
check list inside `is_xcvr_optical()`:

```diff
- if xcvr_info_dict["type_abbrv_name"] in ["QSFP-DD", "OSFP-8X", "QSFP+C", "BP"]:
+ if xcvr_info_dict["type_abbrv_name"] in ["QSFP-DD", "OSFP-8X", "QSFP+C", "BP", "SFP"]:
```

That change was needed for Cisco-console SFP whose
`specification_compliance` is a plain string, but it broke the existing
SFP DAC detection path: a standard SFP/SFP+ DAC reports
`specification_compliance` as a **dict-formatted string** (e.g.
`{'SFP+CableTechnology': 'Passive Cable', ...}`). Because `"SFP"` is now
matched in the first branch, the dict-string never matches the two
literal copper strings, the function returns `True`, and
`test_tx_disable_channel` runs on the DAC port and fails.

**Fix:** keep the new plain-string fast-path (still handles
QSFP-DD/OSFP-8X/QSFP+C/BP and Cisco-console SFP), but for `SFP` fall
through to the existing `ast.literal_eval()`-based dict parsing when the
spec is not one of the known plain-string copper values.
`ast.literal_eval` is wrapped in `try/except (ValueError, SyntaxError)`
so a non-dict, non-copper spec is treated as optical instead of raising.

Summary:
Fixes the regression introduced by #23972 for SFP DAC transceivers.

### Type of change

- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [ ] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?
`test_tx_disable_channel` started failing on platforms with SFP+ DAC
uplinks (e.g. Nokia 7215) after PR #23972. The DAC skip in
`is_xcvr_optical()` no longer triggers for `type_abbrv_name == "SFP"`
whose `specification_compliance` is a dict-formatted string.

#### How did you do it?
In `tests/platform_tests/api/test_sfp.py::is_xcvr_optical`:
- Keep the plain-string check for `QSFP-DD`, `OSFP-8X`, `QSFP+C`, `BP`,
and `SFP` (returns False on `"Passive Copper Cable"` /
`"passive_copper_media_interface"`).
- For `SFP`, if the spec didn't match those plain strings, fall through
to `ast.literal_eval(spec)` and the existing `SFP+CableTechnology ==
"Passive Cable"` check.
- For all other types, keep the existing dict-based `10/40G Ethernet
Compliance Code` / `Extended Specification Compliance` "CR" check.
- Wrap `ast.literal_eval` in `try/except (ValueError, SyntaxError)` so a
non-dict, non-copper plain-string spec is treated as optical (preserves
the Cisco-console SFP behavior added by #23972).
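
The branching described in the list above can be sketched as follows. This is an illustration under stated assumptions, not the code in `test_sfp.py`: the function name `is_xcvr_optical_sketch` and the constant names are hypothetical, and the sample specs in the usage lines are made up.

```python
import ast

# Types whose specification_compliance may be a plain string.
PLAIN_STRING_TYPES = ["QSFP-DD", "OSFP-8X", "QSFP+C", "BP", "SFP"]
PLAIN_COPPER_SPECS = ["Passive Copper Cable", "passive_copper_media_interface"]


def is_xcvr_optical_sketch(xcvr_info_dict):
    """Return True for optical transceivers, False for copper (DAC)."""
    xcvr_type = xcvr_info_dict["type_abbrv_name"]
    spec = xcvr_info_dict["specification_compliance"]

    if xcvr_type in PLAIN_STRING_TYPES:
        if spec in PLAIN_COPPER_SPECS:
            return False
        if xcvr_type != "SFP":
            return True
        # SFP falls through: the spec may still be a dict-formatted string.

    try:
        spec_dict = ast.literal_eval(spec)
    except (ValueError, SyntaxError):
        # Non-dict, non-copper plain string: treat as optical.
        return True
    if not isinstance(spec_dict, dict):
        return True

    if xcvr_type == "SFP":
        return spec_dict.get("SFP+CableTechnology") != "Passive Cable"

    compliance_code = spec_dict.get("10/40G Ethernet Compliance Code", " ")
    extended_code = spec_dict.get("Extended Specification Compliance", " ")
    return "CR" not in compliance_code and "CR" not in extended_code


sfp_dac = is_xcvr_optical_sketch({
    "type_abbrv_name": "SFP",
    "specification_compliance": "{'SFP+CableTechnology': 'Passive Cable'}",
})
sfp_plain = is_xcvr_optical_sketch({
    "type_abbrv_name": "SFP",
    "specification_compliance": "Unknown spec",
})
```

Here `sfp_dac` is `False` (the port would be skipped) and `sfp_plain` is `True`, because the plain string cannot be parsed as a dict and is treated as optical.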

#### How did you verify/test it?
Ran the testcase on a Nokia 7215 testbed (m0 topology, 4× SFP+ DAC
uplinks) with the fix:

```
platform_tests/api/test_sfp.py::TestSfpApi::test_tx_disable_channel
WARNING tests.platform_tests.api.test_sfp:test_sfp.py:862 test_tx_disable_channel: Skipping transceiver 49 (not applicable for this transceiver type)
WARNING tests.platform_tests.api.test_sfp:test_sfp.py:862 test_tx_disable_channel: Skipping transceiver 50 (not applicable for this transceiver type)
WARNING tests.platform_tests.api.test_sfp:test_sfp.py:862 test_tx_disable_channel: Skipping transceiver 51 (not applicable for this transceiver type)
WARNING tests.platform_tests.api.test_sfp:test_sfp.py:862 test_tx_disable_channel: Skipping transceiver 52 (not applicable for this transceiver type)
PASSED
================= 1 passed, 154 warnings in 287.91s (0:04:47) ==================
```

All four SFP+ uplinks (49–52) are now correctly identified as DAC and
skipped.

#### Any platform specific information?
First reported on Nokia 7215 (4× SFP+ DAC uplinks), but the regression
affects any platform where an SFP/SFP+ DAC's `specification_compliance`
is reported as a dict-formatted string (which is the standard / default
representation for SFP DAC).

#### Supported testbed topology if it's a new test case?
N/A — fix to an existing test case.

### Documentation
N/A

Signed-off-by: Zhijian Li <zhijianli@microsoft.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: Zhijian Li <zhijianli@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…p test p… (#24449)

…eer ranges before teardown (#24040)

What: Added explicit deletion of test-specific BGP_PEER_RANGE entries
(BGPSLBPassive, BGPSLBPassive2, BGPSLBPassiveV6, BGPSLBPassiveV62) in
dual_asn_teardown() before rollback, with multi-ASIC support.
Why: test_bgp_dual_asn_v4 splits the VLAN subnet (192.168.0.0/21) into
two /22 halves. At teardown, rollback_or_reload() tries to restore
BGPVac (/21) but FRR rejects it with "Listen range overlaps" because the
test's /22 ranges are still active — a race condition with async bgpcfgd
processing.
How: Delete all test peer ranges with module_ignore_errors=True before
the setup_env fixture runs rollback, preventing the overlap conflict.
Testing: Ran test_bgp_dual_asn_v4 on Broadcom 7060x6 (202511). No
bgpcfgd overlap errors. BGPVac correctly restored. All CI checks passed.
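
To see why restoring the /21 range conflicts while the test's /22 ranges are still present, the subnet arithmetic can be checked with the standard `ipaddress` module. This only illustrates the overlap; it does not reproduce FRR's listen-range validation.

```python
import ipaddress

# The VLAN subnet restored as BGPVac at teardown.
vlan_range = ipaddress.ip_network("192.168.0.0/21")

# The test splits it into two /22 halves.
halves = list(vlan_range.subnets(prefixlen_diff=1))

# Both halves overlap the /21, so the /21 cannot be re-added while
# either /22 range is still configured.
overlapping = [str(half) for half in halves if half.overlaps(vlan_range)]
```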

Signed-off-by: Priyansh Tratiya <ptratiya@microsoft.com>
…ment configuration (#22006)

<!--
Please make sure you've read and understood our contributing guidelines;
https://github.com/sonic-net/SONiC/blob/gh-pages/CONTRIBUTING.md

Please provide following information to help code review process a bit
easier:
-->
### Description of PR
<!--
- Please include a summary of the change and which issue is fixed.
- Please also include relevant motivation and context. Where should
reviewer start? background context?
- List any dependencies that are required for this change.
-->

Summary:
Fixes #18077 

Modify the gratuitous arp service such that arp or neighbor-discovery
packets are sent based on the L3 configuration of the testbed topology.
For testbed topologies which do not configure IPv4 addressing, such as
isolated-v6 testbeds, we skip the creation of arp packets and do not
send them. In this case, only IPv6 neighbor-discovery packets are
created and sent. The inverse is true as well when a testbed topology
only specifies the use of IPv4 connections.
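
A rough sketch of the selection logic described above, assuming the service can see the addresses configured on each interface. The function name `select_announcements` and the sample addresses are hypothetical; the actual service code is not shown in this description.

```python
import ipaddress


def select_announcements(interface_addresses):
    """Decide which announcement types to build for one interface.

    Returns "arp" when an IPv4 address is configured and
    "neighbor-discovery" when an IPv6 address is configured.
    """
    versions = {ipaddress.ip_interface(addr).version
                for addr in interface_addresses}
    kinds = []
    if 4 in versions:
        kinds.append("arp")
    if 6 in versions:
        kinds.append("neighbor-discovery")
    return kinds


dual_stack = select_announcements(["10.0.0.1/31", "fc00::1/126"])
v6_only = select_announcements(["fc00::1/126"])
v4_only = select_announcements(["10.0.0.1/31"])
```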

### Type of change

<!--
- Fill x for your type of change.
- e.g.
- [x] Bug fix
-->

- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [ ] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [x] 202505

### Approach
#### What is the motivation for this PR?
This change was made as a part of the isolated-v6 testbed qualification
effort. Running acl tests on isolated-v6 testbeds was previously
unsupported, and skipped via conditional mark, as the tests expected to
receive IPv4 arp packets when no IPv4 connectivity was established.

#### How did you verify/test it?
The full suite of acl tests was run against a testbed where IPv4 and
IPv6 connectivity was specified in the topology config, and no
regressions were seen. The same suite of tests was run against an
isolated-v6 testbed, and new test passes were observed; some new
failures were also observed for tests which were previously skipped.
However @r12f asked that we submit this change, and follow up with the
remaining failing tests individually in subsequent changes as this
change improves overall test coverage.

#### Any platform specific information?
None.

Signed-off-by: Will Rideout <wrideout@arista.com>
Co-authored-by: wrideout-arista <wrideout@arista.com>
<!--
Please make sure you've read and understood our contributing guidelines;
https://github.com/sonic-net/SONiC/blob/gh-pages/CONTRIBUTING.md

Please provide following information to help code review process a bit
easier:
-->
### Description of PR
This is a cherry-pick of
sonic-net/sonic-mgmt#22801 and
sonic-net/sonic-mgmt#21863. #22801 is a needed fix
for aristanetworks/sonic-qual.msft#1266, but
it relies on another missing PR (#21863). See the respective PRs for
details.
<!--
- Please include a summary of the change and which issue is fixed.
- Please also include relevant motivation and context. Where should
reviewer start? background context?
- List any dependencies that are required for this change.
-->

Summary:
Fixes #  aristanetworks/sonic-qual.msft#1266

### Type of change

<!--
- Fill x for your type of change.
- e.g.
- [x] Bug fix
-->

- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
    - [ ] Skipped for non-supported platforms
- [ ] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [ ] 202511

### Approach
#### What is the motivation for this PR?

#### How did you do it?

#### How did you verify/test it?

#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation
<!--
(If it's a new feature, new test case)
Did you update documentation/Wiki relevant to your implementation?
Link to the wiki page?
-->

---------

Signed-off-by: rajkumar1 <rajkumar1@arista.com>
Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>
Co-authored-by: gshemesh2 <gshemesh@nvidia.com>
Co-authored-by: Rustiqly (agent of lihuay) <245760149+rustiqly@users.noreply.github.com>
Co-authored-by: Rustiqly <rustiqly@users.noreply.github.com>
…4345)

<!--
Please make sure you've read and understood our contributing guidelines;
https://github.com/sonic-net/SONiC/blob/gh-pages/CONTRIBUTING.md

Please provide following information to help code review process a bit
easier:
-->
### Description of PR
<!--
- Please include a summary of the change and which issue is fixed.
- Please also include relevant motivation and context. Where should
reviewer start? background context?
- List any dependencies that are required for this change.
-->

Summary:
Fixes # (issue)

### Type of change

<!--
- Fill x for your type of change.
- e.g.
- [x] Bug fix
-->

- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
    - [ ] Skipped for non-supported platforms
- [x] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?
We were seeing failures in the
`base_packet_trimming.py::test_trimming_counters` testcase related to
trim counters being zero.

#### How did you do it?
We could not reproduce the issue while using a debugger, leading us to
believe it was timing related. After adding a sleep before reading the
counters, we did not see the issue. We opted to implement logic that
uses `wait_for` instead of sleeping.
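
The difference between a fixed sleep and polling can be sketched as below. `poll_until` is a hypothetical stand-in written for this example; the signature of the `wait_for` helper used in the test is not shown here, so none is assumed.

```python
import time


def poll_until(condition, timeout=10.0, interval=0.5):
    """Call `condition` until it returns a truthy value or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for reading the trim counters: zero on the first two reads.
readings = iter([0, 0, 12])
seen = []


def trim_counter_is_nonzero():
    value = next(readings)
    seen.append(value)
    return value > 0


ok = poll_until(trim_counter_is_nonzero, timeout=2.0, interval=0.01)
```

Polling returns as soon as the counters update, so the test neither reads too early nor waits longer than needed.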

#### How did you verify/test it?
Verified the failures were no longer happening with this change.

#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation
<!--
(If it's a new feature, new test case)
Did you update documentation/Wiki relevant to your implementation?
Link to the wiki page?
-->

Signed-off-by: Ryan Garofano <rgarofano@arista.com>
…serdes/FEC errors (#24611)

<!--
Please make sure you've read and understood our contributing guidelines;
https://github.com/sonic-net/SONiC/blob/gh-pages/CONTRIBUTING.md

Please provide following information to help code review process a bit
easier:
-->
### Description of PR
<!--
- Please include a summary of the change and which issue is fixed.
- Please also include relevant motivation and context. Where should
reviewer start? background context?
- List any dependencies that are required for this change.
-->

Summary:
The Mellanox SDK recently changed its log format to include
'client_pid=N, ' before the file path, causing existing ignore patterns
to no longer match.
Additionally, FEC alignment lock polling generates errors on pre-SPC4
platforms (SN2700, SN4600C, SN4700), and orchagent emits errors for
ports without serdes objects. These unignored ERR-level syslogs cause
loganalyzer teardown failures across all Mellanox platforms, affecting
any test that uses loganalyzer.

### Type of change

<!--
- Fill x for your type of change.
- e.g.
- [x] Bug fix
-->

- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [x] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### How did you do it?
New patterns added:
- SAI_UTILS FEC_ALIGNMENT_LOCK get_dispatch_attribs_handler (new SDK
format)
- SAI_UTILS sai_get_attributes failures (new SDK format with client_pid)
- SAI_PORT mlnx_port_state_get FEC alignment lock on pre-SPC4 platforms
- orchagent clearPortPhySerdesAttrCounterMap for ports without serdes
objects
 
#### How did you verify/test it?
Manual testing

#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation
<!--
(If it's a new feature, new test case)
Did you update documentation/Wiki relevant to your implementation?
Link to the wiki page?
-->

Signed-off-by: sourabh kumar <kumarsourabh@microsoft.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: Sourabh Kumar <kumarsourabh@microsoft.com>
…ress objects (#24624)

<!--
Please make sure you've read and understood our contributing guidelines;
https://github.com/sonic-net/SONiC/blob/gh-pages/CONTRIBUTING.md

Please provide following information to help code review process a bit
easier:
-->
### Description of PR
<!--
- Please include a summary of the change and which issue is fixed.
- Please also include relevant motivation and context. Where should
reviewer start? background context?
- List any dependencies that are required for this change.
-->

Summary: In the sFlow ptftests, parse the Agent IDs into ipaddress
objects for more accurate comparison between expected and actual values.

### Type of change

<!--
- Fill x for your type of change.
- e.g.
- [x] Bug fix
-->

- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [ ] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?

In the sFlow ptftests, the Agent IDs, which are IP addresses, are
compared as strings. This causes problems in `--mgmtIpv6Only` setups,
since this comparison doesn't recognize normalized and fully-expanded
IPv6 addresses as the same.

#### How did you do it?

This change converts the strings into `IPv6Address`/`IPv4Address`
objects from the `ipaddress` library, so that the comparison is
accurate.
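
A short illustration of why string comparison fails here while `ipaddress` objects compare correctly. The address is made up for the example.

```python
import ipaddress

# The same IPv6 address in fully-expanded and compressed form.
expected_agent_id = "fc00:0000:0000:0000:0000:0000:0000:0001"
actual_agent_id = "fc00::1"

# Plain string comparison treats them as different values.
same_as_strings = expected_agent_id == actual_agent_id

# ip_address() returns an IPv4Address or IPv6Address depending on the input,
# and equality is based on the address value, not its spelling.
same_as_addresses = (
    ipaddress.ip_address(expected_agent_id)
    == ipaddress.ip_address(actual_agent_id)
)
```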

#### How did you verify/test it?

I confirmed in a Pikez M0 cluster with `--mgmtIpv6Only` that this change
fixes the test.

#### Any platform specific information?
N/A

#### Supported testbed topology if it's a new test case?
N/A

### Documentation
N/A
<!--
(If it's a new feature, new test case)
Did you update documentation/Wiki relevant to your implementation?
Link to the wiki page?
-->

Signed-off-by: Vitor Mendonca <vitor@arista.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: vitor-arista <vitor@arista.com>
…capture_and_check_packet_on_dut (#24621)

**Why**

PR #22876 cut the default `tcpdump_buffer_size` in
`capture_and_check_packet_on_dut` from 102400 KiB (100 MiB) to 4096 KiB
(4 MiB) to address the 1 GiB explicit override in
`test_dhcp_counter_stress` causing a ~2 GiB memory spike. The 4 MiB
default, however, is too small for normal stress-test workloads with
bursty packet rates on lower-perf platforms (720dt, 7215).

**Note: PR #22876 was not actually validated.**

The `How did you verify/test it?` section of #22876 only states:

> - Code passes flake8 with max-line-length=120
> - Fix matches the exact unit interpretation from tcpdump documentation

No test execution. The 4 MiB value was a unit-correction choice based on
tcpdump docs, not an empirically-tested minimum. As shown by the data
below, 4 MiB drops 7.2% of relayed DHCP packets on real hardware — i.e.
the new default was never validated against the stress-test workload
that motivated the change.

Empirical measurements on `testbed-bjw2-can-720dt-6`
(Arista-720DT-G48S4, SONiC.20251110.26) running
`test_dhcp_counter_stress[discover]` (25 pps × 48 servers × 120 s):

| Buffer | tcpdump drop rate vs dhcpmon counter | test result |
|---:|---:|---|
| 1 MiB | 13.0% | FAIL |
| **4 MiB (current default, set by #22876)** | **7.2%** | **FAIL** |
| 16 MiB | 1.47% | FAIL |
| 64 MiB | <0.01% | PASS |

**How**

Bump default from `4096` (4 MiB) to `131072` (128 MiB). Gives
comfortable headroom for stress tests on slow platforms while remaining
8× smaller than the historical 1 GiB explicit override that caused
memory spikes.
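
The unit arithmetic behind these figures, assuming (as the text above does) that the buffer size is expressed in KiB:

```python
KIB_PER_MIB = 1024

old_default_kib = 4096           # default set by #22876
new_default_kib = 131072         # default proposed here
old_override_kib = 1024 * 1024   # the historical 1 GiB override, in KiB

old_default_mib = old_default_kib // KIB_PER_MIB       # 4 MiB
new_default_mib = new_default_kib // KIB_PER_MIB       # 128 MiB
override_ratio = old_override_kib // new_default_kib   # 8x smaller than 1 GiB
```

The new default is also twice the 64 MiB buffer that passed in the measurements above.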

**Back port request**
- [x] 202511

Refs: PR #22876 (commit 58a6f0b), PR #20580, PR #24592 (companion
202511-only cleanup).

Signed-off-by: Xichen96 <lukelin0907@gmail.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: Xichen96 <lukelin0907@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…n on Cisco-8102 due to timemaster.service (#24614)

### Description of PR

Summary:
On Cisco-8102 lab DUTs running recent SONiC images, `timemaster.service`
fails benignly (no reachable NTP/PTP source in the lab management
network). This causes `systemctl is-system-running` to report
`degraded`, which trips `config_system_checks_passed` in
`test_reload_configuration`. The pre-reload system-state gate at
`platform_tests/test_reload_config.py:63` then polls for 360s and
asserts:

```
$ systemctl is-system-running
degraded (rc=1)

$ systemctl list-units --state=failed
* timemaster.service loaded failed failed Synchronize system clock to NTP and PTP time sources
1 loaded units listed.
```

The failure is unrelated to the data plane and unrelated to `config
reload` itself - the assertion fires before the test ever invokes
`config reload`. Reproduced on multiple Cisco-8102 testbeds in the same
nightly plan.

This PR adds an `xfail` clause scoped strictly to `platform ==
'x86_64-8102_64h_o-r0'` while the underlying `timemaster.service` issue
is tracked separately on the platform/image side. The test still runs
(so XPASS will surface once the platform issue is fixed); only the
assertion is allowed to fail without breaking the plan.

Fixes # (issue)

### Type of change

- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [ ] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?
Unblock the Cisco-8102 nightly pipeline while the platform-side
`timemaster.service` issue is tracked separately. The current assertion
provides no actionable signal - it only reports that a non-data-plane
systemd unit is in a failed state, which is already known.

#### How did you do it?
YAML-only change in `conditional_mark`. Added an `xfail` block next to
the existing `skip` block on
`platform_tests/test_reload_config.py::test_reload_configuration`, with
a single condition matching the 8102 platform string.

```yaml
  xfail:
    reason: "timemaster.service fails benignly on Cisco-8102 lab DUTs ..."
    conditions:
      - "platform in ['x86_64-8102_64h_o-r0']"
```

#### How did you verify/test it?
- Inspected pytest log from a failing nightly run: assertion at
`test_reload_config.py:63` with `timemaster.service` as the sole failed
unit on str2-8102-01.
- Confirmed `conditional_mark` supports concurrent `skip` + `xfail`
blocks on the same test - precedent at
`platform_tests/test_reboot.py::test_watchdog_reboot` in the same YAML
file.
- Confirmed the platform string `x86_64-8102_64h_o-r0` is already used
in this YAML (see `test_watchdog_reboot`).
- Other Cisco-8000 SKUs (8101, 8111, 8800-LC) and all non-Cisco
platforms are unaffected because the condition matches the exact 8102
platform string only.
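
As a rough illustration of why only the 8102 matches, the condition string can be read as a Python expression evaluated against a dictionary of DUT facts. That evaluation model is an assumption for this sketch, `condition_matches` is a hypothetical helper, and the second platform string is made up; the plugin's actual implementation is not shown here.

```python
def condition_matches(condition, facts):
    """Evaluate one condition string against a dict of facts."""
    # Builtins are disabled; only names from `facts` are visible.
    return bool(eval(condition, {"__builtins__": {}}, dict(facts)))


condition = "platform in ['x86_64-8102_64h_o-r0']"

on_8102 = condition_matches(condition, {"platform": "x86_64-8102_64h_o-r0"})
# Made-up platform string standing in for any other SKU.
on_other = condition_matches(condition, {"platform": "x86_64-example_sku-r0"})
```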

#### Any platform specific information?
Cisco-8102 only (`x86_64-8102_64h_o-r0`).

#### Supported testbed topology if it's a new test case?
N/A - existing test, no topology change.

### Documentation
N/A - no user-facing behavior or feature changes.

Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: ShiyanWangMS <shiyanwang@microsoft.com>
Co-authored-by: wsycqyz <wsycqyz@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…unter_stress (#24592)

Partial back-port of #23939 — only the buffer-size removal part.

The other changes in #23939 (adding
`relay_agent`/`downlink_vlan_iface_name` ptf_runner params, dropping
`.json` from `count_file` path) depend on PR #19198 which was reverted
from 202511 by #23714, so they don't apply here.

**Why**

After PR #22876 was back-ported to 202511, the framework default for
`tcpdump_buffer_size` (in `capture_and_check_packet_on_dut`) dropped
from 100 MiB to 4 MiB. The explicit `BUFFER_SIZE = 1024` (1 MiB)
override in this test is now smaller than the framework default.
Removing the override lets the test use the (larger) framework default
and matches master.

**Note**

The 4 MiB default is still too small to fully pass on 720dt-class
hardware under this test's stress load. A follow-up PR will raise the
default in `capture_and_check_packet_on_dut`. This PR is pure cleanup to
bring 202511 in line with master.

**Tested**

Applied locally on `internal-202511` @ `da7363c`,
`testbed-bjw2-can-720dt-6` (Arista-720DT-G48S4, SONiC.20251110.26):
- Before (1 MiB): tcpdump drops ~13% of relayed packets, FAIL
- After (4 MiB framework default): drops reduced to ~7%, still over the
0.01% margin (follow-up PR will fix)
- For reference, 64 MiB passes cleanly.

Refs: #23939, #22876, #20580.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Signed-off-by: Xichen96 <lukelin0907@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…sic dut (#24632)

<!--
Please make sure you've read and understood our contributing guidelines;
https://github.com/sonic-net/SONiC/blob/gh-pages/CONTRIBUTING.md

Please provide following information to help code review process a bit
easier:
-->
### Description of PR
<!--
- Please include a summary of the change and which issue is fixed.
- Please also include relevant motivation and context. Where should
reviewer start? background context?
- List any dependencies that are required for this change.
-->

Summary:
Fixes # (issue)

- This PR fixes the issue #20896
- It adds multi-asic support for the new test
'_test_verify_copp_configuration_' and also fixes the issue with
'_test_policer_' as part of the changes introduced by #18326

### Type of change

<!--
- Fill x for your type of change.
- e.g.
- [x] Bug fix
-->

- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [x] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [x] 202505

### Approach
#### What is the motivation for this PR?

- To fix issue #20896, and to fix the issue with `test_policer` that
arose from the changes introduced by #18326.

#### How did you do it?

- Checked whether the DUT is multi-asic and adjusted the commands based
on the result.
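
The branching described above can be sketched roughly as follows. This is an illustration only, not the code from the PR: the helper name `commands_for_dut`, the placeholder command, and the choice of wrapping each command in `ip netns exec` per asic namespace are all assumptions.

```python
from typing import List


def commands_for_dut(base_cmd: str, is_multi_asic: bool, namespaces: List[str]) -> List[str]:
    """Return the command(s) to run on a single-asic or multi-asic DUT.

    Hypothetical helper: the real test may build its commands differently.
    """
    if not is_multi_asic:
        # Single-asic: run the command once in the default namespace.
        return [base_cmd]
    # Multi-asic: repeat the same command once per asic network namespace.
    return ["sudo ip netns exec {} {}".format(ns, base_cmd) for ns in namespaces]


print(commands_for_dut("example-cmd", False, []))
print(commands_for_dut("example-cmd", True, ["asic0", "asic1"]))
```

Returning a list in both cases keeps the caller's loop identical for single-asic and multi-asic DUTs.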

#### How did you verify/test it?

- Ran all the COPP test cases on a T2 multi-asic DUT and confirmed that
all tests pass.

#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation
<img width="387" height="718" alt="image"
src="https://github.com/user-attachments/assets/a291b892-2e61-48c2-98c0-2a53329ffe41"
/>

Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: sanjair-git <114024719+sanjair-git@users.noreply.github.com>
Cherry-pick of #24608 to 202511.

Original PR: sonic-net/sonic-mgmt#24608

### Description of PR
Summary:
Fix failure on qos/test_qos_sai.py:testQosSaiBufferPoolWatermark on Q200
(Cisco-8102-C64).
The SMS usage changes even without any traffic, so the initial watermark
fluctuates and the test fails. This PR sets the margin to 6 packets for
the `gb` ASIC.

Note: A minor merge conflict on
`tests/qos/files/cisco/qos_param_generator.py` was resolved by
preserving 202511's existing `extra_cap_margin = 20` for lossless (the
change in master from 20 -> 25 is unrelated to this PR) and adding the
new `gb` blocks from #24608.

### Type of change
- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
    - [ ] Skipped for non-supported platforms
- [x] Test case improvement

### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?
Cherry-pick of #24608 into 202511.

#### How did you do it?
Cherry-picked commit 667279bef0682e1152ff08052df1d1d2eaaf535a from
#24608. Resolved conflict in
`tests/qos/files/cisco/qos_param_generator.py`.

#### How did you verify/test it?
Verified on m64 Cisco-8102-C64 in original PR #24608.

#### Any platform specific information?
Q200.

#### Supported testbed topology if it's a new test case?
N/A

### Documentation
N/A

Signed-off-by: Zhixin Zhu <zhixzhu@cisco.com>
Co-authored-by: Zhixin Zhu <zhixzhu@cisco.com>
### Description of PR
When running on isolated-v6 testbeds, use IPv6 addressing in the
outermost L3 header of IPinIP packets, as IPv4 addresses are not
configured and are unresolvable on these testbeds.


Signed-off-by: Will Rideout <wrideout@arista.com>
…arnings in global LogAnalyzer ignore list (#24562)

Add two global LogAnalyzer ignore patterns for ctrmgrd ERR lines that
are emitted
when kubeadm reports Docker version incompatibility (Docker 28.x vs old
k8s cluster).
These warnings bleed into downstream test LogAnalyzer windows, causing
false failures.

### Description of PR

Summary:

When `test_kubesonic_join_and_disjoin` runs (or fails and retries),
`ctrmgrd` calls
`kubeadm join`, which emits Docker/kubelet version warnings to syslog as
ERR lines:

```
ERR ctrmgrd.py: Refer file /tmp/tmpXXXXkube_hints_ for troubleshooting tips
ERR ctrmgrd.py: [WARNING SystemVerification]: Docker version is not on the list of
 validated versions: 28.2.2. Latest validated version: 20.10
```

These ERR lines appear ~seconds after the kubesonic test completes and
fall inside the
**next test's** LogAnalyzer window, causing false failures in unrelated
tests such as
`snmp/test_snmp_queue.py::test_snmp_queues`.

Root cause: Docker 28.x is not on kubeadm's validated versions list for
the currently
deployed k8s version. This is a known infra limitation (k8s upgrade is
in progress).

Fix: add the two patterns to `loganalyzer_common_ignore.txt` (global,
not per-test)
because any test following a kubesonic join test may be affected.

This is the defense-in-depth complement to #24159 (which fixes kubesonic
teardown to
call `config kube server disable on`, stopping ctrmgrd from retrying k8s
join).

### Type of change

- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [ ] Test case improvement

### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach

#### What is the motivation for this PR?
Docker 28.x (installed on test hosts) is not on kubeadm's validated
versions list for
the k8s version currently deployed in the test environment. This causes
ctrmgrd to emit
ERR-level log lines when kubesonic join/disjoin runs. These ERR lines
fall inside
downstream LogAnalyzer windows and cause false failures. The k8s team is
working on an
upgrade; this ignore list entry prevents false failures in the interim.

#### How did you do it?
Added 2 regex patterns to
`ansible/roles/test/files/tools/loganalyzer/loganalyzer_common_ignore.txt`:
```
r, ".* ERR ctrmgrd\.py: Refer file .* for troubleshooting tips.*"
r, ".* ERR ctrmgrd\.py:.*\[WARNING SystemVerification\]:.*Docker version is not on the list of validated versions.*"
```
Patterns are scoped to ctrmgrd Docker/kubelet version warnings and do
not suppress
unrelated ctrmgrd errors.
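
As a quick sanity check of the scoping claim, the Docker-version pattern can be exercised against sample lines. This is a sketch under assumptions: the syslog prefix (timestamp and host name), the "unrelated" message, and the use of `re.search` are illustrative and may not match how LogAnalyzer applies its patterns.

```python
import re

# The Docker-version pattern quoted above, split across two string literals.
DOCKER_VERSION_IGNORE = (
    r".* ERR ctrmgrd\.py:.*\[WARNING SystemVerification\]:"
    r".*Docker version is not on the list of validated versions.*"
)


def is_ignored(line, pattern=DOCKER_VERSION_IGNORE):
    """Return True if the syslog line matches the ignore pattern."""
    return re.search(pattern, line) is not None


# Hypothetical syslog lines; only the message text comes from the PR description.
noisy = (
    "May 16 03:12:45.123456 dut ERR ctrmgrd.py: [WARNING SystemVerification]: "
    "Docker version is not on the list of validated versions: 28.2.2. "
    "Latest validated version: 20.10"
)
unrelated = "May 16 03:12:46.000000 dut ERR ctrmgrd.py: some unrelated failure"

print(is_ignored(noisy))
print(is_ignored(unrelated))
```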

#### How did you verify/test it?
Confirmed via log analysis of nightly job
[69e79fb88e43924279229609](https://elastictest.org/scheduler/testplan/69e79fb88e43924279229609)
that these two exact lines triggered the LogAnalyzer failure in
`test_snmp_queues` teardown.

#### Any platform specific information?
None - global ignore entry affects all platforms.

#### Supported testbed topology if it's a new test case?
N/A

### Documentation

N/A

ADO: https://msazure.visualstudio.com/One/_workitems/edit/37717660

Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: Liping Xu <108326363+lipxu@users.noreply.github.com>
mssonicbld and others added 7 commits May 17, 2026 03:40
… (#24604)

### Description of PR

Summary:
Fixes #24386 

### Type of change


- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [x] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?

The mgmt VRF table ID got bumped from 5000 to 6000
(sonic-net/sonic-buildimage#26410). But
`verify_show_command`, a function called as part of the module's test
setup, was failing because it still expected the mgmt VRF table ID to be
5000. This caused the entirety of `tests/mvrf/test_mgmtvrf.py` tests to
fail.

#### How did you do it?

Updated `verify_show_command` to expect 6000 as the mgmt VRF table ID
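
A minimal sketch of the check, assuming the expected table ID is held in one constant so that a future bump touches a single line. The constant name, the helper name, and the sample `show mgmt-vrf` text are hypothetical; only the `vrf table <id>` substring comes from the failing test shown below.

```python
import re

# Assumption: a single module-level constant holds the expected table ID.
EXPECTED_MGMT_VRF_TABLE_ID = 6000


def has_expected_vrf_table(show_mgmt_vrf_output, table_id=EXPECTED_MGMT_VRF_TABLE_ID):
    """Return True if the output reports the expected mgmt VRF table ID."""
    # \b prevents a match on a longer number such as 60001.
    pattern = r"vrf table {}\b".format(table_id)
    return re.search(pattern, show_mgmt_vrf_output) is not None


# Hypothetical output fragments for illustration only.
new_image_output = "ManagementVRF : Enabled\n    vrf table 6000 addrgenmode eui64"
old_image_output = "ManagementVRF : Enabled\n    vrf table 5000 addrgenmode eui64"

print(has_expected_vrf_table(new_image_output))
print(has_expected_vrf_table(old_image_output))
```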

#### How did you verify/test it?

Run `tests/mvrf/test_mgmtvrf.py` on a DUT; the tests should run
normally.
Without the changes, the following error shows up. 
```sh
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    def verify_show_command(duthost, mvrf=True):
        show_mgmt_vrf = duthost.shell("show mgmt-vrf")["stdout"]
        mvrf_interfaces = {}
        if mvrf:
            mvrf_interfaces["mgmt"] = r"\d+:\s+mgmt:\s+<NOARP,MASTER,UP,LOWER_UP> mtu\s+\d+\s+qdisc\s+noqueue\s+state\s+UP"
            mvrf_interfaces["vrf_table"] = "vrf table 5000"
            mvrf_interfaces["eth0"] = r"\d+:\s+eth0+:\s+<BROADCAST,MULTICAST,UP,LOWER_UP>.*master mgmt\s+state\s+UP "
            mvrf_interfaces["lo"] = r"\d+:\s+lo-m:\s+<BROADCAST,NOARP,UP,LOWER_UP>.*master mgmt"
            if "ManagementVRF : Enabled" not in show_mgmt_vrf:
                raise Exception("'ManagementVRF : Enabled' not in output of 'show mgmt vrf'")
            for _, pattern in list(mvrf_interfaces.items()):
                if not re.search(pattern, show_mgmt_vrf):
>                   raise Exception("Unexpected output for MgmtVRF=enabled")
E                   Exception: Unexpected output for MgmtVRF=enabled
_ = 'vrf_table'
pattern = 'vrf table 5000'
mvrf/test_mgmtvrf.py:245: Exception
```

#### Any platform specific information?

n/a

#### Supported testbed topology if it's a new test case?

n/a

### Documentation

Signed-off-by: donggyu-nexthop <donggyu@nexthop.ai>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: donggyu-nexthop <donggyu@nexthop.ai>
…GP convergence check (#24682)

Summary: Fix IPv6-only topology support in generic_patch BGP convergence
check
Fixes # (issue)

### Type of change


- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [ ] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?
PR #22895 added a BGP session convergence wait before the DB comparison
in `generic_patch_add_t0()`, but it unconditionally checks both IPv4 and
IPv6 BGP sessions. On IPv6-only topologies (e.g.
`t1-isolated-v6-d56u1-lag`), `tor_data["ip"]["remote"]` is empty, causing
`is_bgp_session_established()` to fail.


#### How did you do it?
Fix by checking whether each neighbor IP exists before waiting for the
BGP session, consistent with the `chk_any_bgp_session()` approach from
PR #21591.
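
The guard can be sketched as below. It is illustrative only: the `"ipv6"` key and the helper name `neighbor_ips_to_check` are assumptions modeled on the `tor_data["ip"]["remote"]` access mentioned above, and the actual fix may be structured differently.

```python
def neighbor_ips_to_check(tor_data):
    """Return only the neighbor IPs that are actually configured.

    Hypothetical helper: an empty or missing address family is skipped,
    so no BGP wait is issued for it.
    """
    ips = []
    for family in ("ip", "ipv6"):
        remote = tor_data.get(family, {}).get("remote")
        if remote:
            ips.append(remote)
    return ips


# On an IPv6-only topology the IPv4 remote address is empty.
v6_only = {"ip": {"remote": ""}, "ipv6": {"remote": "fc00::1"}}
dual_stack = {"ip": {"remote": "10.0.0.1"}, "ipv6": {"remote": "fc00::1"}}

print(neighbor_ips_to_check(v6_only))
print(neighbor_ips_to_check(dual_stack))
```

The caller would then wait for a BGP session only for each returned address.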


#### How did you verify/test it?
Regression tests pass.

#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation

Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: weguo-NV <154216071+weiguo-nvidia@users.noreply.github.com>
… stale (#24691)

### Description of PR


Summary:

When `sudo monit validate` is run just before `sudo monit status`, the
status output may still carry the **old** "data collected"
timestamp because monit hasn't finished its internal refresh cycle
yet. This causes the memory-utilization plugin to read stale baseline
data before or after a test run.
This PR adds a **freshness-retry** mechanism:

1. `record_monit_baseline_from_validate_output(validate_output)`:
parses the System-block "data collected" timestamp from the
`sudo monit validate` stdout and saves it as a baseline. Called in both
`pytest_runtest_setup` and `pytest_runtest_teardown` (in `__init__.py`)
right after `sudo monit validate`.

2. `read_monit_status_with_freshness_retry(cmd)`: executes `sudo monit
status`, compares the System-block "data collected" timestamp
against the saved baseline, and if they still match (stale), sleeps
`MONIT_STATUS_FRESHNESS_WAIT_SECONDS` (60 s) and retries, up to
`MONIT_STATUS_FRESHNESS_MAX_RETRIES` (3) times. Used only for the
`monit` command entry.

Both constants are module-level tunables so they can be overridden in
tests.
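
The retry loop described above might look roughly like this. It is a sketch, not the plugin's code: the function signature, the injected `run_status`, `parse_timestamp`, and `sleep` callables, and the behavior of returning the last output once retries are exhausted are assumptions. Only the two constant names and their values come from the description.

```python
import time

MONIT_STATUS_FRESHNESS_WAIT_SECONDS = 60
MONIT_STATUS_FRESHNESS_MAX_RETRIES = 3


def read_status_with_freshness_retry(run_status, parse_timestamp, baseline_ts,
                                     wait_seconds=MONIT_STATUS_FRESHNESS_WAIT_SECONDS,
                                     max_retries=MONIT_STATUS_FRESHNESS_MAX_RETRIES,
                                     sleep=time.sleep):
    """Re-read monit status until its timestamp differs from the baseline."""
    output = run_status()
    for _ in range(max_retries):
        if parse_timestamp(output) != baseline_ts:
            # Timestamp moved on: the status data is fresh.
            return output
        # Still the baseline timestamp: wait one cycle and read again.
        sleep(wait_seconds)
        output = run_status()
    # Retries exhausted: return the last output (assumed fallback).
    return output


# Demo with fakes: two stale reads, then a fresh one; no real sleeping.
fake_outputs = iter(["old", "old", "new"])
sleeps = []
result = read_status_with_freshness_retry(
    run_status=lambda: next(fake_outputs),
    parse_timestamp=lambda text: text,
    baseline_ts="old",
    sleep=sleeps.append,
)
print(result, sleeps)
```

Injecting `sleep` and `run_status` is what makes the loop unit-testable with mocked stale and fresh outputs, in the spirit of the verification described later in this PR.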

### Type of change



- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [ ] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?

Intermittent false-positive memory alarms were caused by the monit
daemon not having refreshed its internal data by the time `sudo monit
status` was issued right after `sudo monit validate`. The stale status
output contained pre-test memory readings which then incorrectly
appeared as the "before test" baseline, making normal memory
usage look like an increase.

#### How did you do it?

- Added `record_monit_baseline_from_validate_output()` to capture the
System-block "data collected" timestamp immediately after `sudo
monit validate`.
- Added `read_monit_status_with_freshness_retry()` to compare the
current monit status timestamp against the saved baseline; if still
stale, sleep and retry (up to 3 times, 60 s each).
- Hooked both functions into `pytest_runtest_setup` and
`pytest_runtest_teardown` in `__init__.py`.
- Only the `monit` command entry uses the freshness-retry path; all
other memory commands (`top`, `free`, `docker stats`, FRR) are
unchanged.

#### How did you verify/test it?

- Manually verified on a VS testbed that
`_parse_monit_memory_data_collected_timestamp` correctly extracts the
System-block timestamp while ignoring Filesystem/Process/Program block
timestamps.
- Unit-tested the retry logic by mocking `execute_command` to return
stale output for the first N calls and fresh output on the final call.

#### Any platform specific information?

The retry wait time (60 s) matches the monit default poll cycle; can be
lowered if the target device uses a shorter cycle.

#### Supported testbed topology if it's a new test case?

N/A — this is a framework fix for the memory-utilization plugin, not a
new test case.

### Documentation


No documentation update required — this is an internal framework fix.

### Verification

Elastic test jobs for `generic_config_updater` (branch:
`dev/xuliping/20260512_internal-202511_monit-freshness-retry`, image:
`internal-202511`):

| Testbed | Job Link |
|---------|----------|
| testbed-bjw2-can-t0-7260-9 |
https://elastictest.org/scheduler/testplan/6a03157feb4c0d0f5d30bd70 |
| testbed-bjw2-can-t0-7260-1 |
https://elastictest.org/scheduler/testplan/6a031580a907302e5e8240cb |
| testbed-bjw3-can-t0-7060-7 |
https://elastictest.org/scheduler/testplan/6a0315c99f3385605e3ddb9b |
| testbed-bjw3-can-t0-7060-6 |
https://elastictest.org/scheduler/testplan/6a0315c9ea3a02a739d03786 |

12/05/2026 17:35:56 memory_utilization.read_monit_status_wit L0126 INFO
| [MemoryUtilization] status data refreshed on retry 1/3 (System block
ts: Tue, 12 May 2026 17:35:34)

Signed-off-by: xuliping <xuliping@microsoft.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: Liping Xu <108326363+lipxu@users.noreply.github.com>
…tform tests (#24690)

Summary:
This change increases `config_reload_timeout` from 180s to 240s in
`platform_tests/test_reload_config.py` for Nokia-M0-7215 and Nokia-7215.
The goal is to avoid false test failures caused by the longer `config
reload -y` completion time after recent platform-specific changes.

`test_reload_configuration_checks` is failing on Nokia-7215 because
`config reload -y` may not finish within the current 180 s timeout.
The command is triggered asynchronously, and although the handler
returns, the reload flow can still be in progress when the test reaches
its timeout.

In 202511, the cherry-pick of PR
sonic-net/sonic-utilities#4390 (PR
sonic-net/sonic-utilities#4174 in master) added
extra logic in `_restart_services()` for `armhf-nokia_ixs7215_52x-r0`,
including an explicit 15 s sleep, swss and syncd stop/reset/restart
operations, and management interface recovery handling. This adds
delay on top of the existing config reload sequence, which
pushes its completion beyond the current 180 s timeout.
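
A per-HWSKU timeout of this kind can be sketched as follows. The names `DEFAULT_CONFIG_RELOAD_TIMEOUT`, `SLOW_RELOAD_HWSKUS`, and `config_reload_timeout_for` are hypothetical, and treating 180 s as the value for every other HWSKU is an assumption: the description only states the old and new values for the two Nokia HWSKUs.

```python
# Assumption: 180 s applies to HWSKUs not listed below.
DEFAULT_CONFIG_RELOAD_TIMEOUT = 180
NOKIA_7215_CONFIG_RELOAD_TIMEOUT = 240

# HWSKUs named in the PR description.
SLOW_RELOAD_HWSKUS = ("Nokia-M0-7215", "Nokia-7215")


def config_reload_timeout_for(hwsku):
    """Return the config reload wait time in seconds for a HWSKU."""
    if hwsku in SLOW_RELOAD_HWSKUS:
        return NOKIA_7215_CONFIG_RELOAD_TIMEOUT
    return DEFAULT_CONFIG_RELOAD_TIMEOUT


print(config_reload_timeout_for("Nokia-7215"))
print(config_reload_timeout_for("Some-Other-HWSKU"))
```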

Fixes # (issue)

### Type of change


- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [ ] Test case improvement


### Back port request
- [x] 202511

### Approach

- This is a test timeout adjustment only, with scope limited to
Nokia-related HWSKUs.
- No functional behavior is changed in config reload itself.
- This only aligns the wait time in the test with the longer reload
execution path.
#### What is the motivation for this PR?

#### How did you do it?

#### How did you verify/test it?

#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation

Signed-off-by: fountzou <ioannis.fountzoulas@nokia.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: fountzou <169114916+fountzou@users.noreply.github.com>
…and remove stale parametrize-keyed entries (#24673)

Manual cherry-pick of #24626 to 202511 (auto-cherry-pick blocked by
`Cherry Pick Conflict_202511` -- surrounding
`decap/test_subnet_decap.py` entries diverge between master and 202511;
this PR replays only the `decap/test_decap.py`-scoped portion, which is
the entire content of #24626).

Same diff as master commit `19fdea04` (+3 / -63).

## What this changes

1. Adds `'Arista-720DT' in hwsku` to the top-level
`decap/test_decap.py:` skip block. Arista-720DT (TD3-X2 / BCM56873) does
not honor `SAI_TUNNEL_DSCP_MODE_UNIFORM_MODEL`; platform-level
concession in `aristanetworks/sonic-qual.msft#1176`.
2. Removes 8 stale `decap/test_decap.py::test_decap[ttl=*, dscp=*,
vxlan=*]:` entries that have matched zero collected items since PR
#20304 (2025-08-28) refactored `tests/decap/conftest.py` to a single
non-parametrized collection.

## Verification

Empirically verified on `testbed-bjw2-can-720dt-6` (m0,
internal-202511):

- Baseline: `1 error in 8.56s` (`test_decap` proceeds past
conditional_mark, errors at `duthosts` fixture)
- With this patch: `1 skipped, 9 warnings in 2.98s` -- SKIPPED before
any fixture fires

## Related

- Master PR: #24626
- Tracking: `aristanetworks/sonic-qual.msft#1176`
- Refactor that orphaned the parametrize entries: #20304

Signed-off-by: Xichen Lin <lukelin0907@gmail.com>
…4635)


Summary:
Fixes # (issue)

Manual cherry-pick of sonic-net/sonic-mgmt#24490

Add testbed-specific delays for ACL and Everflow so that slower
platforms can wait longer.


- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
    - [ ] Skipped for non-supported platforms
- [ ] Test case improvement

- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [ ] 202511

Some platforms need more time for ACL programming, which leads to test
failures. Instead of increasing the delay for all platforms, move it to
the inventory file, where platform-specific delays can be specified. If
no additional delay is specified, it defaults to the current value.

Use wait time specified in inventory if available
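
The lookup with a fallback can be sketched like this. Everything named here is hypothetical: the inventory key `acl_programming_delay`, the default of 5 seconds, and the helper name are placeholders, since the description does not give the variable name or the current value.

```python
# Assumption: placeholder for the delay the tests used before this change.
DEFAULT_ACL_WAIT_SECONDS = 5


def acl_wait_seconds(inventory_vars, default=DEFAULT_ACL_WAIT_SECONDS):
    """Return the ACL programming wait time, preferring the inventory value.

    `acl_programming_delay` is a made-up key name used for illustration.
    """
    return int(inventory_vars.get("acl_programming_delay", default))


print(acl_wait_seconds({"acl_programming_delay": 30}))
print(acl_wait_seconds({}))
```

Platforms with no entry in the inventory keep the existing delay, so only the slower platforms need an explicit override.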

Run the ACL and Everflow test suites



Signed-off-by: Tejaswini Chadaga <tchadaga@microsoft.com>
@lizhijianrd lizhijianrd changed the title Code sync 202511 to 202603 20260519 [code sync] Merge code from sonic-net/sonic-mgmt:202511 to 202603 May 19, 2026
@lizhijianrd lizhijianrd merged commit 21173d2 into Azure:202603 May 19, 2026
3 checks passed
@lizhijianrd lizhijianrd deleted the code-sync-202511-to-202603-20260519 branch May 19, 2026 04:47