Skip to content

[fpmsyncd] Fpmsyncd Next Hop Table Enhancement#2919

Merged
prsunny merged 18 commits into
sonic-net:masterfrom
ntt-omw:ntt_fpmsyncd_enhanced
Feb 7, 2025
Merged

[fpmsyncd] Fpmsyncd Next Hop Table Enhancement#2919
prsunny merged 18 commits into
sonic-net:masterfrom
ntt-omw:ntt_fpmsyncd_enhanced

Conversation

@ntt-omw
Copy link
Copy Markdown
Contributor

@ntt-omw ntt-omw commented Oct 2, 2023

What I did
Implementing code changes for sonic-net/SONiC#1425

Why I did it
add nexthop group feature to fpmsyncd.

How I verified it

enable/disable nexthop group feature

  • Klish will call REST API to configure feature next-hop-group enable.
  • FEATURE|nexthop_group will be created in CONFIG_DB
    • template zebra.conf.j2 will generate zebra.conf with fpm use-next-hop-groups if FEATURE|nexthop_group exists in CONFIG_DB. Else, it will generate zebra.conf with no fpm use-next-hop-groups (default behavior)
  • Do config save comman and write to /etc/sonic/config_db.json
  • restart SONiC: virsh reboot sonic-nhg
  • /etc/frr/zebra.conf has fpm use-next-hop-groups instead of no fpm use-next-hop-groups

Klish CLI for feature nexthop_group

  • Enable: sonic(config)# feature next-hop-group enable
  • Disable: sonic(config)# no feature next-hop-group

Enable

admin@sonic:~$ sonic-cli

sonic# configure terminal

sonic(config)#
  end        Exit to EXEC mode
  exit       Exit from current mode
  feature    Configure additional feature
  interface  Select an interface
  ip         Global IP configuration subcommands
  mclag      domain
  no         To delete / disable commands in config mode

sonic(config)# feature
  next-hop-group  Next-hop Groups feature

sonic(config)# feature next-hop-group
  enable  Enable Next-hop Groups feature

sonic(config)# feature next-hop-group enable
  <cr>

Disable

sonic(config)# no
  feature  Disable additional feature
  ip       Global IP configuration subcommands
  mclag    domain

sonic(config)# no feature
  next-hop-group  Disable Next-hop Groups feature

sonic(config)# no feature next-hop-group
  <cr>

Details if related

Comment thread fpmsyncd/routesync.cpp Outdated
FieldValueTuple nhg("nexthop_group", nhg_id_key.c_str());
fvVector.push_back(nhg);
updateNextHopGroup(nhg_id);
use_nhg = false;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use_nhg=true?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was removed after the change to not use NHG for route with single nexthop.

Comment thread fpmsyncd/routesync.h Outdated
/* nexthop group table */
ProducerStateTable m_nexthop_groupTable;
map<uint32_t,NextHopGroup> m_nh_groups;
map<string,NextHopGroupRoute> m_nh_routes;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fpmsyncd should NOT cache the routes, it will consume too much memory in large scale routes scenario.
For one route entry, it already exists in orchagent, sai meta, syncd(sai/sdk). We cannot afford another copying :(

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed map<string,NextHopGroupRoute> m_nh_routes; based on the discussion in the Routing WG.
Please kindly check if this has resolved your concern.

@nakano-omw
Copy link
Copy Markdown

/AzurePipelines run Azure.sonic-swss

@azure-pipelines
Copy link
Copy Markdown

Commenter does not have sufficient privileges for PR 2919 in repo sonic-net/sonic-swss

@nakano-omw
Copy link
Copy Markdown

@zice312963205 @shuaishang
Please help in restarting the /AzurePipelines run Azure.sonic-swss. Would you be able to grant me the privilege?

@ridahanif96
Copy link
Copy Markdown
Contributor

@nakano-omw your branch needs to be updated and you need to repush your code first. @prsunny Hi Prince, can you please help with this PR to merge. Thanks.

@nakano-omw
Copy link
Copy Markdown

@ridahanif96 I have repush. Thanks.

@zice312963205
Copy link
Copy Markdown

Looks good

@zhangyanzhao
Copy link
Copy Markdown

reviewers, can you please help to review and approve this PR? Thanks.

Copy link
Copy Markdown
Collaborator

@dgsudharsan dgsudharsan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add UT to the changes. There is a requirement for 80% coverage.

Comment thread fpmsyncd/routesync.h Outdated
#include <netlink/route/route.h>

#if (LINUX_VERSION_CODE > KERNEL_VERSION(5,3,0))
#define HAVE_NEXTHOP_GROUP
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this macro HAVE_NEXTHOP_GROUP? Since we are submitting the code to master, the linux version is expected to be above 5,3,0. Please remove unnecessary ifdefs

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I have removed the code.

Comment thread fpmsyncd/routesync.cpp Outdated
else
#endif
{
onEvpnRouteMsg(h, len);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tested if nexthop group works with EVPN in case of overlay nexthop?

Comment thread fpmsyncd/routesync.cpp Outdated
char ifname_unknown[IFNAMSIZ] = "unknown";

SWSS_LOG_INFO("type %d len %d", nlmsg_type, len);
if ((nlmsg_type != RTM_NEWNEXTHOP)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already checked in the calling function and hence redundant. Please remove here
https://github.com/sonic-net/sonic-swss/pull/2919/files#diff-0555c0a4f1e207c410ac8ab7d4a44f48a0925da2ed14c57499a4e9175223be57R625

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I have removed the code.

Comment thread fpmsyncd/routesync.cpp Outdated

#define ETHER_ADDR_STRLEN (3*ETH_ALEN)

#define MULTIPATH_NUM 256 //Same value used for FRR in SONiC
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename as MAX_MULTIPATH_NUM for better readability?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have modified it to MAX_MULTIPATH_NUM.

Comment thread fpmsyncd/routesync.cpp Outdated
auto itr = m_nh_groups.find(id);
if(itr == m_nh_groups.end())
{
SWSS_LOG_INFO("NextHop group is incomplete: %d", nhg.id);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be a warn or error log?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have corrected it to SWSS_LOG_ERROR.

Comment thread fpmsyncd/routesync.cpp Outdated
auto git = m_nh_groups.find(nh_id);
if(git == m_nh_groups.end())
{
SWSS_LOG_INFO("Nexthop not found: %d", nh_id);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be a warn or error message?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have corrected it to SWSS_LOG_ERROR.

Copy link
Copy Markdown
Contributor

@eddieruan-alibaba eddieruan-alibaba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cherry picked into Phoenix Wing folk and validate it.

http://phoenixwing.com.cn/codes

Comment thread fpmsyncd/fpmlink.cpp Outdated
/* EVPN Type5 Add route processing */
processRawMsg(nl_hdr);
}
#ifdef HAVE_NEXTHOP_GROUP
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this macro check? Can we do a dynamic check like

DEVICE_METADATA['localhost']['nexthop_group']

This is done as part of PR - https://github.com/sonic-net/sonic-buildimage/pull/16762/files

Could you pls check this?

Comment thread fpmsyncd/routesync.cpp Outdated

vector<string> alsv = tokenize(intf_list, NHG_DELIMITER);
for (auto alias : alsv)
#ifdef HAVE_NEXTHOP_GROUP
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the previous comment and we could use device_metadata dynamic check.

Comment thread fpmsyncd/routesync.cpp
* up/down events. Skipping routes to eth0 or docker0 to avoid such behavior
*/
if (alias == "eth0" || alias == "docker0")
const auto itg = m_nh_groups.find(nhg_id);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add couple of sonic-swss tests with NHGs?

@kperumalbfn
Copy link
Copy Markdown
Contributor

@ntt-omw Please check the compiler failures

routesync.cpp:800:23: error: 'rtnl_route_get_nh_id' was not declared in this scope; did you mean 'rtnl_route_get_iif'?
800 | uint32_t nhg_id = rtnl_route_get_nh_id(route_obj);
| ^~~~~~~~~~~~~~~~~~~~
| rtnl_route_get_iif

@eddieruan-alibaba
Copy link
Copy Markdown
Contributor

@ntt-omw can you rebase your branch and trigger recompile? You need #3105 's changes to fix the compile issue @kperumalbfn pointed out.

Comment thread fpmsyncd/routesync.cpp Outdated
// In this case since we do not want the route with next hop on eth0/docker0, we return.
// But still we need to clear the route from the APPL_DB. Otherwise the APPL_DB and data
// path will be left with stale route entry
if(alsv.size() == 1)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this test be moved outside the loop ? If the list is single entry then there is no reason for the loop itself.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of this loop is not to show skipped routes, but to skip routes to specific interfaces (eth0 or docker0) and do the associated processing.

Comment thread fpmsyncd/routesync.cpp Outdated
string weights = getNextHopWt(route_obj);

vector<string> alsv = tokenize(intf_list, NHG_DELIMITER);
for (auto alias : alsv)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of this loop? Is it to print the skipped routes ? Because the only logic there is when the list is of size 1.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I moved if(alsv.size() == 1)) outside the loop.

@nakano-omw
Copy link
Copy Markdown

@dgsudharsan @kperumalbfn
The test will be implemented in Phoenix Wing. @eddieruan-alibaba , let me know if you have any additional information.

@eddieruan-alibaba
Copy link
Copy Markdown
Contributor

@dgsudharsan @kperumalbfn The test will be implemented in Phoenix Wing. @eddieruan-alibaba , let me know if you have any additional information.

@ntt-omw can you help to get swss sanity check passed?

Failed: 3 (0.35%)

test_rebind_eni_route_group
AssertionError: Field SAI_ENI_ATTR_OUTBOUND_ROUTING_GROUP_ID in ASIC_DB table ASIC_STATE:SAI_OBJECT_TYPE_ENI not equal to oid:0x14008000000617
https://github.com/sonic-net/sonic-swss/pull/2919/checks?check_run_id=31324424340

Might be related to your changes.

@saiarcot895
Copy link
Copy Markdown
Contributor

@nakano-omw libnl 3.10 will have support for getting/setting the nexthop ID attribute, but the API is a little bit different. See thom311/libnl@3e08063 for details. It looks like in the version of code that has been committed, it's nhid instead of nh_id.

For ease of upgrades, it would be good if the same API syntax is used. Would you be able to rework this PR to use that new API instead?

@eddieruan-alibaba
Copy link
Copy Markdown
Contributor

@ntt-omw @nakano-omw can you rebase your branch to latest master? You have "This branch is out-of-date with the base branch"

@sugisono-omw
Copy link
Copy Markdown

@eddieruan-alibaba

We are fixing this issue.

@dgsudharsan @kperumalbfn The test will be implemented in Phoenix Wing. @eddieruan-alibaba , let me know if you have any additional information.

@ntt-omw can you help to get swss sanity check passed?

Failed: 3 (0.35%)

test_rebind_eni_route_group AssertionError: Field SAI_ENI_ATTR_OUTBOUND_ROUTING_GROUP_ID in ASIC_DB table ASIC_STATE:SAI_OBJECT_TYPE_ENI not equal to oid:0x14008000000617 https://github.com/sonic-net/sonic-swss/pull/2919/checks?check_run_id=31324424340

Might be related to your changes.

@sugisono-omw
Copy link
Copy Markdown

@eddieruan-alibaba

We are rebasing our branch now.

@ntt-omw @nakano-omw can you rebase your branch to latest master? You have "This branch is out-of-date with the base branch"

@eddieruan-alibaba
Copy link
Copy Markdown
Contributor

Following lines are missing test coverage.. Coverage Threshold is 80%.

Comment thread fpmsyncd/routesync.cpp
{
if(nhg.group.size() == 0)
{
if(!nhg.nexthop.empty())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be replaced with the following:
nexthops = nhg.nexthop.empty() ? (af == AF_INET ? "0.0.0.0" : "::") : nhg.nexthop;

Similar logic is implemented in the non-empty nhg.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same logic with one line

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread fpmsyncd/routesync.cpp Outdated
//Using route-table only for single next-hop
string nexthops, ifnames, weights;

getNextHopGroupFields(nhg, nexthops, ifnames, weights, rtnl_route_get_family(route_obj));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to handle a case where the nhg is not based on the ID?
If there is a failure in getNextHopGroupFields(), you would be pushing empty strings.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need FRR to provide NHG ID. Otherwise, the NHG key would be changed during path changes.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As eddie says, FRR is needed to provide NHG IDs. There is no case where nhg is not based on identity,

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Kanji Nakano <kanji.nakano@ntt.com>
@nakano-omw nakano-omw force-pushed the ntt_fpmsyncd_enhanced branch from 15b8a6d to f017e8c Compare January 29, 2025 10:46
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@nakano-omw
Copy link
Copy Markdown

@dgsudharsan The code has been corrected.

@eddieruan-alibaba
Copy link
Copy Markdown
Contributor

@dgsudharsan can you help to review it again? @nakano-omw has already addressed your comments. @nakano-omw can you change the Sudharsan's review status to request for review? Currently, it is still "requested changes"

Comment thread fpmsyncd/routesync.cpp Outdated
}
}
}
if (grp_count > 0)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment here is still not addressed. Not sure why can't we move this code to loop in line 1874 like below?

image

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@dgsudharsan
Copy link
Copy Markdown
Collaborator

@eddieruan-alibaba Still one of my comments is not addressed. Left a comment on how to address that

@eddieruan-alibaba
Copy link
Copy Markdown
Contributor

Thanks @dgsudharsan . @nakano-omw do you want to take a look again?

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@nakano-omw nakano-omw force-pushed the ntt_fpmsyncd_enhanced branch from efb6b50 to a748926 Compare February 7, 2025 04:39
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

dgsudharsan
dgsudharsan previously approved these changes Feb 7, 2025
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Kanji Nakano <kanji.nakano@ntt.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@prsunny prsunny merged commit 515af60 into sonic-net:master Feb 7, 2025
baorliu pushed a commit to baorliu/sonic-swss that referenced this pull request Feb 23, 2026
What I did
Implementing code changes for sonic-net/SONiC#1425

Why I did it
add nexthop group feature to fpmsyncd.

How I verified it

enable/disable nexthop group feature
Klish will call REST API to configure feature next-hop-group enable.
FEATURE|nexthop_group will be created in CONFIG_DB
template zebra.conf.j2 will generate zebra.conf with fpm use-next-hop-groups if FEATURE|nexthop_group exists in CONFIG_DB. Else, it will generate zebra.conf with no fpm use-next-hop-groups (default behavior)
Do config save comman and write to /etc/sonic/config_db.json
restart SONiC: virsh reboot sonic-nhg
/etc/frr/zebra.conf has fpm use-next-hop-groups instead of no fpm use-next-hop-groups

Signed-off-by: Baorong Liu <96146196+baorliu@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.