Multi-asic support for config reload #212

@agadia-cisco

Description

  • Found this issue while fixing the issue "Multi-asic support for ApplyPatchDb API for gNMI".

  • gnoi_reboot is called in test_gnmi_configdb.py::test_gnmi_configdb_full_01. It runs fine on single-asic devices, and on multi-asic devices as long as it runs before gnmi_set in that test function. Once gnmi_set has replaced the entire config_db, gnoi_reboot fails.

  • gnoi_reboot fails on multi-asic devices because, per the "Config reload in Multi-Asic" documentation, a config reload after a configuration change succeeds only if we pass all the config_db files that exist.

    Before gnmi_set runs in the TC, config_reload or gnoi_reboot (which does a config_reload underneath) succeeds: the configuration has not changed, so it picks up the asic-specific config_db files. But when we run gnmi_set in the TC and then do a gnoi_reboot, gnoi_reboot currently picks up by default the temporary config_db written by gnmi_set, which is stored at /tmp/config_db.json.tmp. Since the other asic-specific config_dbs are not passed, gnoi_reboot fails with this error -

    System Reboot
    panic: rpc error: code = Unknown desc = 
    
    goroutine 1 [running]:
    main.systemReboot({0xabe858, 0xc000226a10}, {0xabb698, 0xc00014f380})
    	/sonic/src/sonic-gnmi/gnoi_client/gnoi_client.go:135 +0x1a8
    main.main()
    	/sonic/src/sonic-gnmi/gnoi_client/gnoi_client.go:55 +0x625
    
  • To debug further, we manually ran config reload on a multi-asic device. First we tried a normal config reload without passing a new config_db file (config reload -y), and it was successful.

  • Then we tried a config reload passing the full.txt that the TC passes as the payload to gnmi_set, and it errored out -

    # config reload -y /home/cisco/full.txt 
    Acquired lock on /etc/sonic/reload.lock
    Input file /home/cisco/full.txt must contain all asics config. ns_list: ['', 'asic0', 'asic1', 'asic2'] file ns_list: ['ACL_RULE', 'ACL_TABLE', 'ASIC_SENSORS', 'AUTO_TECHSUPPORT', 'AUTO_TECHSUPPORT_FEATURE', 'BANNER_MESSAGE', 'BGP_DEVICE_GLOBAL', 'BGP_INTERNAL_NEIGHBOR', 'BGP_NEIGHBOR', 'BUFFER_PG', 'BUFFER_POOL', 'BUFFER_PROFILE', 'BUFFER_QUEUE', 'CABLE_LENGTH', 'CONSOLE_SWITCH', 'CRM', 'DEVICE_METADATA', 'DEVICE_NEIGHBOR', 'DEVICE_NEIGHBOR_METADATA', 'DSCP_TO_TC_MAP', 'FEATURE', 'FLEX_COUNTER_TABLE', 'KDUMP', 'LOGGER', 'LOOPBACK_INTERFACE', 'MAP_PFC_PRIORITY_TO_QUEUE', 'MGMT_INTERFACE', 'MGMT_PORT', 'NTP', 'PASSW_HARDENING', 'PORT', 'PORTCHANNEL', 'PORTCHANNEL_INTERFACE', 'PORTCHANNEL_MEMBER', 'PORT_QOS_MAP', 'QUEUE', 'RESTAPI', 'SCHEDULER', 'STATIC_ROUTE', 'SYSLOG_CONFIG', 'SYSLOG_CONFIG_FEATURE', 'SYSTEM_DEFAULTS', 'TC_TO_PRIORITY_GROUP_MAP', 'TC_TO_QUEUE_MAP', 'TELEMETRY', 'VERSIONS', 'WRED_PROFILE']
    Released lock on /etc/sonic/reload.lock
    Aborted!
    
  • We then tried a config reload passing all the asic-specific config_dbs along with full.txt, and it ran fine, as shown below. When we checked admin_status for the Ethernet0 interface, it was down, as expected (that is the update made by the TC).

    # config reload -y /etc/sonic/config_db.json,/home/cisco/full.txt,/etc/sonic/config_db1.json,/etc/sonic/config_db2.json
    Acquired lock on /etc/sonic/reload.lock
    Disabling container and routeCheck monitoring ...
    Stopping SONiC target ...
    Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db
    Running command: /usr/local/bin/db_migrator.py -o migrate
    Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /home/cisco/full.txt -n asic0 --write-to-db
    Running command: /usr/local/bin/db_migrator.py -o migrate -n asic0
    Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db1.json -n asic1 --write-to-db
    Running command: /usr/local/bin/db_migrator.py -o migrate -n asic1
    Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db2.json -n asic2 --write-to-db
    Running command: /usr/local/bin/db_migrator.py -o migrate -n asic2
    Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment
    Restarting SONiC target ...
    Enabling container and routeCheck monitoring ...
    Reloading Monit configuration ...
    Reinitializing monit daemon
    Released lock on /etc/sonic/reload.lock
    
    # show int status
          Interface                            Lanes    Speed    MTU    FEC        Alias             Vlan    Oper    Admin             Type    Asym PFC
    ---------------  -------------------------------  -------  -----  -----  -----------  ---------------  ------  -------  ---------------  ----------
          Ethernet0              1288,1289,1290,1291     100G   9100     rs   Eth0/0/0/0   PortChannel102    down     down  QSFP28 or later         off
    ....
    
  • Therefore, we can conclude that gnoi_client also needs Multi-Asic support in its reload function, so that gnoi_reboot succeeds when it is triggered on a Multi-Asic device after a configuration change.

  • Ref :
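
To make the manual workaround above concrete, here is a minimal Python sketch of how a caller could assemble the comma-separated file list that `config reload` accepted in the successful run. It is not the gnoi_client implementation, which is written in Go. The helper names `default_config_path` and `build_reload_file_list` are hypothetical, and the path `/etc/sonic/config_db0.json` for asic0 is an assumption extrapolated from the `config_db1.json` and `config_db2.json` paths in the log; only the host path and those two appear in this issue.

```python
from typing import Dict, List, Optional


def default_config_path(namespace: str) -> str:
    """Return the on-disk config_db path assumed for a namespace.

    Assumption: the host namespace ("") uses /etc/sonic/config_db.json and
    asicN uses /etc/sonic/config_dbN.json, as the logs in this issue suggest.
    """
    if namespace == "":
        return "/etc/sonic/config_db.json"
    suffix = namespace[len("asic"):]
    if not namespace.startswith("asic") or not suffix.isdigit():
        raise ValueError(f"unexpected namespace name: {namespace!r}")
    return f"/etc/sonic/config_db{suffix}.json"


def build_reload_file_list(
    ns_list: List[str],
    overrides: Optional[Dict[str, str]] = None,
) -> str:
    """Build the comma-separated argument for `config reload -y <files>`.

    ns_list   -- namespaces in the order config reload reports them,
                 e.g. ['', 'asic0', 'asic1', 'asic2']
    overrides -- namespace -> replacement file (for example the file
                 written by gnmi_set); other namespaces keep their default.
    """
    overrides = overrides or {}
    unknown = set(overrides) - set(ns_list)
    if unknown:
        raise ValueError(f"override for unknown namespace(s): {sorted(unknown)}")
    # One file per namespace, in namespace order, so that no asic is left out.
    return ",".join(overrides.get(ns, default_config_path(ns)) for ns in ns_list)
```

Called with `['', 'asic0', 'asic1', 'asic2']` and the override `{'asic0': '/home/cisco/full.txt'}`, this yields the same file list as the successful manual command. A fix in gnoi_client would need equivalent logic, or some other way to tell config reload which namespace the updated file belongs to.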
