Skip to content

MixedDbClient: batch ZMQ Set/Del for multi-key gNMI updates (#27250)#675

Open
hdwhdw wants to merge 1 commit into
sonic-net:masterfrom
hdwhdw:daweihuang/zmq-batched-set
Open

MixedDbClient: batch ZMQ Set/Del for multi-key gNMI updates (#27250)#675
hdwhdw wants to merge 1 commit into
sonic-net:masterfrom
hdwhdw:daweihuang/zmq-batched-set

Conversation

@hdwhdw
Copy link
Copy Markdown
Contributor

@hdwhdw hdwhdw commented May 14, 2026

Why

handleTableData in MixedDbClient (under sonic_data_client/mixed_db_client.go) currently loops per key when servicing a gNMI Set:

  • opRemove → calls DbDelTable per dbkey.
  • multi-key JSON opAdd → calls DbSetTable per JSON top-level key.

For DASH route programming this is the hot path — a single gNMI Set can carry tens of thousands of keys, and we generate that many ZMQ round-trips even though they all target the same producer on the same DPU. ZmqProducerStateTable already has a batched set(vector<KeyOpFieldsValuesTuple>) / del(vector<string>) in sonic-swss-common that fires one ZMQ message regardless of N — but the Go bindings did not expose a callable form (see the dependency below).

What

  • New DbBatchSetTable(table string, entries []tableBatchEntry) error and DbBatchDelTable(table string, keys []string) error on MixedDbClient. For ZMQ-backed tables (DASH_* minus DASH_HA_*) they assemble parallel VectorString / FieldValuePairsList and call the new swsscommon.ZmqProducerBatchedSet / swsscommon.ZmqProducerBatchedDel helpers under RetryHelper. For plain Table and non-ZMQ ProducerStateTable backends — where there is no batched C++ API — they fall back to the existing per-entry loop, so behaviour for DASH_HA_* and ordinary CONFIG_DB writes is unchanged.
  • handleTableData:
    • opRemove builds a key slice and makes a single DbBatchDelTable call.
    • multi-key JSON opAdd builds an entry slice and makes a single DbBatchSetTable call.
    • Single-key paths (1138, 1152) are untouched — batching one entry is pointless.
  • ZmqProducerBatchedSetWrapper / ZmqProducerBatchedDelWrapper wrap the new SWIG entry points to convert C++ exceptions into Go errors (same pattern as the existing ProducerStateTable…Wrappers).

Tests

sonic_data_client/client_test.go gains:

  • TestZmqBatchedSetEndToEnd / TestZmqBatchedDelEndToEnd — round-trip 5 entries through a real ZmqProducerStateTable + ZmqConsumerStateTable and verify the consumer drains them all.
  • TestDbBatchSetTableFallsBackForPlainTable / TestDbBatchSetTableEmptyAndSingle — verify the plainTableMap fallback and the no-op/single-entry shortcuts.
  • TestHandleTableDataBatchesMultiKeyJSON / TestHandleTableDataBatchesOpRemove — gomonkey-patched proof that handleTableData routes multi-key payloads through DbBatchSetTable / DbBatchDelTable exactly once (not N times).

Related

CI on this branch will fail to build until sonic-swss-common#1188 lands and a new libswsscommon artifact is published. The compile errors will be on the swsscommon.ZmqProducerBatched{Set,Del} symbols.

Out of scope (deliberate)

  • Cross-update batching at the SetDB level — would change per-update atomicity semantics.
  • Auto-chunking very large batches into multiple ZMQ messages — can be a follow-up if 100k+ in one message proves problematic on the orchagent side.
  • ZmqProducerBatchedSend (mixed SET/DEL in one message) — exported by the swss-common PR for completeness but no gNMI caller needs it today.

Why:
The current handleTableData loops DbSetTable/DbDelTable per key, so a
single gNMI Set carrying N keys to a DASH_* table generates N ZMQ
round-trips even though the messages all target the same producer.
At the scale DASH route programming requires (10k-100k entries per
update), the per-key overhead dominates wall-clock time.

What:
* Add DbBatchSetTable(table, []tableBatchEntry) and DbBatchDelTable(
  table, []string) on MixedDbClient. When the table is ZMQ-backed
  these collapse the batch into a single ZmqProducerBatchedSet /
  ZmqProducerBatchedDel call (one ZMQ message). For plain Table or
  non-ZMQ ProducerStateTable backends — where no batched C++ API
  exists — they fall back to the existing per-entry loop, so behavior
  for DASH_HA_* tables and ordinary CONFIG_DB writes is unchanged.

* Wire the multi-key code paths in handleTableData to the new
  helpers: the opRemove branch and the multi-key JSON opAdd branch
  now build a slice and make one batched call. Single-key paths are
  untouched (batching one entry is pointless).

* Wrap the new SWIG entry points with ZmqProducerBatched{Set,Del}Wrapper
  to convert C++ exceptions into Go errors, mirroring the existing
  ProducerStateTable wrappers.

Tests:
client_test.go adds:
* TestZmqBatchedSetEndToEnd / TestZmqBatchedDelEndToEnd — round-trip
  N entries through a real ZmqProducerStateTable + ZmqConsumerStateTable
  and verify the consumer drains them all.
* TestDbBatchSetTableFallsBackForPlainTable / ...EmptyAndSingle —
  verify the plainTableMap fallback path and empty/single-entry
  shortcuts.
* TestHandleTableDataBatchesMultiKeyJSON / ...OpRemove — gomonkey-
  patched proof that handleTableData routes through DbBatchSetTable /
  DbBatchDelTable exactly once for multi-key payloads instead of N
  per-key calls.

Related:
* sonic-net/sonic-buildimage#27250 — tracking issue for the
  optimization.
* Depends on sonic-net/sonic-swss-common#1188, which exposes
  ZmqProducerBatchedSet/Del/Send to Go via SWIG %inline helpers.
  Until that PR lands and a fresh libswsscommon artifact is
  published, sonic-gnmi CI on this branch will fail to build — the
  Go bindings for the new symbols do not exist in the current
  master swsscommon artifact.

Signed-off-by: Dawei Huang <daweihuang@microsoft.com>
Copilot AI review requested due to automatic review settings May 14, 2026 16:25
@mssonicbld
Copy link
Copy Markdown
Contributor

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes the gNMI Set hot path in MixedDbClient by batching multi-key ZMQ-backed table writes/deletes into a single ZMQ message, reducing per-key round-trips for large DASH updates.

Changes:

  • Add DbBatchSetTable / DbBatchDelTable to batch SET/DEL operations for ZMQ-backed tables with a fallback to per-entry loops for non-ZMQ backends.
  • Update handleTableData to route multi-key JSON opAdd and multi-key opRemove through the new batch APIs.
  • Add end-to-end and routing tests covering batched ZMQ set/del, fallback behavior, and handleTableData batching behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
sonic_data_client/mixed_db_client.go Adds batched ZMQ Set/Del wrappers and batch APIs; updates handleTableData to use batching for multi-key updates.
sonic_data_client/client_test.go Adds new tests validating batched ZMQ behavior and that handleTableData calls batch APIs exactly once.

Comment on lines +2338 to +2346
// handleTableData looks up RedisDbMap[<mapkey>:<dbName>]; ensure a
// redis client is registered for the key.
mapKey := "test:" + APPL_DB_NAME
if _, ok := RedisDbMap[mapKey]; !ok {
RedisDbMap[mapKey] = redis.NewClient(&redis.Options{
Addr: "127.0.0.1:6379",
DB: 0,
})
}
Comment on lines +2396 to +2407
if _, ok := RedisDbMap[mapKey]; !ok {
RedisDbMap[mapKey] = redis.NewClient(&redis.Options{
Addr: "127.0.0.1:6379",
DB: 0,
})
}

// Seed three keys in Redis so the wildcard Keys() lookup returns
// multiple dbkeys.
const tableName = "DASH_BATCH_REMOVE_TABLE"
const delim = ":"
rdb := RedisDbMap[mapKey]
Comment on lines +2172 to +2173
for _, c := range zmqClientMap {
swsscommon.DeleteZmqClient(c)
Comment on lines +2211 to +2212
for _, c := range zmqClientMap {
swsscommon.DeleteZmqClient(c)
@prabhataravind
Copy link
Copy Markdown

@hdwhdw could you check PR failures and copilot comments?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants