
vault: backport instrumentation and observation blob parallelization to 2.39.2#21762

Merged
prashantkumar1982 merged 15 commits into release/2.39.2 from codex/vault-release-2.39.2 on Mar 28, 2026

Conversation

@prashantkumar1982
Contributor

Summary

This backports the Vault instrumentation and Observation blob broadcast parallelization changes to release/2.39.2.

What changed

  • add KV store operation duration metrics to the Vault plugin
  • instrument the KV and blob broadcaster/fetcher wrappers
  • parallelize pending-queue blob broadcasts during Observation()
  • add debug logs for overall observation duration and blob broadcast duration
  • keep focused unit coverage for the parallel broadcast path and pending-queue validation
  • include small release-branch import adjustments needed for this backport

Testing

  • go test ./core/services/ocr2/plugins/vault -run 'TestPlugin_Observation_PendingQueueEnabled_(NoPendingQueueProvided|WithPendingQueueProvided|ItemBothInPendingQueueAndLocalQueue|BroadcastsPendingQueueBlobsInParallel|BroadcastBlobError)$|^TestPlugin_ValidateObservation_AcceptsFullPendingQueueObservation$' -count=1

@github-actions
Contributor

github-actions bot commented Mar 27, 2026

✅ No conflicts with other open PRs targeting release/2.39.2

@github-actions
Contributor

I see you updated files related to core. Please run make gocs in the root directory to add a changeset, and include at least one of the following tags in the changeset text:

  • #added For any new functionality added.
  • #breaking_change For any functionality that requires manual action for the node to boot.
  • #bugfix For bug fixes.
  • #changed For any change to the existing functionality.
  • #db_update For any feature that introduces updates to database schema.
  • #deprecation_notice For any upcoming deprecation functionality.
  • #internal For changesets that need to be excluded from the final changelog.
  • #nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
  • #removed For any functionality/config that is removed.
  • #updated For any functionality that is updated.
  • #wip For any change that is not ready yet and external communication about it should be held off till it is feature complete.


chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc7813b203

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

	batchSize.DefaultValue = int(configProto.BatchSize)
}
- cfg.MaxBatchSize, err = limits.MakeUpperBoundLimiter(r.limitsFactory, batchSize)
+ cfg.MaxBatchSize, err = limits.MakeUpperBoundLimiter(r.limitsFactory, cresettings.Default.VaultPluginBatchSizeLimit)

P1: Apply offchain BatchSize when creating the MaxBatchSize limiter

NewReportingPlugin still parses configProto, but MaxBatchSize is now always built from cresettings.Default.VaultPluginBatchSizeLimit and never from configProto.BatchSize. This silently ignores operator-provided batch-size overrides and can make the runtime request and pending-queue limits differ from what the DON config specifies, changing queue truncation and validation behavior in production without surfacing any configuration error.
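The P1 finding amounts to reapplying the offchain override before building the limiter, as the line batchSize.DefaultValue = int(configProto.BatchSize) in the excerpt does. The Go sketch below models only that override logic. batchSizeLimit, offchainConfig and effectiveBatchSizeLimit are hypothetical stand-ins (not the real cresettings or limits types), the default of 20 is an arbitrary example value, and treating a zero BatchSize as "unset" is an assumption because the excerpt does not show the original guard condition.

```go
package main

import "fmt"

// batchSizeLimit is a hypothetical stand-in for a settings limit that carries
// a default value; it is not the real cresettings type.
type batchSizeLimit struct {
	DefaultValue int
}

// offchainConfig is a hypothetical stand-in for the parsed configProto.
type offchainConfig struct {
	BatchSize uint32
}

// effectiveBatchSizeLimit starts from the node-wide default and applies the
// DON's offchain BatchSize only when it is set (assumed: non-zero means set).
func effectiveBatchSizeLimit(def batchSizeLimit, cfg *offchainConfig) batchSizeLimit {
	limit := def // copy, so the shared default value is never mutated
	if cfg != nil && cfg.BatchSize > 0 {
		limit.DefaultValue = int(cfg.BatchSize)
	}
	return limit
}

func main() {
	def := batchSizeLimit{DefaultValue: 20}
	fmt.Println(effectiveBatchSizeLimit(def, &offchainConfig{BatchSize: 5}).DefaultValue) // 5
	fmt.Println(effectiveBatchSizeLimit(def, &offchainConfig{}).DefaultValue)             // 20
	fmt.Println(effectiveBatchSizeLimit(def, nil).DefaultValue)                           // 20
}
```

In the plugin, the resulting limit would be what gets passed to limits.MakeUpperBoundLimiter instead of the bare default. Copying the default before overriding keeps the shared value unchanged in this sketch.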


@trunk-io

trunk-io bot commented Mar 27, 2026


prashantkumar1982 enabled auto-merge (squash) on March 27, 2026 at 23:45

chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fa7c47392f

Comment on lines +498 to +500
for i, payload := range blobPayloads {
	g.Go(func() error {
		blobHandle, ierr2 := blobBroadcastFetcher.BroadcastBlob(broadcastCtx, payload, ocr3_1types.BlobExpirationHintSequenceNumber{SeqNr: seqNr + 2})

P2: Fail fast before launching all blob broadcasts

This loop starts a goroutine for every pending payload before any error is observed, and errors are handled only at g.Wait(). If one BroadcastBlob call fails while others succeed, Observation still returns an error (so the round is retried), but many uploads from the failed round may already have been persisted. On blob backends that count each upload against a quota of unexpired blobs, this regression can quickly exhaust per-oracle blob capacity and cause repeated round failures.


@cl-sonarqube-production

prashantkumar1982 merged commit d94f5b7 into release/2.39.2 on Mar 28, 2026
212 checks passed
prashantkumar1982 deleted the codex/vault-release-2.39.2 branch on March 28, 2026 at 00:23

Labels

None yet

Projects

None yet

3 participants