
vault: gracefully handle individual blob broadcast failures in Observation#21765

Open
prashantkumar1982 wants to merge 2 commits into develop from vault/graceful-blob-broadcast-failures

prashantkumar1982 (Contributor) commented Mar 28, 2026

Summary

During the Observation phase, pending queue payloads are broadcast as blobs in parallel. Previously, if any single broadcast failed (transient network error, malformed data, etc.), the entire observation was aborted — no payloads were included, and the OCR round stalled.

This changes the behavior so that individual failures are isolated: a failed broadcast is logged as a warning (with the request ID and error) and that payload is excluded from PendingQueueItems. All remaining payloads continue to be broadcast and observed normally.

What changed

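  • New behavior: A single blob broadcast failure no longer aborts the whole observation. The failed item is skipped, a warning is logged, and the rest proceed. If a broadcast fails because the round's context was canceled or its deadline passed, that context error is still returned immediately.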
  • Refactor: The parallel broadcast logic is extracted into a broadcastBlobPayloads method for readability. It accepts payloads and request IDs, runs broadcasts concurrently, and returns only the successfully broadcast blob handles.

Why

The observation step is critical to OCR round progress. Aborting it entirely because one out of N payloads hit a transient failure is disproportionate — especially since the failed payload can simply be retried in a future round. Graceful degradation keeps rounds moving and avoids cascading stalls.

…ation

Previously, if any single payload failed to broadcast as a blob during the
Observation phase, the entire observation was aborted and returned an error.
This is unnecessarily disruptive — one problematic payload (e.g. transient
network issue, malformed data) would prevent all other valid payloads from
being included in the observation, stalling the OCR round.

Now, individual broadcast failures are logged as warnings (with the request
ID and error details) and the failed payload is simply excluded from
PendingQueueItems. The remaining payloads continue to be broadcast and
observed normally.

The blob broadcast logic is extracted into a dedicated
broadcastBlobPayloads method for clarity.

Made-with: Cursor
github-actions (Contributor)

👋 prashantkumar1982, thanks for creating this pull request!

To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team.

Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks!

github-actions bot (Contributor) commented Mar 28, 2026

✅ No conflicts with other open PRs targeting develop

github-actions (Contributor)

I see you updated files related to core. Please run make gocs in the root directory to add a changeset, and include at least one of the following tags in its text:

  • #added For any new functionality added.
  • #breaking_change For any functionality that requires manual action for the node to boot.
  • #bugfix For bug fixes.
  • #changed For any change to the existing functionality.
  • #db_update For any feature that introduces updates to database schema.
  • #deprecation_notice For any upcoming deprecation functionality.
  • #internal For changesets that need to be excluded from the final changelog.
  • #nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
  • #removed For any functionality/config that is removed.
  • #updated For any functionality that is updated.
  • #wip For any change that is not ready yet and external communication about it should be held off till it is feature complete.

chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c16097773c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Check ctx.Err() when BroadcastBlob fails so that context.Canceled and
context.DeadlineExceeded are returned immediately rather than swallowed.
This preserves fail-fast semantics for expired OCR rounds while still
skipping item-specific transient errors.

Made-with: Cursor
trunk-io bot commented Mar 28, 2026


Failed test: TestCCIPReader_Nonces
Failure summary: The test failed due to an unexpected 'replacement transaction underpriced' error during execution.


cl-sonarqube-production