Expand /debug/ready with granular BPF and XDS status #1702
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the operational observability of the mesh by transforming the /debug/ready endpoint from a simple health check into a detailed diagnostic tool. By exposing granular state information for BPF components and XDS stream connectivity, it enables easier debugging of partial failures and provides necessary data for external monitoring systems and UI integrations like Headlamp.
Code Review
This pull request implements a comprehensive readiness probe by adding status tracking and IsReady methods to the BPF loader and XDS controllers. It also introduces thread safety improvements across the ADS and workload controllers using mutexes and atomic booleans. A critical compilation error was identified in pkg/controller/client.go where GetXdsStreamStability returns a single string, but its caller in the status server expects two separate values for reconnect count and connection time.
Codecov Report
❌ Your patch check has failed because the patch coverage (28.07%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
... and 3 files with indirect coverage changes
…tatus

Expand the readiness probe endpoint to return structured JSON with per-component health details for BPF programs/maps and XDS connection state, replacing the previous plain-text OK response.

Changes:
- Add BpfStatus struct with per-program and per-map health reporting
- Add XDS connection state, controller readiness, and reconnect metadata
- Move gRPC dial outside mutex in createGrpcStreamClient to avoid blocking readiness readers during slow reconnects
- Add thread-safe IsReady/GetBpfStatus to BpfLoader
- Add IsReady/GetGrpcState/GetControllerStatus/GetXdsStreamStability to XdsClient with RWMutex protection
- Add mutex and atomic.Bool to ADS and Workload controllers for thread-safe stream access and initialization tracking
- Add readiness probe tests covering nil state, JSON structure, XDS metadata, and nested field validation

Signed-off-by: vanshika2720 <pahalvanshikaa@gmail.com>
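The commit message mentions moving the gRPC dial outside the mutex so that readiness readers are not blocked during slow reconnects. A minimal sketch of that locking pattern follows. It does not use the real gRPC library: `conn`, `client`, `reconnect`, and `isConnected` are hypothetical names, and the dial is passed in as a plain function.

```go
package main

import "sync"

// conn stands in for a gRPC client connection (hypothetical type).
type conn struct{ addr string }

// client guards its connection pointer with an RWMutex.
type client struct {
	mu   sync.RWMutex
	conn *conn
}

// reconnect runs the potentially slow dial without holding the lock,
// then takes the write lock only for the pointer swap.
func (c *client) reconnect(dial func() (*conn, error)) error {
	newConn, err := dial() // no lock held while dialing
	if err != nil {
		return err
	}
	c.mu.Lock()
	c.conn = newConn
	c.mu.Unlock()
	return nil
}

// isConnected is the kind of read a readiness probe would perform.
func (c *client) isConnected() bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.conn != nil
}
```

With this shape, a probe calling `isConnected` while a dial is in flight returns immediately with the previous state instead of waiting for the dial to finish.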
What type of PR is this?
/kind feature
What this PR does / why we need it
This PR expands the /debug/ready endpoint to provide granular health visibility for BPF components and XDS stream connectivity.
These enhancements improve operational observability and enable:
in the Headlamp plugin.
This allows users to verify:
Previously, the readiness endpoint only exposed coarse readiness state, making it difficult to diagnose partial failures or unstable control plane connectivity.
Key changes
Granular BPF status reporting
Expanded BpfLoader readiness reporting to include per-program and per-map health details.
This improves low-level dataplane observability.
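As a rough illustration of per-program and per-map reporting, the sketch below shows one way such a status value could be shaped. Only the name `BpfStatus` appears in the commit message; the fields, JSON tags, the `Failing` helper, and the rule that an empty status counts as not ready are all assumptions.

```go
package main

import "sort"

// BpfStatus borrows its name from the commit message; the fields are assumed.
type BpfStatus struct {
	Programs map[string]bool `json:"programs"` // program name -> healthy
	Maps     map[string]bool `json:"maps"`     // map name -> healthy
}

// Failing lists unhealthy components in sorted order, prefixed by kind.
func (s BpfStatus) Failing() []string {
	var out []string
	for name, ok := range s.Programs {
		if !ok {
			out = append(out, "program/"+name)
		}
	}
	for name, ok := range s.Maps {
		if !ok {
			out = append(out, "map/"+name)
		}
	}
	sort.Strings(out)
	return out
}

// IsReady is true only when at least one program is tracked and nothing
// is failing (treating an empty status as not ready is an assumption).
func (s BpfStatus) IsReady() bool {
	return len(s.Programs) > 0 && len(s.Failing()) == 0
}
```

Reporting the failing names rather than a single boolean is what lets an operator distinguish a missing map from a program that failed to load.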
XDS stream stability tracking
Added thread-safe XDS connection stability tracking in XdsClient, including:
This provides better visibility into:
Expanded readiness response
Enhanced the JSON payload returned by /debug/ready to expose detailed component-level readiness information for:
This makes the endpoint more useful for:
Controller readiness integration
Integrated readiness checks into AdsController and WorkloadController to provide centralized readiness reporting across core mesh components.
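The commit message mentions adding an atomic.Bool to the ADS and Workload controllers for initialization tracking. A minimal sketch of how such flags could feed a centralized check is shown below. The `controller` type, `markInitialized`, the `readinessChecker` interface, and `allReady` are hypothetical; `atomic.Bool` requires Go 1.19 or later.

```go
package main

import "sync/atomic"

// readinessChecker is a hypothetical common interface for controllers.
type readinessChecker interface {
	IsReady() bool
}

// controller stands in for AdsController or WorkloadController.
type controller struct {
	initialized atomic.Bool // set once the first response is processed
}

// markInitialized records that initialization has completed.
func (c *controller) markInitialized() { c.initialized.Store(true) }

// IsReady is safe on a nil receiver and under concurrent access.
func (c *controller) IsReady() bool {
	if c == nil {
		return false
	}
	return c.initialized.Load()
}

// allReady reports true only if every checker is non-nil and ready.
func allReady(checkers ...readinessChecker) bool {
	for _, ch := range checkers {
		if ch == nil || !ch.IsReady() {
			return false
		}
	}
	return true
}
```

The nil-receiver guard matters because a readiness probe can fire before the controllers have been constructed.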
Which issue(s) this PR fixes
Fixes #
Special notes for your reviewer
Thread safety
Introduced RWMutex protection in XdsClient to ensure safe concurrent access during readiness and status reporting.
Test updates
Updated:
to validate the new granular readiness response format.
Formatting
Applied:
to all modified files.
Why this matters
These changes improve:
Users can now identify:
without relying on logs or deep internal debugging.
Does this PR introduce a user-facing change?