
Expand /debug/ready with granular BPF and XDS status#1702

Open: vanshika2720 wants to merge 1 commit into kmesh-net:main from vanshika2720:feat/granular-debug-ready

@vanshika2720
What type of PR is this?

/kind feature


What this PR does / why we need it

This PR expands the /debug/ready endpoint to provide granular health visibility for:

  • eBPF programs
  • eBPF maps
  • XDS stream stability

These enhancements improve operational observability and enable visual indicators of mesh status in the Headlamp plugin.

This allows users to verify:

  • Whether BPF programs are correctly attached
  • Whether required BPF maps are healthy
  • Whether the XDS control plane connection is stable

Previously, the readiness endpoint returned only a plain-text OK response, which exposed a coarse readiness state and made it difficult to diagnose partial failures or unstable control plane connectivity.


Key changes

Granular BPF status reporting

Expanded BpfLoader readiness reporting to include:

  • Individual eBPF program attachment status
  • eBPF map readiness information
  • Detailed component-level health visibility

This improves low-level dataplane observability.
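
A minimal sketch of what per-program and per-map reporting could look like. The commit message names a BpfStatus struct, but the field names, the ProgramStatus and MapStatus types, and the sample entries below are assumptions made for illustration, not the definitions in this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ProgramStatus is a hypothetical per-program record.
type ProgramStatus struct {
	Name     string `json:"name"`
	Attached bool   `json:"attached"`
}

// MapStatus is a hypothetical per-map record.
type MapStatus struct {
	Name  string `json:"name"`
	Ready bool   `json:"ready"`
}

// BpfStatus aggregates component health; this layout is assumed.
type BpfStatus struct {
	Programs []ProgramStatus `json:"programs"`
	Maps     []MapStatus     `json:"maps"`
}

// Ready reports true only when every listed program is attached
// and every listed map is ready.
func (s BpfStatus) Ready() bool {
	for _, p := range s.Programs {
		if !p.Attached {
			return false
		}
	}
	for _, m := range s.Maps {
		if !m.Ready {
			return false
		}
	}
	return true
}

func main() {
	st := BpfStatus{
		Programs: []ProgramStatus{{Name: "example_prog", Attached: true}},
		Maps:     []MapStatus{{Name: "example_map", Ready: false}},
	}
	out, _ := json.Marshal(st)
	fmt.Println(string(out), "ready:", st.Ready())
}
```

Keeping the per-component records alongside the aggregate lets a caller see which program or map caused a not-ready result rather than only the final boolean.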


XDS stream stability tracking

Added thread-safe XDS connection stability tracking in XdsClient, including:

  • Reconnect counts
  • Last successful connect time
  • Stream stability metadata

This provides better visibility into:

  • Control plane health
  • ADS stream reliability
  • Reconnection behavior

Expanded readiness response

Enhanced the JSON payload returned by /debug/ready to expose detailed component-level readiness information for:

  • BPF programs
  • Maps
  • XDS connectivity
  • Controller readiness

This makes the endpoint more useful for:

  • Headlamp UI integration
  • Monitoring systems
  • Operational debugging

Controller readiness integration

Integrated readiness checks into:

  • AdsController
  • WorkloadController

to provide centralized readiness reporting across core mesh components.


Which issue(s) this PR fixes

Fixes #

(Please add the issue number here if applicable)

Special notes for your reviewer

Thread safety

Introduced sync.RWMutex in:

  • XdsClient
  • Controllers

to ensure safe concurrent access during readiness and status reporting.
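
One way the locking could be arranged, sketched under assumptions: the Controller type, its string stream placeholder, and the method names below are hypothetical stand-ins, and the atomic.Bool initialization flag follows the commit message rather than the controllers' real fields.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Controller is a hypothetical stand-in for a controller that owns a stream.
type Controller struct {
	mu          sync.RWMutex
	stream      string      // placeholder for a real stream handle
	initialized atomic.Bool // set once the first stream is installed
}

// SetStream installs a stream under the write lock and marks the
// controller as initialized.
func (c *Controller) SetStream(s string) {
	c.mu.Lock()
	c.stream = s
	c.mu.Unlock()
	c.initialized.Store(true)
}

// IsReady checks the flag first, then reads the stream under the read lock.
func (c *Controller) IsReady() bool {
	if !c.initialized.Load() {
		return false
	}
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.stream != ""
}

func main() {
	c := &Controller{}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = c.IsReady() // concurrent readiness readers
		}()
	}
	c.SetStream("ads-stream")
	wg.Wait()
	fmt.Println("ready:", c.IsReady())
}
```

The atomic flag gives readiness readers a lock-free fast path before initialization, while the RWMutex protects the stream handle itself.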


Test updates

Updated pkg/status/ready_test.go to validate the new granular readiness response format.
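
Nested field validation of a JSON response could be done along these lines. The missingFields and hasPath helpers and the dotted path names are hypothetical; they are not taken from ready_test.go and only show one way to assert that expected keys are present.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// hasPath reports whether a dotted path such as "xds.reconnect_count"
// exists in a decoded JSON object.
func hasPath(doc map[string]any, path string) bool {
	var cur any = doc
	for _, key := range strings.Split(path, ".") {
		obj, ok := cur.(map[string]any)
		if !ok {
			return false
		}
		if cur, ok = obj[key]; !ok {
			return false
		}
	}
	return true
}

// missingFields returns the paths that are absent from a JSON body.
func missingFields(body []byte, paths ...string) ([]string, error) {
	var doc map[string]any
	if err := json.Unmarshal(body, &doc); err != nil {
		return nil, err
	}
	var missing []string
	for _, p := range paths {
		if !hasPath(doc, p) {
			missing = append(missing, p)
		}
	}
	return missing, nil
}

func main() {
	// The keys below are made up for the example.
	body := []byte(`{"bpf":{"programs":[]},"xds":{"reconnect_count":0}}`)
	missing, err := missingFields(body, "bpf.programs", "bpf.maps", "xds.reconnect_count")
	fmt.Println(missing, err)
}
```

In a test, a non-empty result from missingFields would be reported as a failure that names each missing path.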


Formatting

Applied go fmt to all modified files.


Why this matters

These changes improve:

  • Mesh observability
  • Readiness diagnostics
  • Control plane visibility
  • Headlamp integration capabilities

Users can now identify:

  • Missing BPF attachments
  • Map initialization issues
  • Unstable XDS streams
  • Partial readiness failures

without relying on logs or deep internal debugging.


Does this PR introduce a user-facing change?

Expanded the /debug/ready endpoint to include granular status for eBPF programs, maps, and XDS stream stability for better mesh observability.

Copilot AI review requested due to automatic review settings May 15, 2026 05:36
@kmesh-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign lizhencheng9527 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the operational observability of the mesh by transforming the /debug/ready endpoint from a simple health check into a detailed diagnostic tool. By exposing granular state information for BPF components and XDS stream connectivity, it enables easier debugging of partial failures and provides necessary data for external monitoring systems and UI integrations like Headlamp.

Highlights

  • Granular BPF Status: Expanded BpfLoader to report detailed status for individual eBPF programs and maps, improving low-level observability.
  • XDS Stream Stability: Added thread-safe tracking for XDS connection stability, including reconnect counts and last successful connection timestamps.
  • Enhanced Readiness Endpoint: Updated the /debug/ready endpoint to return a comprehensive JSON payload covering BPF, XDS, and controller health.
  • Concurrency Improvements: Introduced RWMutex and atomic variables across controllers and XdsClient to ensure thread-safe status reporting.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.


The mesh is quiet, deep and wide, With BPF and XDS inside. We check the state, we watch the flow, So readiness is all we know.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements a comprehensive readiness probe by adding status tracking and IsReady methods to the BPF loader and XDS controllers. It also introduces thread safety improvements across the ADS and workload controllers using mutexes and atomic booleans. A critical compilation error was identified in pkg/controller/client.go where GetXdsStreamStability returns a single string, but its caller in the status server expects two separate values for reconnect count and connection time.

Comment thread pkg/controller/client.go Outdated
@vanshika2720 vanshika2720 force-pushed the feat/granular-debug-ready branch from f9c5a44 to 62bc8f7 Compare May 15, 2026 08:08
@codecov

codecov Bot commented May 15, 2026

Codecov Report

❌ Patch coverage is 28.07882% with 146 lines in your changes missing coverage. Please review.
✅ Project coverage is 39.28%. Comparing base (7e6ccf6) to head (62bc8f7).
⚠️ Report is 9 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pkg/bpf/bpf.go | 0.00% | 62 Missing ⚠️ |
| pkg/controller/client.go | 20.75% | 42 Missing ⚠️ |
| pkg/status/status_server.go | 46.66% | 20 Missing and 4 partials ⚠️ |
| pkg/controller/ads/ads_controller.go | 56.52% | 9 Missing and 1 partial ⚠️ |
| pkg/controller/workload/workload_controller.go | 60.00% | 6 Missing and 2 partials ⚠️ |

❌ Your patch check has failed because the patch coverage (28.07%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

| Files with missing lines | Coverage Δ | |
| --- | --- | --- |
| pkg/controller/workload/workload_controller.go | 33.89% <60.00%> (+3.12%) | ⬆️ |
| pkg/controller/ads/ads_controller.go | 64.00% <56.52%> (-4.43%) | ⬇️ |
| pkg/status/status_server.go | 37.50% <46.66%> (+1.68%) | ⬆️ |
| pkg/controller/client.go | 47.01% <20.75%> (-17.62%) | ⬇️ |
| pkg/bpf/bpf.go | 37.90% <0.00%> (-7.59%) | ⬇️ |

... and 3 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6214ec0...62bc8f7. Read the comment docs.


@vanshika2720 vanshika2720 force-pushed the feat/granular-debug-ready branch from 62bc8f7 to dfa83b6 Compare May 17, 2026 06:13
Copilot AI review requested due to automatic review settings May 17, 2026 06:13
Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

…tatus

Expand the readiness probe endpoint to return structured JSON with
per-component health details for BPF programs/maps and XDS connection
state, replacing the previous plain-text OK response.

Changes:
- Add BpfStatus struct with per-program and per-map health reporting
- Add XDS connection state, controller readiness, and reconnect metadata
- Move gRPC dial outside mutex in createGrpcStreamClient to avoid
  blocking readiness readers during slow reconnects
- Add thread-safe IsReady/GetBpfStatus to BpfLoader
- Add IsReady/GetGrpcState/GetControllerStatus/GetXdsStreamStability
  to XdsClient with RWMutex protection
- Add mutex and atomic.Bool to ADS and Workload controllers for
  thread-safe stream access and initialization tracking
- Add readiness probe tests covering nil state, JSON structure,
  XDS metadata, and nested field validation

Signed-off-by: vanshika2720 <pahalvanshikaa@gmail.com>

3 participants