fix(workload): prevent endpoint/service BPF map divergence on partial… by OmAmbole009 · Pull Request #1680 · kmesh-net/kmesh

OmAmbole009 · 2026-05-06T21:27:49Z

Summary

This PR fixes a transactional consistency issue during endpoint registration/removal where partial failures could leave EndpointMap, ServiceMap, and EndpointCache in divergent states.

The fix ensures:

endpoint visibility is only exposed to the datapath after successful service count updates
failed updates correctly roll back intermediate state
cache state remains consistent with BPF-visible state
endpoint deletion ordering avoids transient duplicate backend visibility

Fixes #1679

Root Cause

addWorkloadToService() mutated state before all writes succeeded

Previously:

sv.EndpointCount was incremented before EndpointUpdate
EndpointCache was updated before ServiceUpdate

This created two failure windows:

Failure window A

If EndpointUpdate() failed:

sv.EndpointCount was already incremented
in-memory state diverged from the BPF service map
Failure window B

If ServiceUpdate() failed:

endpoint entry already existed in the BPF endpoint map
service count was not incremented
endpoint became unreachable to the datapath
cache state incorrectly reflected a successful add

This could lead to silent traffic inconsistencies.

deleteEndpoint() exposed transient duplicate backend visibility

The original ordering was:

Swap -> decrement count -> delete last entry

Between swap and count decrement, the swapped backend could temporarily become visible at two indices simultaneously.

Fix Strategy

addWorkloadToService()

New ordering:

EndpointUpdate -> increment count + ServiceUpdate -> cache update

Behavior:

backend index computed without mutating state
endpoint inserted first while still invisible to datapath
service count updated only after successful endpoint insertion
rollback performed if ServiceUpdate() fails
cache updated only after all BPF updates succeed
deleteEndpoint()

New ordering:

decrement count -> ServiceUpdate -> swap -> delete last entry

Behavior:

last slot becomes unreachable before swap
prevents transient duplicate backend selection
rollback attempted if swap fails
deleting the unreachable last entry becomes safe/non-critical

Tests Added

addWorkloadToService
TestAddWorkloadToService_EndpointUpdateFailure
TestAddWorkloadToService_ServiceUpdateFailure
TestAddWorkloadToService_EndpointCountConsistency
TestAddWorkloadToService_RetryAfterFailureSucceeds
deleteEndpoint
TestDeleteEndpoint_CountDecrementedBeforeSwap
TestDeleteEndpoint_DeleteLastEntry
TestDeleteEndpoint_EndpointCountServiceMapConsistency

These tests validate:

rollback correctness
endpoint/service consistency
cache consistency
retry behavior
endpoint count invariants
map compactness after deletion

Result

This change improves consistency guarantees between:

userspace workload state
EndpointMap
ServiceMap
datapath-visible topology

and prevents silent routing inconsistencies during partial BPF update failures.

kmesh-bot · 2026-05-06T21:27:57Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hzxuzhonghu for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

pkg/controller/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

gemini-code-assist · 2026-05-06T21:28:00Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses critical transactional consistency issues in the workload controller's BPF map management. By reordering operations and introducing robust rollback logic, it ensures that the userspace state, BPF maps, and the eBPF datapath remain synchronized even when partial failures occur during endpoint registration or deletion. These changes significantly improve the reliability of service routing and prevent silent traffic inconsistencies.

Highlights

Transactional Consistency: Refactored addWorkloadToService and deleteEndpoint to ensure atomic-like updates to BPF maps, preventing state divergence between EndpointMap, ServiceMap, and EndpointCache during partial failures.
Improved Error Handling: Implemented rollback mechanisms for failed BPF updates, ensuring that if a service update fails, the endpoint insertion is reverted and in-memory state remains consistent.
Datapath Visibility: Adjusted the ordering of operations to ensure endpoints are only visible to the eBPF datapath after successful registration, and unreachable before removal, preventing transient traffic inconsistencies.
Comprehensive Testing: Added a suite of unit tests to validate consistency invariants, rollback behavior, and retry logic under simulated failure conditions.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

In maps where packets find their way, We keep the state from going astray. With order strict and rollback smart, Consistency is now the heart.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

kmesh-bot · 2026-05-06T21:28:03Z

Welcome @OmAmbole009! It looks like this is your first PR to kmesh-net/kmesh 🎉

gemini-code-assist

Code Review

This pull request refactors the BPF map update logic in addWorkloadToService and deleteEndpoint to provide stronger consistency guarantees and ordering between endpoint and service map writes. It introduces a swap-then-shrink algorithm for deletions and comprehensive unit tests for failure and rollback scenarios. Feedback highlights critical safety issues, including missing validation for the priority parameter and potential uint32 underflow when decrementing endpoint counts. Additionally, the reviewer noted that the EndpointCache must be updated during endpoint swaps to maintain consistency with the BPF map state.

Copilot

Pull request overview

This PR addresses transactional consistency between userspace state and the eBPF EndpointMap/ServiceMap during workload endpoint add/remove operations, preventing partial failures from leaving endpoint/service counts and caches diverged (Fixes #1679).

Changes:

Reorders addWorkloadToService() updates so the endpoint entry is written before incrementing/persisting the service EndpointCount, with rollback on partial failures and cache updates only after BPF writes succeed.
Reorders deleteEndpoint() to decrement/persist service EndpointCount before swapping/deleting endpoint entries to avoid transient duplicate backend visibility.
Adds targeted unit tests covering failure windows, rollback behavior, and endpoint/service/cache consistency invariants.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
`pkg/controller/workload/workload_processor.go`	Makes add/remove endpoint operations more transactional via ordering changes and rollback behavior.
`pkg/controller/workload/workload_processor_test.go`	Adds tests validating rollback, consistency, and deletion compactness/invariants.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

OmAmbole009 · 2026-05-06T22:16:03Z

@hzxuzhonghu @kwb0523

I’ve addressed the latest review feedback and pushed the follow-up fixes/tests. The PR should now be up to date with all discussed changes.

Could you please take a look when you get a chance? Thanks!

codecov · 2026-05-06T22:37:47Z

Codecov Report

❌ Patch coverage is 42.42424% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 39.72%. Comparing base (c9b38d3) to head (4ed998c).
⚠️ Report is 7 commits behind head on main.

Files with missing lines	Patch %	Lines
pkg/controller/workload/workload_processor.go	42.42%	13 Missing and 6 partials ⚠️

❌ Your patch check has failed because the patch coverage (42.42%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Files with missing lines	Coverage Δ
pkg/controller/workload/workload_processor.go	`60.29% <42.42%> (+2.53%)`	⬆️

... and 1 file with indirect coverage changes

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3d77877...4ed998c. Read the comment docs.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

+	// Step 1: write the endpoint entry.  The service EndpointCount still points to the
+	// previous last entry, so the new entry is invisible to the eBPF program until step 2.
 	if err := p.bpf.EndpointUpdate(&ek, &ev); err != nil {
 		log.Errorf("Update endpoint map failed, err:%s", err)


+		BackendIndex: lastIndex,
 	}
 	if err := p.bpf.EndpointDelete(lastKey); err != nil {
-		log.Errorf("EndpointDelete [%#v] failed: %v", lastKey, err)
-		return err
+		log.Errorf("EndpointDelete [%#v] failed (ghost entry, non-fatal): %v", lastKey, err)
 	}


+	// Step 2: overwrite the deleted slot with the (now unreachable) last entry so
+	// the remaining entries stay contiguous.
+	lastIndex := newCount + 1 // index of the entry that is now unreachable
+	if err := p.bpf.EndpointSwap(ek.BackendIndex, ev.BackendUid, lastIndex, sk.ServiceId, ek.Prio); err != nil {
+		log.Errorf("swap workload endpoint index failed: %s", err)
+		// Attempt to restore the count so the service map is not left with a count
+		// that is one less than the actual number of reachable entries.
+		sv.EndpointCount[ek.Prio] = lastIndex
+		if restoreErr := p.bpf.ServiceUpdate(&sk, &sv); restoreErr != nil {
+			log.Errorf("restore EndpointCount after swap failure failed: %v", restoreErr)
+			return fmt.Errorf("swap workload endpoint index failed: %w; restoring EndpointCount also failed: %v", err, restoreErr)
+		}
+		return err
+	}
+
+	// Step 3: delete the now-unreachable last entry.  A failure here leaves a ghost
+	// entry that is invisible to the eBPF program because EndpointCount was already
+	// decremented in step 1.  The entry wastes one map slot but does not affect


+	// Close the endpoint map.  deleteEndpoint will:
+	//   step 1: decrement count to 0, ServiceUpdate (service map open) — succeeds.
+	//   step 2: EndpointSwap(1, wlID, 1, svcID, 0) — currentIndex==lastIndex==1,
+	//           lastIndex<=1 branch → no-op, returns nil (no endpoint map access).
+	//   step 3: EndpointDelete(lastKey) — endpoint map closed → fails → logs error,
+	//           returns nil (ghost entry is non-fatal).
+	workloadMap.KmEndpoint.Close()
+
+	err := p.deleteEndpoint(ek, ev, sv, sk)
+	assert.NoError(t, err,
+		"deleteEndpoint must return nil even when EndpointDelete fails (ghost entry is non-fatal)")


+// TestAddWorkloadToService_ServiceUpdateFailure_RollbackDeleteAlsoFails documents
+// an unreachable branch in the current fake map infrastructure.
+//
+// To trigger this path, addWorkloadToService must observe:
+//  1. EndpointUpdate succeeds (endpoint map open)
+//  2. ServiceUpdate fails (service map closed)
+//  3. rollback EndpointDelete fails (endpoint map closed)
+//
+// With the current synchronous fake maps, we cannot transition endpoint-map state
+// between (1) and (3) inside a single addWorkloadToService call without changing
+// production code or fake map internals, so this path is intentionally skipped.
+func TestAddWorkloadToService_ServiceUpdateFailure_RollbackDeleteAlsoFails(t *testing.T) {
+	t.Skip("requires mid-call endpoint map state transition not possible with current fake map implementation")
+}


… update failure addWorkloadToService previously incremented sv.EndpointCount before the endpoint BPF map write succeeded, and updated EndpointCache before the service BPF map write succeeded. This created two failure windows: 1. If EndpointUpdate failed, sv.EndpointCount was already incremented in the caller's copy, leaving the in-memory service value out of sync with the BPF service map on the next ServiceLookup. 2. If ServiceUpdate failed after a successful EndpointUpdate, the endpoint entry existed in the BPF map but EndpointCount was not incremented, so the eBPF program could never select the new endpoint (silent traffic loss). EndpointCache was also updated despite the failure, diverging from the actual datapath state. deleteEndpoint previously decremented EndpointCount after the EndpointSwap, meaning the eBPF program could briefly observe the swapped-in backend at two indices simultaneously (duplicate traffic window). Fix addWorkloadToService: - Compute the new BackendIndex without mutating sv.EndpointCount. - Write the endpoint entry first (invisible to eBPF until count grows). - Increment EndpointCount and call ServiceUpdate second. - On ServiceUpdate failure, delete the just-inserted endpoint entry and restore sv.EndpointCount so the caller's copy is not left dirty. - Update EndpointCache only after both BPF writes succeed. Fix deleteEndpoint: - Decrement EndpointCount and call ServiceUpdate first, making the last slot unreachable to the eBPF program before any data is moved. - Perform EndpointSwap second (both slots hold the same backend but the last slot is already unreachable, so no duplicate traffic). - On EndpointSwap failure, attempt to restore EndpointCount. - Delete the now-unreachable last entry last; a failure here leaves a ghost entry that is invisible to the datapath and will be cleaned up on the next restart via handleRemovedAddressesDuringRestart. Add six unit tests covering: - EndpointUpdate failure: sv not mutated, cache not updated. - ServiceUpdate failure: endpoint rolled back, cache not updated. - EndpointCount always equals actual BPF endpoint entries after adds. - deleteEndpoint: count decremented, map compact, deleted entry absent. - deleteEndpoint: last-entry deletion leaves count 0 and map empty. - Mixed add/delete sequence: BPF count, map count, cache count consistent. - Retry after removal succeeds with correct state. Signed-off-by: OmAmbole009 <omambole09@gmail.com>

…nsistency Four targeted fixes based on Copilot review: 1. Add underflow/state guard in deleteEndpoint() Decrementing sv.EndpointCount[ek.Prio] when the count is already 0, or when BackendIndex is 0 or exceeds the current count, would write an invalid count to the BPF service map and corrupt the datapath state. Add explicit checks before the decrement and return a descriptive error so the caller can log and skip the entry rather than persisting bad state. 2. Fix misleading ghost-entry cleanup comment in deleteEndpoint() The previous comment claimed the ghost entry 'will be cleaned up on the next restart via handleRemovedAddressesDuringRestart'. That function cleans up backend/service entries whose UIDs are absent from the userspace cache; it does not explicitly iterate unreachable endpoint slots. Replace the comment with an accurate description: the entry is invisible to the eBPF program because EndpointCount was already decremented, and it will remain until the next explicit cleanup of the owning service or workload. 3. Fix inaccurate test comment in TestAddWorkloadToService_EndpointUpdateFailure The old comment described the failure injection as 'using a service that does not exist in the BPF map'. The actual mechanism is closing workloadMap.KmEndpoint so that the EndpointUpdate call inside addWorkloadToService returns an error. Update the comment to accurately describe what is being tested. 4. Fix outdated function-level rollback comment in addWorkloadToService() The previous doc comment said the rollback restores 'the in-memory EndpointCache'. EndpointCache is now updated only after both BPF writes succeed, so no cache rollback is ever needed. Rewrite the comment to reflect the actual behaviour: on ServiceUpdate failure the endpoint BPF entry is deleted and sv.EndpointCount is restored; the cache is untouched because it was never written. Signed-off-by: OmAmbole009 <omambole09@gmail.com>

… and deleteEndpoint sv.EndpointCount is a fixed-size array of length bpf.PrioCount (7). Passing a priority value >= PrioCount causes an out-of-bounds array access and a runtime panic before any BPF map write occurs. addWorkloadToService: add an explicit bounds check on the priority parameter before the first sv.EndpointCount[priority] read. deleteEndpoint: add an explicit bounds check on ek.Prio before the existing underflow and BackendIndex guards, which themselves access sv.EndpointCount[ek.Prio]. The ordering is: 1. ek.Prio < bpf.PrioCount (new, prevents out-of-bounds) 2. EndpointCount[ek.Prio] > 0 (existing, prevents underflow) 3. BackendIndex in [1, EndpointCount[ek.Prio]] (existing, prevents out-of-range swap) Both checks return a descriptive error so callers can log and skip the entry without writing invalid state to the BPF service map. No logic changes beyond the new guard conditions. Signed-off-by: OmAmbole009 <omambole09@gmail.com>

…fix test comment Three targeted fixes based on Copilot review: 1. Rename 'swap-then-shrink' to 'shrink-then-swap' in deleteEndpoint doc The implemented order is: decrement EndpointCount + ServiceUpdate (shrink) first, then EndpointSwap (swap) second. The previous algorithm name was inverted relative to the actual code. 2. Return aggregated error when EndpointSwap rollback also fails When EndpointSwap fails and the subsequent ServiceUpdate to restore EndpointCount also fails, the function previously returned only the original swap error, silently discarding the restore failure. Now both errors are surfaced in the returned error so operators can distinguish a swap-only failure from a swap + rollback failure. The existing log line and rollback attempt are preserved unchanged. 3. Rename test and fix misleading comment TestAddWorkloadToService_RetryAfterFailureSucceeds used a different workload name and IP for 'wl2', making it a distinct workload rather than a retry of 'wl1'. The test name and comment implied retry semantics that were not present. Rename to TestAddWorkloadToService_NewWorkloadAfterDeletionSucceeds and update the comment to accurately describe what is tested: adding a new workload after a previous one was deleted leaves the service in a consistent state. No logic changes to the transactional ordering or datapath guarantees. Signed-off-by: OmAmbole009 <omambole09@gmail.com>

Signed-off-by: OmAmbole009 <omambole09@gmail.com>

Copilot AI review requested due to automatic review settings May 6, 2026 21:27

kmesh-bot requested review from hzxuzhonghu and kwb0523 May 6, 2026 21:27

kmesh-bot added the size/L label May 6, 2026

Copilot started reviewing on behalf of OmAmbole009 May 6, 2026 21:28 View session

OmAmbole009 force-pushed the fix/endpoint-service-bpf-map-consistency branch from 6acdbb7 to 689bf6d Compare May 6, 2026 21:30

gemini-code-assist Bot reviewed May 6, 2026

View reviewed changes

Comment thread pkg/controller/workload/workload_processor.go

Comment thread pkg/controller/workload/workload_processor.go

Comment thread pkg/controller/workload/workload_processor.go

Copilot AI reviewed May 6, 2026

View reviewed changes

Comment thread pkg/controller/workload/workload_processor.go

Comment thread pkg/controller/workload/workload_processor.go Outdated

Comment thread pkg/controller/workload/workload_processor_test.go Outdated

Comment thread pkg/controller/workload/workload_processor.go Outdated

Copilot AI review requested due to automatic review settings May 6, 2026 21:55

OmAmbole009 force-pushed the fix/endpoint-service-bpf-map-consistency branch from 1673bd8 to 3acb2bd Compare May 6, 2026 21:55

Copilot started reviewing on behalf of OmAmbole009 May 6, 2026 21:55 View session

Copilot AI reviewed May 6, 2026

View reviewed changes

Copilot AI review requested due to automatic review settings May 7, 2026 16:21

kmesh-bot added size/XL do-not-merge/contains-merge-commits and removed size/L labels May 7, 2026

Copilot started reviewing on behalf of OmAmbole009 May 7, 2026 16:22 View session

Copilot AI reviewed May 7, 2026

View reviewed changes

OmAmbole009 added 5 commits May 7, 2026 22:01

test(workload): improve coverage for endpoint failure paths

ed73d0f

Signed-off-by: OmAmbole009 <omambole09@gmail.com>

OmAmbole009 force-pushed the fix/endpoint-service-bpf-map-consistency branch from 13338c7 to ed73d0f Compare May 7, 2026 16:33

kmesh-bot removed the do-not-merge/contains-merge-commits label May 7, 2026

Conversation

OmAmbole009 commented May 6, 2026

Summary

Root Cause

Fix Strategy

Tests Added

Result

Uh oh!

kmesh-bot commented May 6, 2026

Uh oh!

gemini-code-assist Bot commented May 6, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

kmesh-bot commented May 6, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

OmAmbole009 commented May 6, 2026

Uh oh!

codecov Bot commented May 6, 2026

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants