feat(harness): lb scenario tier - simulate tier/failover/breaker/affinity #1224
Open
0x0079 wants to merge 1 commit into
…nity

Follow-up to the affinity/tier fix. Adds a load-balancing scenario simulator and a `harness lb` CLI tier on top of it, so routing dynamics (tier selection, mid-request failover, breaker trip + timed recovery, health-based exclusion, affinity pin movement) can be driven against programmable fake upstreams over a request sequence and watched request-by-request.

- internal/server/lbsim.go: shared engine (LBSimulator) driving the real ServiceSelector.Select -> dispatchWithPriorityFailover path with a deterministic breaker clock; feeds both production feedback channels (breaker recorder + Server.reportHealthStatus) per status, faithfully.
- loadbalance: SetClock/nowFn clock seam (breaker.go + health_monitor.go, same package) so one sim clock advance recovers both channels; production unchanged (defaults to time.Now).
- cli/harness lb: YAML/`--example` scenarios; default pencil-graph output (per-request hops + svc breaker/health/pin), `--table`, `--json`.
- .design/priority-routing.pencil.md: 500-retry worked-example graph; README + the two-feedback-channels note.
Follow-up to #1223, now rebased onto `main` (the core affinity/tier fix merged in #1223). The diff is harness-only.

## What

Our tests could not realistically verify load-balancing dynamics. This adds a scenario simulator and a `harness lb` CLI tier so that tier selection, mid-request failover, the circuit breaker (trip + timed recovery), health-based exclusion, and session-affinity pin movement can be driven against programmable fake upstreams over a request sequence and watched request by request.

- `internal/server/lbsim.go`: shared engine `LBSimulator`, driving the real `ServiceSelector.Select → dispatchWithPriorityFailover` path with a deterministic clock. Each attempt feeds both production feedback channels exactly as a real request would: the breaker recorder and `Server.reportHealthStatus` (status-classified: 429 → rate-limit, 401/403 → immediate auth-unhealthy, 5xx/other → 3-strike).
- Clock seam: `SetClock`/`nowFn` in `loadbalance` (`breaker.go` + `health_monitor.go`) and `routing` (affinity TTL). One fake clock drives breaker recovery, health recovery, and strict affinity-TTL expiry together. Production is unchanged (defaults to `time.Now`).
- `cli/harness lb`: `--file scenario.yaml` or `--example cascade|flat|grid|single|regression|ratelimit|authflip|crossmodel`. Default output is a pencil graph (per-request failover hops + each svc's breaker/health + affinity pin); `--table` and `--json` are also available.

## Alignment with the merged load-balance fixes
Rebasing surfaced two behaviours that landed alongside #1223/#1233; the simulator now models both:

- Strict affinity TTL: a pin expires at `LockedAt + affinity_secs`, and an in-window request does not slide it. The affinity time check rides the same fake clock as the breaker/health, so `TestLBScenario_AffinityStrictTTLRelock` asserts the strict expiry-and-re-lock deterministically (added `routing.SetClock`/`routing.Now`; the store's `Get`/`GC` and `postProcess` now use it).
- Cross-model failover: `TestLBScenario_CrossModelFailover` + `--example crossmodel` show `t0/model-a → t1/model-b` in the attempt trace.

Also aligned `TestFailoverLogging` to the renamed failover log stages (`failover_retry`/`failover_exhausted`) introduced by #1233 ("bugfix(failover, tier): missing model info while failover"); this test was already red on `main`.

## Example
## Tests

- `internal/server/lb_scenario_test.go`: A/B/C/D shapes + the original regression + rate-limit/auth-error + cross-model + strict-TTL scenarios, all asserting the captured trace, affinity pin, and breaker/health snapshots.
- `go build ./...`, `go vet`, and the `loadbalance`/`typ`/`routing`/`server`/`harness` packages are green.

## Known gap (deferred)
G1: horizontal tactics (random/token/…) are breaker-blind at the selection layer; documented in `priority-routing.md` and marked as an executable `t.Skip` in the harness.

🤖 Generated with Claude Code
https://claude.ai/code/session_01MCtGUNwURzSk34PQ8gkjZC