fix(evolution): pipeline bugs - JUDGE_API_KEY, dedup, routing, procedures, backup, observability#46

Open
electronicBlacksmith wants to merge 2 commits into ghostwright:main from
electronicBlacksmith:upstream/fix/evolution-pipeline-bugs

Summary

Fixes 8 issues discovered during a config audit of the evolution pipeline:

High-priority (commit 1):

  • JUDGE_API_KEY: Dedicated API key for judge client with cost isolation, preferred over ANTHROPIC_API_KEY/OAuth tokens
  • Dedup appends: applyDelta() filters duplicate lines on append instead of blindly appending. Returns null for no-ops so version is not bumped spuriously
  • Route all observations: buildCritiqueFromObservations() now routes domain_fact, error, tool_pattern, and success observations to their config files instead of silently dropping them
  • Stuck onboarding: markOnboardingComplete() is now called once the evolution version reaches 2, clearing the onboarding prompt that was being re-injected on every restart
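
To illustrate the dedup-on-append behavior described for applyDelta(), here is a minimal sketch. The helper name appendUnique, its signature, and the trim-based line comparison are assumptions for illustration only; the actual applyDelta() implementation is not shown in this description and may differ.

```typescript
// Hypothetical helper illustrating "filter duplicate lines on append".
// Returns the new file content, or null when nothing new would be added,
// so the caller can skip the version bump.
function appendUnique(existing: string, addition: string): string | null {
  // Lines already present, compared after trimming whitespace (an assumption).
  const seen = new Set(
    existing
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.length > 0),
  );

  const fresh: string[] = [];
  for (const line of addition.split("\n")) {
    const key = line.trim();
    if (key.length === 0 || seen.has(key)) continue;
    seen.add(key); // also drops repeats inside the addition itself
    fresh.push(line);
  }

  if (fresh.length === 0) return null; // no-op append

  const separator = existing.length === 0 || existing.endsWith("\n") ? "" : "\n";
  return existing + separator + fresh.join("\n") + "\n";
}
```

Returning null rather than the unchanged content lets the caller tell "nothing changed" apart from "content rewritten" without a string comparison.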

Medium-priority (commit 2):

  • Procedural memory bug: consolidateSessionWithLLM() counted detected procedures but never called storeProcedure() - the full storage layer existed but was never invoked
  • Config backup: backupConfig() copies phantom-config/ to data/config-backups/vN after successful evolution, retaining last 5 versions
  • Model mismatch visibility: Log warning when env overrides yaml model, surface model + source in /health endpoint and phantom status
  • Evolution observability: Added session_count, sessions_since_consolidation, session_log_depth to health endpoint. Added Evolution Pipeline doctor check
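
The 5-version retention for config backups can be sketched as a pure selection step. The function name backupsToPrune and the assumption that backup directories are named exactly vN are illustrative; how backupConfig() actually enumerates and deletes old backups is not shown here.

```typescript
// Hypothetical helper: given the entries under data/config-backups/,
// return the names of backups that fall outside the retention window.
function backupsToPrune(names: string[], keep: number = 5): string[] {
  const versioned: { name: string; version: number }[] = [];
  for (const name of names) {
    const match = /^v(\d+)$/.exec(name);
    // Ignore anything that is not a vN directory name.
    if (match !== null) versioned.push({ name, version: Number(match[1]) });
  }

  // Sort numerically: a plain string sort would place "v10" before "v2".
  versioned.sort((a, b) => a.version - b.version);

  const excess = Math.max(0, versioned.length - keep);
  return versioned.slice(0, excess).map((entry) => entry.name);
}
```

Sorting by the parsed number matters once the version count reaches two digits, since lexicographic ordering would otherwise prune the wrong directories.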

Test plan

  • 7 new tests for medium-priority fixes
  • All existing tests pass
  • bun run typecheck clean
  • bun run lint clean

… onboarding

Fix 1: Use a dedicated JUDGE_API_KEY for the judge client to isolate
its costs.

Fix 2: applyDelta() now filters duplicate lines on append instead of
blindly appending. Returns null for no-op appends so version is not
bumped. compressUserProfile() added to consolidation with size gate.

Fix 3: buildCritiqueFromObservations() now routes domain_fact, error,
tool_pattern, and success observations to their respective config files
instead of silently dropping them. affected_files from judge output is
preserved through the pipeline with path traversal validation.
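
One way the path traversal validation on affected_files could look is sketched below. The function name isSafeRelativePath and the specific rules (relative POSIX-style paths only, no "." or ".." segments) are assumptions; the actual checks in this change are not shown in the commit message.

```typescript
// Hypothetical validator for a path taken from judge output.
// Accepts only relative, forward-slash paths with no dot segments.
function isSafeRelativePath(candidate: string): boolean {
  if (candidate.length === 0) return false;
  // Reject absolute paths, backslash separators, and Windows drive prefixes.
  if (candidate.startsWith("/") || candidate.includes("\\")) return false;
  if (/^[A-Za-z]:/.test(candidate)) return false;
  // Reject empty, ".", and ".." segments so the path cannot escape its root.
  return candidate
    .split("/")
    .every((segment) => segment.length > 0 && segment !== "." && segment !== "..");
}
```

A stricter variant would also resolve the path against the config directory and confirm the result still lies inside it.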

Fix 4: markOnboardingComplete() is now called once the evolution
version reaches 2, clearing the onboarding prompt that was being
re-injected on every restart.
- Fix procedural memory bug: consolidateSessionWithLLM() counted
  detected procedures but never stored them. Add storage loop after
  facts extraction.
- Add post-evolution config backup to data/config-backups/vN with
  5-version retention.
- Surface model mismatch: warn when env overrides yaml model, add
  model + model_source to /health endpoint and phantom status.
- Add evolution observability: session_count, sessions_since_consolidation,
  session_log_depth in health/status. Add Evolution Pipeline doctor check.
- Fix default model to claude-opus-4-6 across config, schema, and init.