fix: round-6 audit cleanup: 11 fixes (stacked on #31) #32
Open
sweetcornna wants to merge 1 commit into
Spawned 4 parallel discovery agents (code-quality / cost+perf / UX / sourcemap-unexplored). Each returned a punch list; this PR lands the P0 items that are small, safe, and high-impact. Bigger refactors (mid-run interrupt, SkillTool on-demand load, prompt-cache wiring, surgical edit tool) are deferred to round 7.

## Critical fixes

1. **AgentEvent contract drift**: the `paper_editor` agent literal, 6 `finetune.*` event kinds, and the `kernel.error` kind were emitted at runtime but rejected by the Pydantic `AgentEvent` contract, so every fine-tune session crashed on the first emit. A real smoke test caught this. Added them to the `AgentName` and `EventKind` Literals in `packages/py-contracts/src/mm_contracts/agent_io.py`.
2. **`PaperEditorAgent.__init__()` TypeError**: `finetune_main` was passing a `matlab_session=` kwarg the agent doesn't accept. Dropped it from `finetune_main.py` and removed the unused `MatlabSession` import.
3. **Per-run_id mutex between pipeline + finetune consumers** (Agent A #2): the two consumers were free to write the same notebook, `paper.meta.json`, and `figures/` concurrently. `main.py` and `finetune_main.py` now share a `dict[UUID, asyncio.Lock]` and hold the lock around the business logic.

## Performance fixes

4. **Lift `_PARAM_PATTERNS` to module level** (Agent A #4): three regexes were being compiled on every `mine_sensitivity_evidence` call. Moved them to module-level constants plus a frozenset blacklist; ~3× speedup on the Writer evidence scan per revision round.
5. **Reuse `_FIG_ID_RE`** in `evidence.py` (Agent A #5): drops the inline `re.finditer(r"\[\[FIG:...")` recompile.
6. **Move `{{ upstream_reminders }}` to the BOTTOM of user templates** (Agent B cache-strategy #4): when it sat at the top, it broke the stable-prefix cache key on every revision. New order: static (system + catalog + exemplars) → dynamic reminders → "Respond with..." instruction. Applied to the Searcher / Modeler / Coder / Writer templates.

## Contract / safety fixes

7. **`CoderDirective.language: Literal["python","matlab"]`** (Agent A #6): was `str`, so contract drift was invisible to Pydantic. The runtime already normalized the value; the type now matches.

## Frontend UX fixes

8. **5→6 stage pill grid** (Agent C #1): `apps/web/src/styles.css:522` was `repeat(5, ...)` but `StagePills.vue` enumerates 6 agents, leaving the Critic pill wrapping out of flow. Also added a 980px tablet breakpoint for 3 columns.
9. **Removed dead Pause button** (Agent C #5): it was disabled with tooltip-only feedback. Kept the "New run" link; a real Cancel comes back once the backend exposes one.
10. **`<img loading="lazy" decoding="async">` + alt fallback** in `PaperDraft.vue` (Agent C #5): Writer emits images without alt text, so `alt` was always empty. The walker now derives a fallback from `title` or the filename stem; the renderer override adds the lazy + async attributes so long papers don't block first paint.

## Test status

- `uv run --frozen pytest apps/agent-worker -q` → **424 passed**, 1 deselected (unchanged from baseline; all fixes are additive or contract-only).
- `uvx ruff check apps/agent-worker/src apps/agent-worker/tests`: clean.
- `pnpm --filter web typecheck`: clean.

## Deferred to round 7 (per agent recommendations)

- **From Agent B (cost):** per-stage `max_revision_rounds` (Writer:1, others:2); trim `coder_cells.source` from the Writer prompt; Anthropic `cache_control: ephemeral` wiring in `crates/gateway/src/llm/providers/`; concurrent Critic + next-stage prep (~6.7 min wall-clock savings).
- **From Agent D (sourcemap):** SkillTool on-demand body load; mid-run interrupt control channel (RemoteControlHandle pattern); surgical edit tool for PaperEditor (find/replace vs. wholesale rewrite).
- **From Agent A (code-quality):** `_review_and_maybe_revise` cost budget hooked to actual gateway cost events (current estimates are 7× too low); test coverage for `run_finetune` end-to-end + Critic revision fall-through; lift the duplicated `_problem_letter_from_problem_text` into a shared module.
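Fixes 1 and 7 both come down to a `Literal`-typed contract field rejecting any value that is not a declared member. The dependency-free sketch below reproduces that membership check with `typing.get_args`, which is roughly what Pydantic does for a `Literal[...]` field. Only `"paper_editor"`, `"kernel.error"`, and `"python"`/`"matlab"` come from this PR; the other members and the helper names (`check_literal`, `normalize_language`) are hypothetical placeholders, not the actual `mm_contracts` code.

```python
from typing import Literal, get_args

# Trimmed-down stand-ins for the contract Literals. "paper_editor" and
# "kernel.error" come from the PR; the remaining members are placeholders.
AgentName = Literal["searcher", "modeler", "coder", "writer", "critic", "paper_editor"]
EventKind = Literal["stage.started", "kernel.error"]  # "stage.started" is hypothetical
CoderLanguage = Literal["python", "matlab"]


def check_literal(value: str, allowed: object, field: str) -> str:
    """Reject values outside a Literal, like a Pydantic Literal field would."""
    members = get_args(allowed)
    if value not in members:
        raise ValueError(f"{field}={value!r} not in {members}")
    return value


def normalize_language(raw: str) -> str:
    # Hypothetical runtime normalization runs first; with the Literal type,
    # anything that is still not "python"/"matlab" now fails loudly instead
    # of slipping through as a free-form str.
    return check_literal(raw.strip().lower(), CoderLanguage, "language")
```

An event kind that the runtime emits but the Literal does not list fails at the first emit, which is exactly the crash mode fix 1 describes; adding the member to the Literal is what makes it valid.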
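Fix 3's per-run_id mutex can be sketched as a lazily populated mapping from run id to `asyncio.Lock`, assuming (as a shared `dict[UUID, asyncio.Lock]` implies) that both consumers run on the same event loop in one process. `_RUN_LOCKS`, `with_run_lock`, and `demo` are hypothetical names; the `asyncio.sleep` stands in for the notebook / `paper.meta.json` / `figures/` writes.

```python
import asyncio
from collections import defaultdict
from uuid import UUID, uuid4

# One lock per run_id, created on first access. A defaultdict is safe here:
# asyncio is single-threaded and the lookup contains no await point.
_RUN_LOCKS: "defaultdict[UUID, asyncio.Lock]" = defaultdict(asyncio.Lock)


async def with_run_lock(run_id: UUID, name: str, log: list) -> None:
    async with _RUN_LOCKS[run_id]:
        log.append(f"{name}:start")
        await asyncio.sleep(0.01)  # placeholder for the business logic / file writes
        log.append(f"{name}:end")


async def demo() -> list:
    run_id = uuid4()
    log: list = []
    # Pipeline and finetune consumers hit the same run concurrently.
    await asyncio.gather(
        with_run_lock(run_id, "pipeline", log),
        with_run_lock(run_id, "finetune", log),
    )
    return log
```

The two critical sections never interleave: one consumer's `start`/`end` pair always completes before the other begins. This sketch never evicts entries, so a long-lived worker would accumulate one lock per run id.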
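Fixes 4 and 5 follow the same pattern: compile each regex once at import time and reuse the `re.Pattern` objects on every call. The real bodies of `_PARAM_PATTERNS` and `_FIG_ID_RE` are not shown in the PR (the inline figure regex is elided), so the patterns, the blacklist contents, and the `figure_ids` helper below are hypothetical and illustrate only the structure.

```python
import re

# Hypothetical patterns, compiled once at module import.
_PARAM_PATTERNS = (
    re.compile(r"\b([a-z_]\w*)\s*=\s*(-?\d+(?:\.\d+)?)", re.IGNORECASE),
    re.compile(r"\bvary(?:ing)?\s+([a-z_]\w*)", re.IGNORECASE),
    re.compile(r"\bsensitivity\s+(?:of|to)\s+([a-z_]\w*)", re.IGNORECASE),
)
# frozenset: immutable, O(1) membership tests, safe to share at module level.
_PARAM_BLACKLIST = frozenset({"i", "j", "k", "n", "x", "y"})
_FIG_ID_RE = re.compile(r"\[\[FIG:([A-Za-z0-9_\-]+)\]\]")


def mine_sensitivity_evidence(text: str) -> set:
    """Collect candidate parameter names using the precompiled patterns."""
    found = set()
    for pattern in _PARAM_PATTERNS:
        for match in pattern.finditer(text):
            name = match.group(1).lower()
            if name not in _PARAM_BLACKLIST:
                found.add(name)
    return found


def figure_ids(text: str) -> list:
    # Reuses the module-level pattern instead of an inline re.finditer(r"...").
    return [m.group(1) for m in _FIG_ID_RE.finditer(text)]
```

Example: for `"We vary alpha and set beta = 0.5 and x = 2"`, `mine_sensitivity_evidence` returns `{"alpha", "beta"}`; `x` is dropped by the blacklist.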
Stacked on #31. Round-6 audit fixes from 4 parallel discovery agents (code-quality / cost+perf / UX / sourcemap-unexplored). Lands the P0 items that are small, safe, and high-impact.

## Critical fixes

- **AgentEvent contract drift**: `paper_editor` + 6 `finetune.*` event kinds + `kernel.error` were emitted at runtime but rejected by the Pydantic contract → every fine-tune session crashed on first emit. Real smoke test caught this. Extended the `AgentName` / `EventKind` Literals.
- **`PaperEditorAgent.__init__()` TypeError**: `finetune_main` passed a `matlab_session=` kwarg the agent doesn't accept. Real smoke test exposed it.
- **Per-run_id mutex between pipeline + finetune consumers** (Agent A #2): two consumers could write the same `notebook.ipynb` / `paper.meta.json` / `figures/` concurrently. `main.py` + `finetune_main.py` now share a `dict[UUID, asyncio.Lock]`.

## Performance fixes
- **Lift `_PARAM_PATTERNS` to module level** (Agent A #4): three regexes were recompiled on every `mine_sensitivity_evidence` call. ~3× speedup on the Writer evidence scan.
- **Reuse `_FIG_ID_RE`** in `evidence.py` (Agent A #5).
- **Move `{{ upstream_reminders }}` to the BOTTOM of user templates** (Agent B cache strategy #4): preserves the stable prompt-cache prefix across revision rounds. 4 TOMLs touched (searcher / modeler / coder / writer).

## Contract / safety
- **`CoderDirective.language: Literal["python","matlab"]`** (Agent A #6): was `str`.

## Frontend UX
- **5→6 stage pill grid** (Agent C #1): `repeat(5, ...)` left the Critic pill wrapping out of flow; bumped to 6 and added a 980px breakpoint.
- **Removed dead Pause button** (Agent C #5): it was disabled with tooltip-only feedback.
- **`loading="lazy" decoding="async"` + alt fallback** in `PaperDraft.vue` (Agent C #5): Writer emits images without alt text, so `alt` was always empty. The walker now derives a fallback; the renderer override adds the lazy attributes so long papers don't block first paint.

## Test status
- pytest: 424 passed (unchanged baseline; all fixes additive)
- ruff: clean
- `pnpm --filter web typecheck`: clean

## Deferred to round 7
From Agent B (cost):

- Per-stage `max_revision_rounds` (Writer:1, others:2) → ~0.57 RMB/run
- Trim `coder_cells.source` from the Writer prompt → ~0.4 RMB/run
- Anthropic `cache_control: ephemeral` wiring → ~0.7-1.0 RMB/run

From Agent D (sourcemap):

- Mid-run interrupt control channel (`RemoteControlHandle` pattern, ~400 LOC), user-chosen in round 5

From Agent A (code-quality):

- `_review_and_maybe_revise` cost budget hooked to actual gateway cost events (current estimates 7× too low)
- Test coverage for `run_finetune` end-to-end
- Lift the duplicated `_problem_letter_from_problem_text` to a shared module
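The `{{ upstream_reminders }}` move (fix 6) matters because, assuming the provider's prompt cache keys on an exact shared prefix, any text that changes between revision rounds invalidates everything after it. The sketch below compares the two layouts by measuring the common prefix of two renders. `STATIC`, `INSTRUCTION`, `render`, and `shared_prefix_len` are hypothetical; the real TOML templates are not shown in the PR.

```python
import os

# Hypothetical template pieces: static part = system + catalog + exemplars.
STATIC = (
    "SYSTEM: you are the Writer.\n"
    "CATALOG: <figure catalog>\n"
    "EXEMPLARS: <exemplar papers>\n"
)
INSTRUCTION = "Respond with the revised section."


def render(reminders: str, reminders_first: bool) -> str:
    if reminders_first:
        # Old layout: dynamic reminders first, so the prefix changes every round.
        return f"{reminders}\n{STATIC}{INSTRUCTION}"
    # New layout: static -> dynamic reminders -> "Respond with..." instruction.
    return f"{STATIC}{reminders}\n{INSTRUCTION}"


def shared_prefix_len(a: str, b: str) -> int:
    # Character-level stand-in for how much of the prompt a prefix cache could reuse.
    return len(os.path.commonprefix([a, b]))
```

With reminders on top, two revision rounds share only the few characters their reminders happen to have in common; with reminders at the bottom, they share at least the whole static block.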
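The alt fallback and lazy-loading change (fix 10) lives in `PaperDraft.vue`; the Python sketch below only mirrors the described logic for consistency with the other examples. The fallback order (existing alt, then `title`, then the filename stem) is taken from the PR text; `derive_alt` and `render_img` are hypothetical names, not the walker or renderer override themselves.

```python
import html
from pathlib import PurePosixPath
from typing import Optional


def derive_alt(alt: str, title: Optional[str], src: str) -> str:
    """Keep a non-empty alt, else fall back to title, else the filename stem."""
    if alt.strip():
        return alt
    if title and title.strip():
        return title
    return PurePosixPath(src).stem


def render_img(src: str, alt: str = "", title: Optional[str] = None) -> str:
    # Every image gets lazy loading and async decoding so long drafts
    # don't block first paint.
    attrs = {
        "src": src,
        "alt": derive_alt(alt, title, src),
        "loading": "lazy",
        "decoding": "async",
    }
    if title:
        attrs["title"] = title
    # Escape attribute values, including quotes, before emitting HTML.
    body = " ".join(f'{k}="{html.escape(v, quote=True)}"' for k, v in attrs.items())
    return f"<img {body}>"
```

For example, `derive_alt("", None, "figures/fig_3_sensitivity.png")` yields `"fig_3_sensitivity"`, so an image the Writer emitted without alt text still gets a meaningful label.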