feat(agent): Add Primus Turbo optimization agent skills#338
Open
ChengYao-amd wants to merge 10 commits into
…ript, log enforcement
* add knowledge & rules, add survey step
* fix gemm fp8 blockwise Llama-3.1-405B shape bug
* feat(agent): delete some terminate conditions and add tips distill (#285)
* update skill, modify acceptance standard for FORWARD+BACKWARD, limit sleep to at most 15 min in case the CLI stops by accident
* update performance trend format
* kernel-optimize: add quick baseline step to ENVIRONMENT_BASELINE

After representative_shapes are filled, run quick_command once against them and save the output to rounds/round-1/artifacts/quick_baseline.log. Later VALIDATE quick rounds can diff their own quick_validation.log against this reference when metrics look off. The baseline record template now documents the log path.
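The diff step described above could look roughly like the following. This is a minimal sketch that assumes both logs are plain text; the helper name `diff_quick_logs` is hypothetical, and only the two log file names come from the commit message.

```python
import difflib
from pathlib import Path


def diff_quick_logs(baseline_path, validation_path):
    """Return unified-diff lines between the baseline log and a later quick-validation log.

    Hypothetical helper: the real skill may compare the logs differently.
    An empty list means the two logs are identical line for line.
    """
    baseline = Path(baseline_path).read_text().splitlines()
    validation = Path(validation_path).read_text().splitlines()
    return list(
        difflib.unified_diff(
            baseline,
            validation,
            fromfile=str(baseline_path),
            tofile=str(validation_path),
            lineterm="",
        )
    )
```

A VALIDATE round would call it as `diff_quick_logs("rounds/round-1/artifacts/quick_baseline.log", "<path to this round's quick_validation.log>")` and inspect the `-` / `+` lines only when its metrics look off.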
* fix triton requirements
* update benchmark for consistency
* Add hard rule + skill: forbid benchmark-only caches in kernel optimization

Adds an always-applied hard rule and an operational skill that block agents from landing wrapper-level caches whose hit rate depends on the benchmark idiom (the same `a` / `grad_out` Python object reused 100 times inside a timing loop). Such caches inflate benchmark scores but produce no gain in real LLM training, where activations and grad_out tensors are fresh tensors each iteration.

New files:
- `agent/rules/no_benchmark_overfitting.mdc`: alwaysApply rule defining the forbidden patterns (F1: id(a)-keyed activation cache, F2: id(grad_out)-keyed cache, F3: id(scale_of_activation)-keyed cache, F4: any non-weight id(...)-keyed cache), allowed patterns (kernel fusion, ctx.save_for_backward, weight cache with bounded gain), and the required `Real-training transfer check` round summary section.
- `agent/skills/kernel-optimize/avoid-benchmark-overfit/SKILL.md`: operational checklist with a 6-step audit (bucket classification, id(...) audit, pen-and-paper hit-rate trace, weight-cache gain bound, required summary section, tips-file hygiene), worked example, and VALIDATE-time checklist.

Updated entry points so agents discover these from the existing flow:
- `agent/skills/kernel-optimize/SKILL.md`: knowledge reference table + pre-loop reading list now point at both files.
- `agent/skills/kernel-optimize/workflow/optimize-loop.md`: iteration contract section, OPTIMIZE phase, and VALIDATE hard gates now reference the rule and the audit, with `id(activation)` / `id(grad_out)` / `id(activation_scale)` caches as a hard reject.
- `agent/skills/kernel-optimize/triton/SKILL.md` and `examples.md`: start-here reading lists updated.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Extend transfer audit to SURVEY and REPORT phases of optimize loop

Previously avoid-benchmark-overfit was only consumed at OPTIMIZE / VALIDATE. Direction-search and final-report stages had no equivalent gate, so a benchmark-only direction could enter the campaign at SURVEY and still be celebrated at REPORT even after VALIDATE rejected the worst offenders.

This change adds two new gates:
- SURVEY: related-work-template now carries a Real-training Transfer Audit table that tags every shortlisted direction with a K1-K4 / W1 / W2 / W3 bucket, and the Initial Hypothesis Shortlist must be filtered to K1-K4 plus bounded W1 only. The kernel-optimize SKILL spells this out, and avoid-benchmark-overfit gets a Step 0 SURVEY-time direction filter and a SURVEY checklist.
- REPORT: optimize-loop's REPORT phase now requires a Real-training applicability audit table that re-attributes the baseline -> final best delta into structural / bounded / benchmark-only components. The final report cannot ship if any accepted round still has decision REJECT-as-overfit, or if the inflation gap (headline minus real-training equivalent) exceeds 1%. avoid-benchmark-overfit gets a Step 7 REPORT-time re-attribution procedure and a REPORT checklist.

ANALYZE necessarily inherits the same buckets, so candidate directions must answer a Real-training transfer assessment question before being promoted to the round's primary hypothesis. W2 / W3 directions are no longer eligible for promotion under any aggregate-score argument.
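The two gates above can be sketched as a pair of small checks. This is an illustration under stated assumptions, not the skill's actual implementation: the names `Direction`, `filter_shortlist`, and `report_can_ship` are hypothetical, the decision string "ACCEPT" is a placeholder, and the 1% threshold is read here as percentage points of speedup. The bucket tags and the REJECT-as-overfit decision come from the commit message.

```python
from dataclasses import dataclass

KERNEL_BUCKETS = {"K1", "K2", "K3", "K4"}  # always eligible for the shortlist
INFLATION_GAP_LIMIT_PCT = 1.0              # assumption: measured in percentage points


@dataclass
class Direction:
    name: str
    bucket: str            # one of K1-K4, W1, W2, W3
    bounded: bool = False  # only meaningful for W1 (weight cache with bounded gain)


def filter_shortlist(directions):
    """SURVEY gate: keep K1-K4 directions plus W1 directions whose gain is bounded."""
    return [
        d for d in directions
        if d.bucket in KERNEL_BUCKETS or (d.bucket == "W1" and d.bounded)
    ]


def report_can_ship(headline_gain_pct, real_training_gain_pct, accepted_round_decisions):
    """REPORT gate: block on any overfit decision or an inflation gap above the limit."""
    if any(decision == "REJECT-as-overfit" for decision in accepted_round_decisions):
        return False
    inflation_gap = headline_gain_pct - real_training_gain_pct
    return inflation_gap <= INFLATION_GAP_LIMIT_PCT
```

Under these assumptions, a campaign with a 5.0% headline gain and a 3.0% real-training equivalent fails the REPORT gate, because the 2.0-point gap exceeds the limit.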
Net effect: an agent following this loop can no longer accidentally spend a campaign chasing a +X% number that disappears in real LLM training; the same bucket tag follows a direction from survey to final report.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
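To make the forbidden F1 pattern concrete, the sketch below traces the hit rate of an `id(a)`-keyed cache under the benchmark idiom and under a training-style loop. It is a toy model: plain Python lists stand in for tensors, `round` stands in for an expensive preprocessing step, and the class and function names are hypothetical.

```python
class IdKeyedCache:
    """Toy wrapper-level cache keyed on id(a), i.e. the forbidden F1 pattern."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def preprocess(self, a):
        key = id(a)
        if key in self._store:
            self.hits += 1
            return self._store[key][1]
        self.misses += 1
        result = [round(x) for x in a]  # stand-in for an expensive step
        # Keep a reference to `a` so its id cannot be recycled by a later object.
        self._store[key] = (a, result)
        return result


def benchmark_loop_counts(iterations=100):
    """Benchmark idiom: the same Python object is passed on every iteration."""
    cache = IdKeyedCache()
    a = [0.2, 1.7]
    for _ in range(iterations):
        cache.preprocess(a)
    return cache.hits, cache.misses


def training_loop_counts(iterations=100):
    """Training idiom: every iteration produces a fresh activation object."""
    cache = IdKeyedCache()
    for step in range(iterations):
        cache.preprocess([0.2 + step, 1.7])
    return cache.hits, cache.misses


print(benchmark_loop_counts())  # (99, 1)
print(training_loop_counts())   # (0, 100)
```

The cache hits on 99 of 100 benchmark iterations and never hits in the training-style loop, which is the gap between headline and real-training numbers that the transfer audit is meant to catch.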
No description provided.