feat(agent): Add Primus Turbo optimization agent skills by ChengYao-amd · Pull Request #338 · AMD-AGI/Primus-Turbo

ChengYao-amd · 2026-05-12T15:49:21Z

No description provided.


* update skill, modify the acceptance standard for FORWARD+BACKWARD, limit sleep to at most 15 min in case the CLI stops by accident
* update performance trend format
* kernel-optimize: add quick baseline step to ENVIRONMENT_BASELINE

  After representative_shapes are filled, run quick_command once against them and save the output to rounds/round-1/artifacts/quick_baseline.log. Later VALIDATE quick rounds can diff their own quick_validation.log against this reference when metrics look off. The baseline record template now documents the log path.
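The quick baseline step described above could be sketched roughly as follows. This is only an illustration, not the skill's actual tooling: the helper names `save_quick_log` and `diff_against_baseline` are hypothetical, and the sketch assumes `quick_command` can be run as a plain subprocess whose output is text.

```python
import difflib
import subprocess
import sys
from pathlib import Path


def save_quick_log(quick_command: list[str], log_path: Path) -> str:
    """Run the quick command once and store its combined output at log_path."""
    result = subprocess.run(quick_command, capture_output=True, text=True, check=True)
    output = result.stdout + result.stderr
    log_path.parent.mkdir(parents=True, exist_ok=True)
    log_path.write_text(output)
    return output


def diff_against_baseline(baseline_log: Path, validation_log: Path) -> list[str]:
    """Return unified-diff lines between the baseline log and a later quick log."""
    base = baseline_log.read_text().splitlines()
    new = validation_log.read_text().splitlines()
    return list(
        difflib.unified_diff(
            base,
            new,
            fromfile=str(baseline_log),
            tofile=str(validation_log),
            lineterm="",
        )
    )
```

In ENVIRONMENT_BASELINE the first helper would write `rounds/round-1/artifacts/quick_baseline.log`; a later VALIDATE quick round would write its own `quick_validation.log` and call the second helper only when its metrics look off. An empty diff means the two runs produced identical output.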

* fix triton requirements
* update benchmark for consistency
* Add hard rule + skill: forbid benchmark-only caches in kernel optimization

  Adds an always-applied hard rule and an operational skill that block agents from landing wrapper-level caches whose hit rate depends on the benchmark idiom (same `a` / `grad_out` Python object reused 100 times inside a timing loop). Such caches inflate benchmark scores but produce no gain in real LLM training, where activations and grad_out tensors are fresh tensors each iteration.

  New files:
  - `agent/rules/no_benchmark_overfitting.mdc`: alwaysApply rule defining the forbidden patterns (F1: id(a)-keyed activation cache, F2: id(grad_out)-keyed cache, F3: id(scale_of_activation)-keyed cache, F4: any non-weight id(...)-keyed cache), allowed patterns (kernel fusion, ctx.save_for_backward, weight cache with bounded gain), and the required `Real-training transfer check` round summary section.
  - `agent/skills/kernel-optimize/avoid-benchmark-overfit/SKILL.md`: operational checklist with a 6-step audit (bucket classification, id(...) audit, pen-and-paper hit-rate trace, weight-cache gain bound, required summary section, tips-file hygiene), worked example, and VALIDATE-time checklist.

  Updated entry points so agents discover these from the existing flow:
  - `agent/skills/kernel-optimize/SKILL.md`: knowledge reference table + pre-loop reading list now point at both files.
  - `agent/skills/kernel-optimize/workflow/optimize-loop.md`: iteration contract section, OPTIMIZE phase, and VALIDATE hard gates now reference the rule and the audit, with `id(activation)` / `id(grad_out)` / `id(activation_scale)` caches as a hard reject.
  - `agent/skills/kernel-optimize/triton/SKILL.md` and `examples.md`: start-here reading lists updated.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* Extend transfer audit to SURVEY and REPORT phases of optimize loop

  Previously avoid-benchmark-overfit was only consumed at OPTIMIZE / VALIDATE. Direction-search and final-report stages had no equivalent gate, so a benchmark-only direction could enter the campaign at SURVEY and still be celebrated at REPORT even after VALIDATE rejected the worst offenders.

  This change adds two new gates:
  - SURVEY: related-work-template now carries a Real-training Transfer Audit table that tags every shortlisted direction with a K1-K4 / W1 / W2 / W3 bucket, and the Initial Hypothesis Shortlist must be filtered to K1-K4 plus bounded W1 only. The kernel-optimize SKILL spells this out, and avoid-benchmark-overfit gets a Step 0 SURVEY-time direction filter and a SURVEY checklist.
  - REPORT: optimize-loop's REPORT phase now requires a Real-training applicability audit table that re-attributes baseline -> final best delta into structural / bounded / benchmark-only components. Final report cannot ship if any accepted round still has decision REJECT-as-overfit, or if the inflation gap (headline minus real-training equivalent) exceeds 1%. avoid-benchmark-overfit gets a Step 7 REPORT-time re-attribution procedure and a REPORT checklist.

  ANALYZE necessarily inherits the same buckets, so candidate directions must answer a Real-training transfer assessment question before being promoted to the round's primary hypothesis. W2 / W3 directions are no longer eligible for promotion under any aggregate-score argument.
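As a rough illustration of the REPORT gate described above, the re-attribution could look something like the sketch below. Several things here are assumptions rather than anything the commit message states: the bucket-to-component mapping (K1-K4 as structural, W1 as bounded, W2 / W3 as benchmark-only), the reading of "real-training equivalent" as structural plus bounded gain, the reading of 1% as percentage points, and the `Round` record with its field names.

```python
from dataclasses import dataclass

# Assumed mapping from bucket tags to report components; the commit message
# names the tags and the three components but does not give this table.
COMPONENT_OF_BUCKET = {
    "K1": "structural", "K2": "structural", "K3": "structural", "K4": "structural",
    "W1": "bounded",
    "W2": "benchmark-only", "W3": "benchmark-only",
}

MAX_INFLATION_GAP_PCT = 1.0  # threshold quoted in the commit message


@dataclass
class Round:
    name: str        # hypothetical round label
    bucket: str      # K1-K4 / W1 / W2 / W3
    gain_pct: float  # this round's share of the baseline -> final best delta
    decision: str    # e.g. "ACCEPT" or "REJECT-as-overfit"
    accepted: bool   # whether the round's change is part of the final best


def report_gate(rounds):
    """Re-attribute accepted gains and decide whether the report may ship."""
    totals = {"structural": 0.0, "bounded": 0.0, "benchmark-only": 0.0}
    for r in rounds:
        if r.accepted:
            totals[COMPONENT_OF_BUCKET[r.bucket]] += r.gain_pct

    headline = sum(totals.values())
    real_training = totals["structural"] + totals["bounded"]  # assumed definition
    gap = headline - real_training

    reasons = []
    for r in rounds:
        if r.accepted and r.decision == "REJECT-as-overfit":
            reasons.append(f"{r.name}: accepted round still marked REJECT-as-overfit")
    if gap > MAX_INFLATION_GAP_PCT:
        reasons.append(f"inflation gap {gap:.2f}% exceeds {MAX_INFLATION_GAP_PCT:.2f}%")

    return (not reasons), totals, gap, reasons
```

Calling `report_gate` with the campaign's rounds returns a ship/no-ship flag, the three component totals, the inflation gap, and the list of blocking reasons, which is roughly the information the audit table is meant to surface.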
Net effect: an agent following this loop can no longer accidentally spend a campaign chasing a +X% number that disappears in real LLM training; the same bucket tag follows a direction from survey to final report.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
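To make the "number that disappears in real LLM training" concrete, here is a toy, framework-free sketch of an `id(...)`-keyed wrapper cache of the kind the rule forbids. Plain Python lists stand in for tensors, and `IdKeyedCache`, `expensive_preprocess`, and `hit_rate` are hypothetical names invented for this illustration; none of them come from the PR.

```python
class IdKeyedCache:
    """Toy wrapper-level cache keyed on id(x), i.e. an F1-style pattern."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, x, compute):
        key = id(x)  # identity of the Python object, not its contents
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = compute(x)
        return self._store[key]


def expensive_preprocess(x):
    # Stand-in for per-activation work such as quantization.
    return [v * 0.5 for v in x]


def hit_rate(inputs):
    """Feed inputs through a fresh cache and return the fraction of hits."""
    cache = IdKeyedCache()
    for x in inputs:
        cache.get_or_compute(x, expensive_preprocess)
    return cache.hits / (cache.hits + cache.misses)


# Benchmark idiom: the same object is passed 100 times inside the timing loop.
a = [1.0, 2.0, 3.0]
benchmark_inputs = [a] * 100

# Training-like idiom: every iteration sees a fresh object. All 100 objects are
# kept alive in the list so that CPython cannot recycle an id; a recycled id
# would produce a stale hit, which is a separate hazard of id-keyed caches.
training_inputs = [[1.0, 2.0, 3.0] for _ in range(100)]

print(hit_rate(benchmark_inputs))  # 0.99
print(hit_rate(training_inputs))   # 0.0
```

The cache looks nearly perfect under the benchmark idiom and never hits once each iteration brings a fresh object, which is the pen-and-paper hit-rate trace the audit asks for, done in code.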

ChengYao-amd requested review from RuibinCheung, wenxie-amd and xiaobochen-amd as code owners May 12, 2026 15:49

xiaobochen-amd and others added 10 commits May 21, 2026 09:44

init agent

a2c05e7

init kernel-optimize/SKILL.md

666bc45

update skill

dc76ce9

fix bug

a1115f0

fix bug where agent doesn't check Primus Turbo env

6701bac

update optimization skills: add termination gate, quick validation sc…

0763364

…ript, log enforcement

Update turbo agent skills and rules (#290)

368d6f1

* add knowledge & rules, add survey step
* fix gemm fp8 blockwise Llama-3.1-405B shape bug
* feat(agent): delete some terminate conditions and add tips distill (#285)

simplify agent code, remove some duplicated logic

94c3007

ChengYao-amd force-pushed the dev/agent branch from 2c0db83 to 94c3007 Compare May 21, 2026 09:45

ChengYao-amd wants to merge 10 commits into main from dev/agent


2 participants
