diff --git a/skills/argos-cli/agents/openai.yaml b/skills/argos-cli/agents/openai.yaml new file mode 100644 index 00000000..9ab1e6aa --- /dev/null +++ b/skills/argos-cli/agents/openai.yaml @@ -0,0 +1,4 @@ +interface: + display_name: "Argos CLI" + short_description: "Use the Argos CLI safely and correctly" + default_prompt: "Use $argos-cli to run or interpret Argos CLI commands in this repository." diff --git a/skills/argos-pr-review/SKILL.md b/skills/argos-pr-review/SKILL.md new file mode 100644 index 00000000..f72a6036 --- /dev/null +++ b/skills/argos-pr-review/SKILL.md @@ -0,0 +1,219 @@ +--- +name: argos-pr-review +description: > + Use Argos visual regression builds as one input to a complete pull request + review. Use this skill when a PR includes an Argos build URL, an Argos status + check, or a bot comment linking to an Argos build, and you need to determine + whether the visual diffs match the developer's likely intent based on the PR + context, code changes, and screenshots before approving the PR. +--- + +# Argos PR Review + +Treat the Argos build as one input to the PR review, not as a separate task and +not as the sole source of truth. + +## When To Use This Skill + +- A PR includes an Argos build link. +- A PR has an Argos check with `changes-detected`. +- An Argos bot comment says screenshots are waiting for review. + +## Review Principles + +- Start from the PR context, not from the screenshots alone. +- Infer likely developer intent from the PR title, description, linked issue, + branch name, commit messages, and code diff before judging the visual result. +- Use Argos to confirm whether the rendered outcome matches that intent. +- If the code suggests one intent and the screenshots show another result, treat + that mismatch as an important review signal. +- Do not let a clean screenshot override a code-level concern, or let a strange + screenshot override clear evidence that the change is intentional. + +## Review Workflow + +### 1. Inspect the PR context first + +Before opening snapshot diffs, inspect: + +- the PR title and description +- any linked issue or ticket context +- the branch name and commit messages when useful +- the code diff, especially the UI and test files related to the changed area + +Form a working hypothesis for the intended user-facing change before relying on +the screenshots. + +### 2. Inspect the build status + +Use the Argos CLI with the build URL or build number. + +```bash +argos build get --json +``` + +Use the build status to decide the next review step: + +| Status | Review meaning | Next step | +| ------------------ | ------------------------------------------- | ---------------------------------------------- | +| `accepted` | Diffs already approved | No visual review needed | +| `no-changes` | No visual diff against baseline | No visual review needed | +| `pending` | Build is not ready yet | Wait and check again | +| `progress` | Comparison is still running | Wait and check again | +| `changes-detected` | Screenshots need a decision | Fetch snapshots and inspect them | +| `rejected` | A prior decision already rejected the build | Do not approve without understanding why | +| `error` | Build processing failed | Do not approve until the failure is understood | +| `aborted` | Build did not complete normally | Do not approve until the reason is understood | +| `expired` | Build was not completed in time | Do not approve until a valid build exists | + +### 3. Fetch only snapshots that need review + +```bash +argos build snapshots --needs-review --json +``` + +For each diff, inspect: + +- the diff mask `url` +- the previous screenshot `base.url` +- the new screenshot `head.url` +- metadata in `head.metadata` + +### 4. Compare screenshots against intended change + +For each changed snapshot, ask: + +- does the screenshot match the UI change implied by the PR and code diff? +- is the changed area the one you would expect from the files touched? +- does the visual result reveal a regression the code review alone would not catch? +- does the screenshot contradict the claimed intent in the PR description or + linked ticket? + +### 5. Decide whether the change is intentional, broken, or flaky + +- Intentional change: the new screenshot matches the code change and the UI is stable. +- Regression: layout break, wrong state, missing content, wrong theme, clipped content, incorrect data, or an unexpected removed snapshot. +- Flaky capture: loading indicator, partially rendered content, animation frame, dynamic content drift, or the same transient state captured in multiple browsers. + +### 6. Always check for flake signals + +Pay special attention to: + +- `head.metadata.test.retry > 0` +- `head.metadata.test.retries` only as context for how many retries the test is allowed to use +- a visible loader or spinner in the screenshot +- partially loaded data +- identical scores across browsers +- identical `head.url` values across browsers + +If the page is clearly captured mid-load, report it as flaky even if the branch name sounds related to loading or data changes. + +### 7. Report the result in the PR review + +- If everything looks intentional, say the visual result appears consistent with the PR intent. +- If you find a regression, call it out with the Argos build URL, the affected snapshot names, and the mismatch between the intended change and the rendered result. +- If you find flakiness, explain the signal and recommend a stabilization fix before approval. +- When relevant, mention both the code evidence and the screenshot evidence in the same finding. + +Use this format for flaky captures: + +> `snapshot-name` looks flaky. +> Test: `test.title` in `test.location.file:test.location.line` +> Signal: loader visible / retry > 0 / identical head image across browsers / ... +> Fix: wait for the page to finish loading. If the SDK uses `waitForAriaBusy`, mark loading UI with `aria-busy="true"` so Argos waits for it to clear +> Review: `` + +## What To Inspect In Snapshot Data + +For each snapshot diff, inspect: + +- `name`: human-readable snapshot identifier +- `status`: usually `changed`, `added`, or `removed` +- `score`: rough diff magnitude +- `url`: diff mask / review overlay +- `base.url`: previous screenshot +- `head.url`: new screenshot +- `head.metadata.test`: test file, line, title, `retry`, `retries` +- `head.metadata.browser`: browser information + +## Review Heuristics + +### Expected changes + +These are usually safe when they match the PR: + +- intentional content updates +- deliberate theme or layout redesigns +- newly added screens +- removed screenshots that correspond to intentionally deleted pages or tests + +### Regressions + +Flag the review when you see: + +- missing content +- wrong empty state +- unexpected loader +- broken layout or overlap +- text clipping +- wrong route or wrong page captured +- mismatched theme or styling +- removed snapshots without an intentional test removal + +### Flaky captures + +Treat the snapshot as likely flaky when you see: + +- a spinner, skeleton, or loading state +- async content not yet rendered +- animation mid-transition +- dynamic values drifting between runs +- `retry > 0` +- identical `score` across multiple browsers +- identical `head.url` across multiple browsers + +Identical `head.url` values across browsers are a particularly strong signal that both browsers captured the same transient broken state. +`retry` is the current retry count for this run. `retries` is only the configured +retry budget, so do not treat `retries > 0` alone as evidence of flakiness. + +## Typical Fixes + +- Loading UI visible: + wait for settled content before capture; if the SDK uses `waitForAriaBusy`, add `aria-busy="true"` to the loading container so Argos waits for it to clear +- Test captures too early: + wait for content that depends on the async data before calling the screenshot helper +- Dynamic values: + freeze the data or mask the unstable element +- JS animation: + stabilize or remove the animated element from the capture + +## Suggested PR Review Wording + +Use short, actionable language. + +Flaky example: + +```text +The Argos build shows a flaky capture rather than an intentional visual change. +The screenshot is taken before async data has loaded, so the page is captured in +an intermediate state (spinner visible / empty values / incomplete content). +Please wait for settled content before taking the screenshot, or mark the +loading UI with aria-busy="true" if your SDK is configured to wait for it. +``` + +Intent mismatch example: + +```text +The code changes suggest the intent is to update X, but the Argos snapshots show +Y instead. This looks like an unintended visual regression rather than the +expected UI outcome of the PR. +``` + +## Tooling + +- Prefer the Argos CLI for build metadata and snapshot enumeration. +- Use `--json` when parsing CLI output. +- Read the PR title, description, linked context, and code diff before relying on screenshots. +- Use the PR diff and test code to confirm whether the visual change matches the code. +- Load [references/baseline.md](references/baseline.md) when the review depends on baseline selection or orphan build semantics. +- Load [references/flaky-fixes.md](references/flaky-fixes.md) when you need concrete remediation examples for flaky captures. diff --git a/skills/argos-pr-review/agents/openai.yaml b/skills/argos-pr-review/agents/openai.yaml new file mode 100644 index 00000000..10494206 --- /dev/null +++ b/skills/argos-pr-review/agents/openai.yaml @@ -0,0 +1,4 @@ +interface: + display_name: "Argos PR Review" + short_description: "Review Argos builds as part of PR review" + default_prompt: "Use $argos-pr-review to review this pull request when an Argos build is available." diff --git a/skills/argos-pr-review/references/baseline.md b/skills/argos-pr-review/references/baseline.md new file mode 100644 index 00000000..823295f9 --- /dev/null +++ b/skills/argos-pr-review/references/baseline.md @@ -0,0 +1,73 @@ +# Baseline And Orphan Build Notes + +Load this file when an Argos PR review depends on baseline selection, orphan +build semantics, or how approval affects future comparisons. + +--- + +## Baseline Build + +The baseline is the screenshot Argos compares the new build against. + +**Approval vs. baseline:** approving a build makes it a candidate for future +baseline selection. It does not directly overwrite the current baseline. The +baseline is selected dynamically each time a new build runs. + +Do not say: + +```text +Approving this build will update the baseline. +``` + +Say instead: + +```text +Approving this build would make it eligible to be selected as a baseline for future builds on this branch. +``` + +## Baseline Selection Rules + +Argos picks the most recent build satisfying all of: + +1. Same build name as the triggered build +2. All framework tests passed +3. Not a subset build +4. Status is auto-approved, manually approved, or orphan +5. Its commit is an ancestor of the merge base between the new build's commit + and the baseline branch + +## Baseline Branch + +- Pull requests: the PR base branch +- Push events: the project's configured default baseline branch, usually `main` + +## Baseline Overrides + +- `ARGOS_REFERENCE_BRANCH`: force a specific branch as baseline +- `ARGOS_REFERENCE_COMMIT`: pin to a specific commit's build + +## CI Caveat + +If CI does not run on the default branch, the baseline becomes stale and diffs +accumulate. Keep Argos running on the default branch too. + +## Orphan Builds + +An orphan build has no prior build to compare against, so all snapshots appear +as `added` with `base: null`. + +This is expected for: + +- the first build in a project +- a branch that has no approved ancestor build yet + +The absence of a baseline is not itself a regression signal. + +## Review Guidance + +When reviewing an orphan build: + +- inspect screenshots for obvious layout or rendering problems +- do not treat every `added` snapshot as suspicious just because `base` is null +- approve the build if the screenshots look correct and it is intended to seed a + future baseline diff --git a/skills/argos-pr-review/references/flaky-fixes.md b/skills/argos-pr-review/references/flaky-fixes.md new file mode 100644 index 00000000..94ccc373 --- /dev/null +++ b/skills/argos-pr-review/references/flaky-fixes.md @@ -0,0 +1,125 @@ +# Flaky Fix Code Examples + +Code examples for each flakiness fix. Load this file when you need to give a developer the exact implementation for a fix identified during build review. + +--- + +## 1. Loaders / async data — `aria-busy` + +For SDKs that enable `waitForAriaBusy`, `argosScreenshot()` waits until no +element with `aria-busy` is present or truthy before capturing. Apply it to +loader components in those integrations. + +```jsx +// while data is loading + + +// once data is ready, remove the attribute or set to false + +``` + +Plain HTML: + +```html +
+``` + +--- + +## 2. Dynamic dates and times + +**Option A — hide the element (preserves layout):** + +```html + +``` + +**Option B — freeze dates in test seeds:** +Run a normalization script before the E2E suite that sets all date fields to a fixed value. This is preferable when the date drives layout (e.g. affects column widths or sort order). + +--- + +## 3. Dynamic content — `data-visual-test` attributes + +| Attribute | CSS effect | When to use | +| -------------------------------- | ----------------------- | -------------------------------------------------------------------------- | +| `data-visual-test="transparent"` | `visibility: hidden` | Hide content, keep layout space | +| `data-visual-test="removed"` | `display: none` | Hide content and collapse layout space | +| `data-visual-test="blackout"` | Black mask overlay | Obscure sensitive or dynamic content (Playwright, Cypress, Storybook only) | +| `data-visual-test-no-radius` | Removes `border-radius` | Fix cross-browser rounding glitches | + +```html + + + + +
+ + + + + + +``` + +--- + +## 4. Animations and transitions + +The Argos SDK automatically stabilizes CSS animations (pauses them at a consistent frame before capture). For JS-driven animations that are still causing flakiness, hide the element: + +```html +
+``` + +Or disable the animation in test via CSS injection: + +```ts +await argosScreenshot(page, "my-page", { + argosCSS: + "*, *::before, *::after { animation: none !important; transition: none !important; }", +}); +``` + +--- + +## 5. Border-radius glitches + +Cross-browser and cross-OS rendering can produce 1px differences around rounded corners. Strip the radius for the screenshot: + +```html + +``` + +--- + +## 6. `argosScreenshot` stabilization options (Playwright) + +The stabilization behavior can be customized per screenshot: + +```ts +await argosScreenshot(page, "name", { + stabilize: { + waitForAriaBusy: true, // wait for aria-busy to clear (default: true) + waitForFonts: true, // wait for fonts to load (default: true) + waitForImages: true, // wait for images to load (default: true) + hideScrollbars: true, // hide scrollbars (default: true) + hideCarets: true, // hide text carets (default: true) + fontAntialiasing: true, // force consistent font antialiasing (default: true) + stabilizeSticky: true, // make sticky/fixed elements position:absolute (default: true) + }, +}); +``` + +To debug a specific test by running it multiple times to surface inconsistencies: + +```sh +npx playwright test --repeat-each 5 +``` + +## 7. Framework caveat + +`aria-busy` is not a universal fix across every Argos integration. For example, +some integrations may disable `waitForAriaBusy` by default, so waiting for +settled content explicitly can be a more portable recommendation than relying +only on `aria-busy`.