argos-ci · jsfez · Apr 9, 2026
diff --git a/skills/argos-cli/SKILL.md b/skills/argos-cli/SKILL.md
@@ -2,9 +2,9 @@
 name: argos-cli
 description: >
   Operate the Argos visual regression platform from the terminal — fetch build
-  metadata, review snapshot diffs, upload screenshots, and manage CI builds via
-  the `argos` CLI. Use when the user wants to run Argos commands in the shell,
-  scripts, or CI/CD pipelines, or when reviewing visual regression builds.
+  metadata, inspect snapshot diffs, upload screenshots, and manage CI builds
+  via the `argos` CLI. Use when the user wants to run Argos commands in the
+  shell, scripts, or CI/CD pipelines.
   Always load this skill before running `argos` commands — it contains the flag
   contract and output shapes that prevent silent failures.
 license: MIT
@@ -19,14 +19,8 @@ argument-hint: Requires `ARGOS_TOKEN` or an explicit `--token` when running auth
 
 ## Agent Protocol
 
-The CLI writes errors to stderr. Some commands support both human-readable text
-and JSON output.
-
-**Rules for agents:**
-
 - Supply `--token` or set `ARGOS_TOKEN`. The CLI exits with code 1 if no token is found.
-- Exit `0` = success, `1` = error.
-- All errors go to stderr: `Error: <message>`
+- Exit `0` = success, `1` = error. All errors go to stderr: `Error: <message>`
 - Use `--json` whenever stdout will be parsed by a script or agent.
 
 ## Authentication
@@ -43,19 +37,41 @@ Auth resolves: `--token` flag > `ARGOS_TOKEN` env var.
 | `finalize`              | Finalize a parallel build        |
 | `skip`                  | Mark a build as skipped          |
 
-Read the matching reference file for detailed flags and output shapes.
+## Build Status Reference
 
-## Common Patterns
+| Status             | Terminal? | Meaning                             |
+| ------------------ | --------- | ----------------------------------- |
+| `no-changes`       | ✅        | All snapshots identical to baseline |
+| `accepted`         | ✅        | All changes approved                |
+| `rejected`         | ✅        | Changes explicitly rejected         |
+| `changes-detected` | ✅        | Diffs found, awaiting decision      |
+| `error`            | ✅        | Build processing failed             |
+| `aborted`          | ✅        | Build was cancelled                 |
+| `expired`          | ✅        | Build expired before completion     |
+| `pending`          | ⏳        | Waiting for screenshots to arrive   |
+| `progress`         | ⏳        | Screenshots are being compared      |
 
-**Review a build (fetch metadata first, then diffs that need review):**
+Only `no-changes` and `accepted` are all-clear terminal states. For review
+decision-making, keep the PR-review triage logic in the dedicated Argos review
+skill rather than here.
+
+## Snapshot Inspection
+
+Use the CLI to fetch the build and the raw diffs that need attention:
 
 ```bash
-argos build get 72652
-argos build snapshots 72652 --needs-review
-argos build get 72652 --json
-argos build snapshots 72652 --needs-review --json
+argos build get <buildReference> --json
+argos build snapshots <buildReference> --needs-review --json
 ```
 
+This skill is the command-and-output reference for those operations. If you need
+the full "review an Argos build as part of a PR review" workflow, keep that in
+the dedicated public Argos review skill rather than here.
+
+---
+
+## Common Patterns
+
 **Upload screenshots in CI:**
 
 ```bash
@@ -69,6 +85,8 @@ argos upload ./screenshots --parallel-nonce $CI_PIPELINE_ID --parallel-index $CI
 argos finalize --parallel-nonce $CI_PIPELINE_ID
 ```
 
+---
+
 ## When to Load References
 
 - **Fetching build data or reviewing snapshots** → [references/build.md](references/build.md)

diff --git a/skills/argos-cli/agents/openai.yaml b/skills/argos-cli/agents/openai.yaml
@@ -0,0 +1,4 @@
+interface:
+  display_name: "Argos CLI"
+  short_description: "Use the Argos CLI safely and correctly"
+  default_prompt: "Use $argos-cli to run or interpret Argos CLI commands in this repository."
diff --git a/skills/argos-pr-review/SKILL.md b/skills/argos-pr-review/SKILL.md
@@ -0,0 +1,173 @@
+---
+name: argos-pr-review
+description: >
+  Review Argos visual regression builds as part of a pull request review. Use
+  this skill when a PR includes an Argos build URL, an Argos status check, or a
+  bot comment linking to an Argos build, and you need to decide whether the
+  visual diffs are intentional changes, regressions, or flaky captures before
+  approving the PR.
+---
+
+# Argos PR Review
+
+Treat the Argos build as part of the PR review, not as a separate task.
+
+## When To Use This Skill
+
+- A PR includes an Argos build link.
+- A PR has an Argos check with `changes-detected`.
+- An Argos bot comment says screenshots are waiting for review.
+
+## Review Workflow
+
+### 1. Inspect the build status
+
+Use the Argos CLI with the build URL or build number.
+
+```bash
+argos build get <buildReference> --json
+```
+
+Use the build status to decide the next review step:
+
+| Status             | Review meaning                              | Next step                                      |
+| ------------------ | ------------------------------------------- | ---------------------------------------------- |
+| `accepted`         | Diffs already approved                      | No visual review needed                        |
+| `no-changes`       | No visual diff against baseline             | No visual review needed                        |
+| `pending`          | Build is not ready yet                      | Wait and check again                           |
+| `progress`         | Comparison is still running                 | Wait and check again                           |
+| `changes-detected` | Screenshots need a decision                 | Fetch snapshots and inspect them               |
+| `rejected`         | A prior decision already rejected the build | Do not approve without understanding why       |
+| `error`            | Build processing failed                     | Do not approve until the failure is understood |
+| `aborted`          | Build did not complete normally             | Do not approve until the reason is understood  |
+| `expired`          | Build was not completed in time             | Do not approve until a valid build exists      |
+
+### 2. Fetch only snapshots that need review
+
+```bash
+argos build snapshots <buildReference> --needs-review --json
+```
+
+For each diff, inspect:
+
+- the diff mask `url`
+- the previous screenshot `base.url`
+- the new screenshot `head.url`
+- metadata in `head.metadata`
+
+### 3. Decide whether the change is intentional, broken, or flaky
+
+- Intentional change: the new screenshot matches the code change and the UI is stable.
+- Regression: layout break, wrong state, missing content, wrong theme, clipped content, incorrect data, or an unexpected removed snapshot.
+- Flaky capture: loading indicator, partially rendered content, animation frame, dynamic content drift, or the same transient state captured in multiple browsers.
+
+### 4. Always check for flake signals
+
+Pay special attention to:
+
+- `head.metadata.test.retry > 0`
+- `head.metadata.test.retries` only as context for how many retries the test is allowed to use
+- a visible loader or spinner in the screenshot
+- partially loaded data
+- identical scores across browsers
+- identical `head.url` values across browsers
+
+If the page is clearly captured mid-load, report it as flaky even if the branch name sounds related to loading or data changes.
+
+### 5. Report the result in the PR review
+
+- If everything looks intentional, say the Argos build looks good.
+- If you find a regression, call it out with the Argos build URL and the affected snapshot names.
+- If you find flakiness, explain the signal and recommend a stabilization fix before approval.
+
+Use this format for flaky captures:
+
+> `snapshot-name` looks flaky.
+> Test: `test.title` in `test.location.file:test.location.line`
+> Signal: loader visible / retry > 0 / identical head image across browsers / ...
+> Fix: wait for the page to finish loading. If the SDK uses `waitForAriaBusy`, mark loading UI with `aria-busy="true"` so Argos waits for it to clear
+> Review: `<snapshot url>`
+
+## What To Inspect In Snapshot Data
+
+For each snapshot diff, inspect:
+
+- `name`: human-readable snapshot identifier
+- `status`: usually `changed`, `added`, or `removed`
+- `score`: rough diff magnitude
+- `url`: diff mask / review overlay
+- `base.url`: previous screenshot
+- `head.url`: new screenshot
+- `head.metadata.test`: test file, line, title, `retry`, `retries`
+- `head.metadata.browser`: browser information
+
+## Review Heuristics
+
+### Expected changes
+
+These are usually safe when they match the PR:
+
+- intentional content updates
+- deliberate theme or layout redesigns
+- newly added screens
+- removed screenshots that correspond to intentionally deleted pages or tests
+
+### Regressions
+
+Flag the review when you see:
+
+- missing content
+- wrong empty state
+- unexpected loader
+- broken layout or overlap
+- text clipping
+- wrong route or wrong page captured
+- mismatched theme or styling
+- removed snapshots without an intentional test removal
+
+### Flaky captures
+
+Treat the snapshot as likely flaky when you see:
+
+- a spinner, skeleton, or loading state
+- async content not yet rendered
+- animation mid-transition
+- dynamic values drifting between runs
+- `retry > 0`
+- identical `score` across multiple browsers
+- identical `head.url` across multiple browsers
+
+Identical `head.url` values across browsers are a particularly strong signal that both browsers captured the same transient broken state.
+`retry` is the current retry count for this run. `retries` is only the configured
+retry budget, so do not treat `retries > 0` alone as evidence of flakiness.
+
+## Typical Fixes
+
+- Loading UI visible:
+  wait for settled content before capture; if the SDK uses `waitForAriaBusy`, add `aria-busy="true"` to the loading container so Argos waits for it to clear
+- Test captures too early:
+  wait for content that depends on the async data before calling the screenshot helper
+- Dynamic values:
+  freeze the data or mask the unstable element
+- JS animation:
+  stabilize or remove the animated element from the capture
+
+## Suggested PR Review Wording
+
+Use short, actionable language. Example:
+
+```text
+The Argos build shows a flaky capture rather than an intentional visual change.
+The screenshot is taken before async data has loaded, so the page is captured in
+an intermediate state (spinner visible / empty values / incomplete content).
+Please wait for settled content before taking the screenshot, or mark the
+loading UI with aria-busy="true" if your SDK is configured to wait for it.
+```
+
+## Tooling
+
+- Prefer the Argos CLI for build metadata and snapshot enumeration.
+- Use `--json` when parsing CLI output.
+- Use the PR diff and test code to confirm whether the visual change matches the code.
+- Load [references/baseline.md](references/baseline.md) when the review depends on baseline selection or orphan build semantics.
+- Load [references/flaky-fixes.md](references/flaky-fixes.md) when you need concrete remediation examples for flaky captures.
diff --git a/skills/argos-pr-review/agents/openai.yaml b/skills/argos-pr-review/agents/openai.yaml
@@ -0,0 +1,4 @@
+interface:
+  display_name: "Argos PR Review"
+  short_description: "Review Argos builds as part of PR review"
+  default_prompt: "Use $argos-pr-review to review this pull request when an Argos build is available."
diff --git a/skills/argos-pr-review/references/baseline.md b/skills/argos-pr-review/references/baseline.md
@@ -0,0 +1,73 @@
+# Baseline And Orphan Build Notes
+
+Load this file when an Argos PR review depends on baseline selection, orphan
+build semantics, or how approval affects future comparisons.
+
+---
+
+## Baseline Build
+
+The baseline is the screenshot Argos compares the new build against.
+
+**Approval vs. baseline:** approving a build makes it a candidate for future
+baseline selection. It does not directly overwrite the current baseline. The
+baseline is selected dynamically each time a new build runs.
+
+Do not say:
+
+```text
+Approving this build will update the baseline.
+```
+
+Say instead:
+
+```text
+Approving this build would make it eligible to be selected as a baseline for future builds on this branch.
+```
+
+## Baseline Selection Rules
+
+Argos picks the most recent build satisfying all of:
+
+1. Same build name as the triggered build
+2. All framework tests passed
+3. Not a subset build
+4. Status is auto-approved, manually approved, or orphan
+5. Its commit is an ancestor of the merge base between the new build's commit
+   and the baseline branch
+
+## Baseline Branch
+
+- Pull requests: the PR base branch
+- Push events: the project's configured default baseline branch, usually `main`
+
+## Baseline Overrides
+
+- `ARGOS_REFERENCE_BRANCH`: force a specific branch as baseline
+- `ARGOS_REFERENCE_COMMIT`: pin to a specific commit's build
+
+## CI Caveat
+
+If CI does not run on the default branch, the baseline becomes stale and diffs
+accumulate. Keep Argos running on the default branch too.
+
+## Orphan Builds
+
+An orphan build has no prior build to compare against, so all snapshots appear
+as `added` with `base: null`.
+
+This is expected for:
+
+- the first build in a project
+- a branch that has no approved ancestor build yet
+
+The absence of a baseline is not itself a regression signal.
+
+## Review Guidance
+
+When reviewing an orphan build:
+
+- inspect screenshots for obvious layout or rendering problems
+- do not treat every `added` snapshot as suspicious just because `base` is null
+- approve the build if the screenshots look correct and it is intended to seed a
+  future baseline