Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
52 changes: 35 additions & 17 deletions skills/argos-cli/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,9 @@
name: argos-cli
description: >
Operate the Argos visual regression platform from the terminal — fetch build
metadata, review snapshot diffs, upload screenshots, and manage CI builds via
the `argos` CLI. Use when the user wants to run Argos commands in the shell,
scripts, or CI/CD pipelines, or when reviewing visual regression builds.
metadata, inspect snapshot diffs, upload screenshots, and manage CI builds
via the `argos` CLI. Use when the user wants to run Argos commands in the
shell, scripts, or CI/CD pipelines.
Always load this skill before running `argos` commands — it contains the flag
contract and output shapes that prevent silent failures.
license: MIT
Expand All @@ -19,14 +19,8 @@ argument-hint: Requires `ARGOS_TOKEN` or an explicit `--token` when running auth

## Agent Protocol

The CLI writes errors to stderr. Some commands support both human-readable text
and JSON output.

**Rules for agents:**

- Supply `--token` or set `ARGOS_TOKEN`. The CLI exits with code 1 if no token is found.
- Exit `0` = success, `1` = error.
- All errors go to stderr: `Error: <message>`
- Exit `0` = success, `1` = error. All errors go to stderr: `Error: <message>`
- Use `--json` whenever stdout will be parsed by a script or agent.

## Authentication
Expand All @@ -43,19 +37,41 @@ Auth resolves: `--token` flag > `ARGOS_TOKEN` env var.
| `finalize` | Finalize a parallel build |
| `skip` | Mark a build as skipped |

Read the matching reference file for detailed flags and output shapes.
## Build Status Reference

## Common Patterns
| Status | Terminal? | Meaning |
| ------------------ | --------- | ----------------------------------- |
| `no-changes` | ✅ | All snapshots identical to baseline |
| `accepted` | ✅ | All changes approved |
| `rejected` | ✅ | Changes explicitly rejected |
| `changes-detected` | ✅ | Diffs found, awaiting decision |
| `error` | ✅ | Build processing failed |
| `aborted` | ✅ | Build was cancelled |
| `expired` | ✅ | Build expired before completion |
| `pending` | ⏳ | Waiting for screenshots to arrive |
| `progress` | ⏳ | Screenshots are being compared |

**Review a build (fetch metadata first, then diffs that need review):**
Only `no-changes` and `accepted` are all-clear terminal states. For review
decision-making, keep the PR-review triage logic in the dedicated Argos review
skill rather than here.

## Snapshot Inspection

Use the CLI to fetch the build and the raw diffs that need attention:

```bash
argos build get 72652
argos build snapshots 72652 --needs-review
argos build get 72652 --json
argos build snapshots 72652 --needs-review --json
argos build get <buildReference> --json
argos build snapshots <buildReference> --needs-review --json
```

This skill is the command-and-output reference for those operations. If you need
the full "review an Argos build as part of a PR review" workflow, keep that in
the dedicated public Argos review skill rather than here.

---

## Common Patterns

**Upload screenshots in CI:**

```bash
Expand All @@ -69,6 +85,8 @@ argos upload ./screenshots --parallel-nonce $CI_PIPELINE_ID --parallel-index $CI
argos finalize --parallel-nonce $CI_PIPELINE_ID
```

---

## When to Load References

- **Fetching build data or reviewing snapshots** → [references/build.md](references/build.md)
Expand Down
4 changes: 4 additions & 0 deletions skills/argos-cli/agents/openai.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
interface:
display_name: "Argos CLI"
short_description: "Use the Argos CLI safely and correctly"
default_prompt: "Use $argos-cli to run or interpret Argos CLI commands in this repository."
173 changes: 173 additions & 0 deletions skills/argos-pr-review/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,173 @@
---
name: argos-pr-review
description: >
Review Argos visual regression builds as part of a pull request review. Use
this skill when a PR includes an Argos build URL, an Argos status check, or a
bot comment linking to an Argos build, and you need to decide whether the
visual diffs are intentional changes, regressions, or flaky captures before
approving the PR.
---

# Argos PR Review

Treat the Argos build as part of the PR review, not as a separate task.

## When To Use This Skill

- A PR includes an Argos build link.
- A PR has an Argos check with `changes-detected`.
- An Argos bot comment says screenshots are waiting for review.

## Review Workflow

### 1. Inspect the build status

Use the Argos CLI with the build URL or build number.

```bash
argos build get <buildReference> --json
```

Use the build status to decide the next review step:

| Status | Review meaning | Next step |
| ------------------ | ------------------------------------------- | ---------------------------------------------- |
| `accepted` | Diffs already approved | No visual review needed |
| `no-changes` | No visual diff against baseline | No visual review needed |
| `pending` | Build is not ready yet | Wait and check again |
| `progress` | Comparison is still running | Wait and check again |
| `changes-detected` | Screenshots need a decision | Fetch snapshots and inspect them |
| `rejected` | A prior decision already rejected the build | Do not approve without understanding why |
| `error` | Build processing failed | Do not approve until the failure is understood |
| `aborted` | Build did not complete normally | Do not approve until the reason is understood |
| `expired` | Build was not completed in time | Do not approve until a valid build exists |

### 2. Fetch only snapshots that need review

```bash
argos build snapshots <buildReference> --needs-review --json
```

For each diff, inspect:

- the diff mask `url`
- the previous screenshot `base.url`
- the new screenshot `head.url`
- metadata in `head.metadata`

### 3. Decide whether the change is intentional, broken, or flaky

- Intentional change: the new screenshot matches the code change and the UI is stable.
- Regression: layout break, wrong state, missing content, wrong theme, clipped content, incorrect data, or an unexpected removed snapshot.
- Flaky capture: loading indicator, partially rendered content, animation frame, dynamic content drift, or the same transient state captured in multiple browsers.

### 4. Always check for flake signals

Pay special attention to:

- `head.metadata.test.retry > 0`
- `head.metadata.test.retries` only as context for how many retries the test is allowed to use
- a visible loader or spinner in the screenshot
- partially loaded data
- identical scores across browsers
- identical `head.url` values across browsers

If the page is clearly captured mid-load, report it as flaky even if the branch name sounds related to loading or data changes.

### 5. Report the result in the PR review

- If everything looks intentional, say the Argos build looks good.
- If you find a regression, call it out with the Argos build URL and the affected snapshot names.
- If you find flakiness, explain the signal and recommend a stabilization fix before approval.

Use this format for flaky captures:

> `snapshot-name` looks flaky.
> Test: `test.title` in `test.location.file:test.location.line`
> Signal: loader visible / retry > 0 / identical head image across browsers / ...
> Fix: wait for the page to finish loading. If the SDK uses `waitForAriaBusy`, mark loading UI with `aria-busy="true"` so Argos waits for it to clear
> Review: `<snapshot url>`

## What To Inspect In Snapshot Data

For each snapshot diff, inspect:

- `name`: human-readable snapshot identifier
- `status`: usually `changed`, `added`, or `removed`
- `score`: rough diff magnitude
- `url`: diff mask / review overlay
- `base.url`: previous screenshot
- `head.url`: new screenshot
- `head.metadata.test`: test file, line, title, `retry`, `retries`
- `head.metadata.browser`: browser information

## Review Heuristics

### Expected changes

These are usually safe when they match the PR:

- intentional content updates
- deliberate theme or layout redesigns
- newly added screens
- removed screenshots that correspond to intentionally deleted pages or tests

### Regressions

Flag the review when you see:

- missing content
- wrong empty state
- unexpected loader
- broken layout or overlap
- text clipping
- wrong route or wrong page captured
- mismatched theme or styling
- removed snapshots without an intentional test removal

### Flaky captures

Treat the snapshot as likely flaky when you see:

- a spinner, skeleton, or loading state
- async content not yet rendered
- animation mid-transition
- dynamic values drifting between runs
- `retry > 0`
- identical `score` across multiple browsers
- identical `head.url` across multiple browsers

Identical `head.url` values across browsers are a particularly strong signal that both browsers captured the same transient broken state.
`retry` is the current retry count for this run. `retries` is only the configured
retry budget, so do not treat `retries > 0` alone as evidence of flakiness.

## Typical Fixes

- Loading UI visible:
wait for settled content before capture; if the SDK uses `waitForAriaBusy`, add `aria-busy="true"` to the loading container so Argos waits for it to clear
- Test captures too early:
wait for content that depends on the async data before calling the screenshot helper
- Dynamic values:
freeze the data or mask the unstable element
- JS animation:
stabilize or remove the animated element from the capture

## Suggested PR Review Wording

Use short, actionable language. Example:

```text
The Argos build shows a flaky capture rather than an intentional visual change.
The screenshot is taken before async data has loaded, so the page is captured in
an intermediate state (spinner visible / empty values / incomplete content).
Please wait for settled content before taking the screenshot, or mark the
loading UI with aria-busy="true" if your SDK is configured to wait for it.
```

## Tooling

- Prefer the Argos CLI for build metadata and snapshot enumeration.
- Use `--json` when parsing CLI output.
- Use the PR diff and test code to confirm whether the visual change matches the code.
- Load [references/baseline.md](references/baseline.md) when the review depends on baseline selection or orphan build semantics.
- Load [references/flaky-fixes.md](references/flaky-fixes.md) when you need concrete remediation examples for flaky captures.
4 changes: 4 additions & 0 deletions skills/argos-pr-review/agents/openai.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
interface:
display_name: "Argos PR Review"
short_description: "Review Argos builds as part of PR review"
default_prompt: "Use $argos-pr-review to review this pull request when an Argos build is available."
73 changes: 73 additions & 0 deletions skills/argos-pr-review/references/baseline.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
# Baseline And Orphan Build Notes

Load this file when an Argos PR review depends on baseline selection, orphan
build semantics, or how approval affects future comparisons.

---

## Baseline Build

The baseline is the screenshot Argos compares the new build against.

**Approval vs. baseline:** approving a build makes it a candidate for future
baseline selection. It does not directly overwrite the current baseline. The
baseline is selected dynamically each time a new build runs.

Do not say:

```text
Approving this build will update the baseline.
```

Say instead:

```text
Approving this build would make it eligible to be selected as a baseline for future builds on this branch.
```

## Baseline Selection Rules

Argos picks the most recent build satisfying all of:

1. Same build name as the triggered build
2. All framework tests passed
3. Not a subset build
4. Status is auto-approved, manually approved, or orphan
5. Its commit is an ancestor of the merge base between the new build's commit
and the baseline branch

## Baseline Branch

- Pull requests: the PR base branch
- Push events: the project's configured default baseline branch, usually `main`

## Baseline Overrides

- `ARGOS_REFERENCE_BRANCH`: force a specific branch as baseline
- `ARGOS_REFERENCE_COMMIT`: pin to a specific commit's build

## CI Caveat

If CI does not run on the default branch, the baseline becomes stale and diffs
accumulate. Keep Argos running on the default branch too.

## Orphan Builds

An orphan build has no prior build to compare against, so all snapshots appear
as `added` with `base: null`.

This is expected for:

- the first build in a project
- a branch that has no approved ancestor build yet

The absence of a baseline is not itself a regression signal.

## Review Guidance

When reviewing an orphan build:

- inspect screenshots for obvious layout or rendering problems
- do not treat every `added` snapshot as suspicious just because `base` is null
- approve the build if the screenshots look correct and it is intended to seed a
future baseline
Loading
Loading
You are viewing a condensed version of this merge commit. You can view the full changes here.