-
Notifications
You must be signed in to change notification settings - Fork 14
feat(skill): add argos-pr-review #286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
jsfez
wants to merge
1
commit into
main
Choose a base branch
from
add-pr-review-skill-2
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| interface: | ||
| display_name: "Argos CLI" | ||
| short_description: "Use the Argos CLI safely and correctly" | ||
| default_prompt: "Use $argos-cli to run or interpret Argos CLI commands in this repository." |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,219 @@ | ||
| --- | ||
| name: argos-pr-review | ||
| description: > | ||
| Use Argos visual regression builds as one input to a complete pull request | ||
| review. Use this skill when a PR includes an Argos build URL, an Argos status | ||
| check, or a bot comment linking to an Argos build, and you need to determine | ||
| whether the visual diffs match the developer's likely intent based on the PR | ||
| context, code changes, and screenshots before approving the PR. | ||
| --- | ||
|
|
||
| # Argos PR Review | ||
|
|
||
| Treat the Argos build as one input to the PR review, not as a separate task and | ||
| not as the sole source of truth. | ||
|
|
||
| ## When To Use This Skill | ||
|
|
||
| - A PR includes an Argos build link. | ||
| - A PR has an Argos check with `changes-detected`. | ||
| - An Argos bot comment says screenshots are waiting for review. | ||
|
|
||
| ## Review Principles | ||
|
|
||
| - Start from the PR context, not from the screenshots alone. | ||
| - Infer likely developer intent from the PR title, description, linked issue, | ||
| branch name, commit messages, and code diff before judging the visual result. | ||
| - Use Argos to confirm whether the rendered outcome matches that intent. | ||
| - If the code suggests one intent and the screenshots show another result, treat | ||
| that mismatch as an important review signal. | ||
| - Do not let a clean screenshot override a code-level concern, or let a strange | ||
| screenshot override clear evidence that the change is intentional. | ||
|
|
||
| ## Review Workflow | ||
|
|
||
| ### 1. Inspect the PR context first | ||
|
|
||
| Before opening snapshot diffs, inspect: | ||
|
|
||
| - the PR title and description | ||
| - any linked issue or ticket context | ||
| - the branch name and commit messages when useful | ||
| - the code diff, especially the UI and test files related to the changed area | ||
|
|
||
| Form a working hypothesis for the intended user-facing change before relying on | ||
| the screenshots. | ||
|
|
||
| ### 2. Inspect the build status | ||
|
|
||
| Use the Argos CLI with the build URL or build number. | ||
|
|
||
| ```bash | ||
| argos build get <buildReference> --json | ||
| ``` | ||
|
|
||
| Use the build status to decide the next review step: | ||
|
|
||
| | Status | Review meaning | Next step | | ||
| | ------------------ | ------------------------------------------- | ---------------------------------------------- | | ||
| | `accepted` | Diffs already approved | No visual review needed | | ||
| | `no-changes` | No visual diff against baseline | No visual review needed | | ||
| | `pending` | Build is not ready yet | Wait and check again | | ||
| | `progress` | Comparison is still running | Wait and check again | | ||
| | `changes-detected` | Screenshots need a decision | Fetch snapshots and inspect them | | ||
| | `rejected` | A prior decision already rejected the build | Do not approve without understanding why | | ||
| | `error` | Build processing failed | Do not approve until the failure is understood | | ||
| | `aborted` | Build did not complete normally | Do not approve until the reason is understood | | ||
| | `expired` | Build was not completed in time | Do not approve until a valid build exists | | ||
|
|
||
| ### 3. Fetch only snapshots that need review | ||
|
|
||
| ```bash | ||
| argos build snapshots <buildReference> --needs-review --json | ||
| ``` | ||
|
|
||
| For each diff, inspect: | ||
|
|
||
| - the diff mask `url` | ||
| - the previous screenshot `base.url` | ||
| - the new screenshot `head.url` | ||
| - metadata in `head.metadata` | ||
|
|
||
| ### 4. Compare screenshots against intended change | ||
|
|
||
| For each changed snapshot, ask: | ||
|
|
||
| - does the screenshot match the UI change implied by the PR and code diff? | ||
| - is the changed area the one you would expect from the files touched? | ||
| - does the visual result reveal a regression the code review alone would not catch? | ||
| - does the screenshot contradict the claimed intent in the PR description or | ||
| linked ticket? | ||
|
|
||
| ### 5. Decide whether the change is intentional, broken, or flaky | ||
|
|
||
| - Intentional change: the new screenshot matches the code change and the UI is stable. | ||
| - Regression: layout break, wrong state, missing content, wrong theme, clipped content, incorrect data, or an unexpected removed snapshot. | ||
| - Flaky capture: loading indicator, partially rendered content, animation frame, dynamic content drift, or the same transient state captured in multiple browsers. | ||
|
|
||
| ### 6. Always check for flake signals | ||
|
|
||
| Pay special attention to: | ||
|
|
||
| - `head.metadata.test.retry > 0` | ||
| - `head.metadata.test.retries` only as context for how many retries the test is allowed to use | ||
| - a visible loader or spinner in the screenshot | ||
| - partially loaded data | ||
| - identical scores across browsers | ||
| - identical `head.url` values across browsers | ||
|
|
||
| If the page is clearly captured mid-load, report it as flaky even if the branch name sounds related to loading or data changes. | ||
|
|
||
| ### 7. Report the result in the PR review | ||
|
|
||
| - If everything looks intentional, say the visual result appears consistent with the PR intent. | ||
| - If you find a regression, call it out with the Argos build URL, the affected snapshot names, and the mismatch between the intended change and the rendered result. | ||
| - If you find flakiness, explain the signal and recommend a stabilization fix before approval. | ||
| - When relevant, mention both the code evidence and the screenshot evidence in the same finding. | ||
|
|
||
| Use this format for flaky captures: | ||
|
|
||
| > `snapshot-name` looks flaky. | ||
| > Test: `test.title` in `test.location.file:test.location.line` | ||
| > Signal: loader visible / retry > 0 / identical head image across browsers / ... | ||
| > Fix: wait for the page to finish loading. If the SDK uses `waitForAriaBusy`, mark loading UI with `aria-busy="true"` so Argos waits for it to clear | ||
| > Review: `<snapshot url>` | ||
|
|
||
| ## What To Inspect In Snapshot Data | ||
|
|
||
| For each snapshot diff, inspect: | ||
|
|
||
| - `name`: human-readable snapshot identifier | ||
| - `status`: usually `changed`, `added`, or `removed` | ||
| - `score`: rough diff magnitude | ||
| - `url`: diff mask / review overlay | ||
| - `base.url`: previous screenshot | ||
| - `head.url`: new screenshot | ||
| - `head.metadata.test`: test file, line, title, `retry`, `retries` | ||
| - `head.metadata.browser`: browser information | ||
|
|
||
| ## Review Heuristics | ||
|
|
||
| ### Expected changes | ||
|
|
||
| These are usually safe when they match the PR: | ||
|
|
||
| - intentional content updates | ||
| - deliberate theme or layout redesigns | ||
| - newly added screens | ||
| - removed screenshots that correspond to intentionally deleted pages or tests | ||
|
|
||
| ### Regressions | ||
|
|
||
| Flag the review when you see: | ||
|
|
||
| - missing content | ||
| - wrong empty state | ||
| - unexpected loader | ||
| - broken layout or overlap | ||
| - text clipping | ||
| - wrong route or wrong page captured | ||
| - mismatched theme or styling | ||
| - removed snapshots without an intentional test removal | ||
|
|
||
| ### Flaky captures | ||
|
|
||
| Treat the snapshot as likely flaky when you see: | ||
|
|
||
| - a spinner, skeleton, or loading state | ||
| - async content not yet rendered | ||
| - animation mid-transition | ||
| - dynamic values drifting between runs | ||
| - `retry > 0` | ||
| - identical `score` across multiple browsers | ||
| - identical `head.url` across multiple browsers | ||
|
|
||
| Identical `head.url` values across browsers are a particularly strong signal that both browsers captured the same transient broken state. | ||
| `retry` is the current retry count for this run. `retries` is only the configured | ||
| retry budget, so do not treat `retries > 0` alone as evidence of flakiness. | ||
|
|
||
| ## Typical Fixes | ||
|
|
||
| - Loading UI visible: | ||
| wait for settled content before capture; if the SDK uses `waitForAriaBusy`, add `aria-busy="true"` to the loading container so Argos waits for it to clear | ||
| - Test captures too early: | ||
| wait for content that depends on the async data before calling the screenshot helper | ||
| - Dynamic values: | ||
| freeze the data or mask the unstable element | ||
| - JS animation: | ||
| stabilize or remove the animated element from the capture | ||
|
|
||
| ## Suggested PR Review Wording | ||
|
|
||
| Use short, actionable language. | ||
|
|
||
| Flaky example: | ||
|
|
||
| ```text | ||
| The Argos build shows a flaky capture rather than an intentional visual change. | ||
| The screenshot is taken before async data has loaded, so the page is captured in | ||
| an intermediate state (spinner visible / empty values / incomplete content). | ||
| Please wait for settled content before taking the screenshot, or mark the | ||
| loading UI with aria-busy="true" if your SDK is configured to wait for it. | ||
| ``` | ||
|
|
||
| Intent mismatch example: | ||
|
|
||
| ```text | ||
| The code changes suggest the intent is to update X, but the Argos snapshots show | ||
| Y instead. This looks like an unintended visual regression rather than the | ||
| expected UI outcome of the PR. | ||
| ``` | ||
|
|
||
| ## Tooling | ||
|
|
||
| - Prefer the Argos CLI for build metadata and snapshot enumeration. | ||
| - Use `--json` when parsing CLI output. | ||
| - Read the PR title, description, linked context, and code diff before relying on screenshots. | ||
| - Use the PR diff and test code to confirm whether the visual change matches the code. | ||
| - Load [references/baseline.md](references/baseline.md) when the review depends on baseline selection or orphan build semantics. | ||
| - Load [references/flaky-fixes.md](references/flaky-fixes.md) when you need concrete remediation examples for flaky captures. | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| interface: | ||
| display_name: "Argos PR Review" | ||
| short_description: "Review Argos builds as part of PR review" | ||
| default_prompt: "Use $argos-pr-review to review this pull request when an Argos build is available." |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,73 @@ | ||
| # Baseline And Orphan Build Notes | ||
|
|
||
| Load this file when an Argos PR review depends on baseline selection, orphan | ||
| build semantics, or how approval affects future comparisons. | ||
|
|
||
| --- | ||
|
|
||
| ## Baseline Build | ||
|
|
||
| The baseline is the screenshot Argos compares the new build against. | ||
|
|
||
| **Approval vs. baseline:** approving a build makes it a candidate for future | ||
| baseline selection. It does not directly overwrite the current baseline. The | ||
| baseline is selected dynamically each time a new build runs. | ||
|
|
||
| Do not say: | ||
|
|
||
| ```text | ||
| Approving this build will update the baseline. | ||
| ``` | ||
|
|
||
| Say instead: | ||
|
|
||
| ```text | ||
| Approving this build would make it eligible to be selected as a baseline for future builds on this branch. | ||
| ``` | ||
|
|
||
| ## Baseline Selection Rules | ||
|
|
||
| Argos picks the most recent build satisfying all of: | ||
|
|
||
| 1. Same build name as the triggered build | ||
| 2. All framework tests passed | ||
| 3. Not a subset build | ||
| 4. Status is auto-approved, manually approved, or orphan | ||
| 5. Its commit is an ancestor of the merge base between the new build's commit | ||
| and the baseline branch | ||
|
|
||
| ## Baseline Branch | ||
|
|
||
| - Pull requests: the PR base branch | ||
| - Push events: the project's configured default baseline branch, usually `main` | ||
|
|
||
| ## Baseline Overrides | ||
|
|
||
| - `ARGOS_REFERENCE_BRANCH`: force a specific branch as baseline | ||
| - `ARGOS_REFERENCE_COMMIT`: pin to a specific commit's build | ||
|
|
||
| ## CI Caveat | ||
|
|
||
| If CI does not run on the default branch, the baseline becomes stale and diffs | ||
| accumulate. Keep Argos running on the default branch too. | ||
|
|
||
| ## Orphan Builds | ||
|
|
||
| An orphan build has no prior build to compare against, so all snapshots appear | ||
| as `added` with `base: null`. | ||
|
|
||
| This is expected for: | ||
|
|
||
| - the first build in a project | ||
| - a branch that has no approved ancestor build yet | ||
|
|
||
| The absence of a baseline is not itself a regression signal. | ||
|
|
||
| ## Review Guidance | ||
|
|
||
| When reviewing an orphan build: | ||
|
|
||
| - inspect screenshots for obvious layout or rendering problems | ||
| - do not treat every `added` snapshot as suspicious just because `base` is null | ||
| - approve the build if the screenshots look correct and it is intended to seed a | ||
| future baseline |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.