GitHub - voidmatcha/e2e-skills: AI agent testing toolkit for Playwright and Cypress: generate E2E tests from scratch, review existing specs against 24 anti-patterns (P0/P1/P2 silent-always-pass smells), and debug flaky failures from playwright-report/ or cypress/reports/. Agent Skills for Claude Code and Codex.

e2e-skills — Agent skills for Playwright and Cypress: generate, review, and debug reliable end-to-end tests.

Tip

8 PRs already merged into Cal.com, Storybook, SvelteKit, Ghost and more: real-world proof the skill flags bugs that maintainers agree are worth fixing. If that is useful, star the repo and follow @voidmatcha for more agent skills.

Find Playwright and Cypress E2E tests that pass CI but prove nothing, generate new end-to-end coverage, and turn failing test reports into root-cause fixes — as Agent Skills for Claude Code, Codex, and 55+ other AI coding agents (any AGENTS.md-compatible host, via the skills CLI).

A green check is not proof. Some E2E tests stay green whether the feature works or not:

- expect(page.getByText('SWE')).toBeDefined();        // a Locator is never undefined, so this always passes
+ await expect(page.getByText('SWE')).toBeVisible();  // now the assertion can actually fail

e2e-skills is an AI-agent testing toolkit for Playwright and Cypress that catches what CI misses: tests that pass but prove nothing, and failures that are hard to trace. Built by @voidmatcha, it runs as an Agent Skills bundle for Claude Code, Codex, and 55+ other agents via the skills CLI. Four skills cover the full lifecycle:

playwright-test-generator — generates Playwright E2E tests from scratch, from coverage gap analysis to passing, reviewed tests
e2e-reviewer — static analysis of existing Playwright and Cypress specs; flags 24 anti-patterns (P0 silent always-pass, P1 poor diagnostics, P2 maintenance) that can make tests pass CI while missing real regressions
playwright-debugger — diagnoses failures from playwright-report/ and classifies root causes (flaky timing, selector drift, auth, environment mismatch, and more)
cypress-debugger — same for Cypress report files

Why I built this

Most of my E2E tests are written by AI agents now, and that is where the trouble started. The agent would hand me a test that passed and proved nothing: an assertion on a locator that is always truthy, a toBeDefined() that can never fail, a delete test that never checks the row was actually gone. The suite was green, so I trusted it, and the bug shipped anyway.

The other half was cruft. Ask a model to write a spec and it cheerfully over-builds: a Page Object it never uses, a helper that drifts out of sync with the spec it was meant to serve, an abstraction for a case that never comes. YAGNI and KISS are the first things to go. And once several different agents were writing the tests, it got worse, because each model had its own idea of what a spec should look like.

e2e-skills is the thing I wanted while dealing with all of that: a stable, opinionated catalog that flags the tests that lie and the abstractions that rot, whoever or whatever wrote them.

Install

# Claude Code + Codex (most common)
npx skills add voidmatcha/e2e-skills --skill '*' -g -a claude-code -a codex

# Codex plugin — via the skills CLI (Codex only)
npx skills add voidmatcha/e2e-skills --skill '*' -g -a codex

# Claude Code plugin — marketplace
/plugin marketplace add voidmatcha/e2e-skills
/plugin install e2e-skills@voidmatcha

# Claude Code plugin — manual clone
git clone https://github.com/voidmatcha/e2e-skills.git ~/.claude/skills/e2e-skills

# Every agent the skills CLI supports (55+ hosts) — --all = --skill '*' --agent '*' -y
npx skills add voidmatcha/e2e-skills -g --all

The Codex command above is the Codex plugin install: the skills CLI places the bundle in ~/.agents/skills/, where Codex auto-discovers it and reads .codex-plugin/plugin.json (the interface block). There is no separate codex plugin marketplace add step — a native marketplace entry would require duplicating the shared skills/ tree into a per-plugin subdirectory (Codex marketplace plugins cannot reference the repo root, openai/codex#17066), so the CLI route is the supported Codex path.

Quick Example

You: Review my Playwright tests in apps/viewer/src/test/

e2e-reviewer:

  [P0] settings.spec.ts:88, 99 — #4h One-shot URL read
    expect(page.url()).toEqual(`${baseURL}/${id}-public`);   // sync read, no auto-retry
    → fix: await expect(page).toHaveURL(`${baseURL}/${id}-public`);
    (also removes redundant `await page.waitForTimeout(1000)` above)

  [P0] fileUpload.spec.ts:67 — #16 Missing await on action
    page.getByRole('button', { name: 'Delete' }).click();   // fire-and-forget, races next line
    → fix: await page.getByRole('button', { name: 'Delete' }).click();

  Total: 3 P0 (2 #4h, 1 #16), 0 P1, 0 P2 in 24 spec files.
  P1/P2 candidates (not yet flagged as bugs): 20× positional .nth() selectors, 5× direct page.click(selector).

Real findings from a recent typebot.io scan — silent always-pass bugs your test suite was hiding.

Workflow

Run playwright-test-generator → generate with approval → auto-reviewed by e2e-reviewer
Generated tests fail → playwright-debugger invoked automatically after 3 fix attempts
Existing tests: e2e-reviewer → fix → re-run
Tests fail → playwright-debugger or cypress-debugger → fix → re-run

Standalone Scanner

./skills/e2e-reviewer/scripts/scan.sh path/to/tests

Three tiers run in priority order:

Tier 1: eslint-plugin-playwright / eslint-plugin-cypress. Uses your local install if present, otherwise auto-downloads via npx --yes (set E2E_SMELL_NO_ESLINT_DOWNLOAD=1 to disable). This tier runs under a hang watchdog (E2E_SMELL_ESLINT_TIMEOUT_SECS, default 300s) and never blocks Tier 2/3 coverage when it fails.
Tier 2: ast-grep Tree-sitter rules for false-positive-prone patterns. Uses ast-grep / sg on PATH if present, otherwise auto-downloads via npx --yes @ast-grep/cli (set E2E_SMELL_NO_AST_GREP_DOWNLOAD=1 to disable).
Tier 3: bundled regex coverage for grep-detectable P0/P1/P2 patterns and gaps the lint plugins miss: Cypress cy.on('uncaught:exception', () => false) blanket suppression (#3b), {timeout:0}.should("not.exist") (#4g), and cross-framework heuristics.

See docs/e2e-test-smells.md for the full P0/P1/P2 model.

Use // JUSTIFIED: <reason> on an intentional pattern (or in the comment block directly above it) to suppress it in the bundled scanner output. The eslint tier does not parse JUSTIFIED markers; pair the marker with an eslint-disable comment there if needed.
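To make the regex tier and the JUSTIFIED marker concrete, here is a minimal sketch of a single-line check for the blanket Cypress suppression (#3b). It is not the code in scan.sh: the function name, the regexes, and the Finding shape are assumptions made for illustration, and it only looks at the flagged line and the one line directly above it rather than a whole comment block.

```typescript
// Hypothetical sketch of a Tier 3 style check; not the implementation in scan.sh.
interface Finding {
  line: number; // 1-based line number
  text: string; // trimmed source line
}

// Blanket suppression written on a single line.
const BLANKET_SUPPRESSION =
  /cy\.on\(\s*['"]uncaught:exception['"]\s*,\s*\(\s*\)\s*=>\s*false\s*\)/;
// A JUSTIFIED marker must carry a reason after the colon.
const JUSTIFIED = /\/\/\s*JUSTIFIED:\s*\S/;

function scanForBlanketSuppression(source: string): Finding[] {
  const lines = source.split("\n");
  const findings: Finding[] = [];
  lines.forEach((text, index) => {
    if (!BLANKET_SUPPRESSION.test(text)) return;
    const previous = lines[index - 1] ?? "";
    // Suppress when the marker is on the same line or on a comment line directly above.
    const justified =
      JUSTIFIED.test(text) || (/^\s*\/\//.test(previous) && JUSTIFIED.test(previous));
    if (!justified) findings.push({ line: index + 1, text: text.trim() });
  });
  return findings;
}

const sample = [
  "cy.on('uncaught:exception', () => false);",
  "// JUSTIFIED: third-party widget throws on teardown",
  "cy.on('uncaught:exception', () => false);",
].join("\n");

// Only line 1 is reported; line 3 is suppressed by the marker above it.
console.log(scanForBlanketSuppression(sample));
```

A line-by-line regex like this misses handlers that span several lines, which is one reason regex hits are treated as candidates rather than verdicts.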

The e2e-reviewer skill adds what no lint can reach: semantic checks (name-assertion mismatch, missing Then, YAGNI/zombie specs, POM consistency, auth setup analysis) and fix guidance with band-aid awareness. Run eslint-plugin-playwright / eslint-plugin-cypress as your every-commit baseline; invoke the skill for PR review, suspected silent-pass bugs, or before bulk fixes.

Scanner findings are candidates, not verdicts

Static detection (lint, ast-grep, regex) can only flag candidates. It cannot tell a real silent-always-pass from a harmless one, and it both over- and under-reports:

False positives — an expect() inside an awaited Promise.all([...]) (the assertion is awaited) and a smell inside a test.fixme/describe.skip block that never runs both look like bugs to a grep, but neither lets a real bug ship.
Blind spots — expect(page.getByRole('alert')).toBeTruthy() asserts an always-truthy Locator object (the DOM is never queried), yet no off-the-shelf lint rule flags a bare locator-as-truthy.
Beyond any lint — a test that wraps addEdge() in try/catch and asserts only inside the catch passes forever, because addEdge never throws (it calls onError and returns the list unchanged), so the catch body never runs. No grep or AST rule can know a function doesn't throw — only reading the code does. (Real case: xyflow graph-utils.cy.ts.)

That gap is the whole point of the judgment layer. e2e-reviewer verifies each finding — reading the surrounding code, the CI config, and the test's intent to decide whether a real bug actually ships green and whether retries could mask it — instead of dumping raw hits. Scanner-positive is not the same as merge-worthy: every one of the merged PRs below survived that verification, and several flagged-but-backstopped candidates were correctly rejected before they ever became a PR. Detection is cheap and commoditised; the verification and the fix are where the value is.
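The try/catch case above can be reproduced in a few lines. This is a sketch under stated assumptions: addEdgeLike is a hypothetical stand-in that mirrors the described behavior (call onError, return the list unchanged, never throw), not the xyflow function, and the assertion counter exists only to make the silent pass visible.

```typescript
type Edge = { id: string; source: string; target: string };

// Hypothetical stand-in: reports invalid input through a callback and
// returns the list unchanged instead of throwing.
function addEdgeLike(
  edge: Partial<Edge>,
  edges: Edge[],
  onError: (message: string) => void,
): Edge[] {
  if (!edge.source || !edge.target) {
    onError("edge needs a source and a target");
    return edges; // unchanged, and nothing is thrown
  }
  const id = `${edge.source}-${edge.target}`;
  return [...edges, { id, source: edge.source, target: edge.target }];
}

const reported: string[] = [];
let assertionsRun = 0;

// Smell: the only assertion sits in a catch block that is never entered.
try {
  addEdgeLike({ source: "a" }, [], (message) => { reported.push(message); });
} catch (error) {
  assertionsRun += 1;
  if (!(error instanceof Error)) throw new Error("expected an Error");
}

console.log(assertionsRun); // 0: the "test" passed without checking anything

// Fix: assert on the observable behavior instead of on a throw.
const unchanged = addEdgeLike({ source: "a" }, [], (message) => { reported.push(message); });
console.log(unchanged.length, reported.length); // 0 2
```

Nothing in the try/catch is syntactically wrong, so deciding that the catch body is dead requires reading addEdgeLike itself.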

Proven in Open Source

The sharpest case: in code-server, a committed it.only had silently skipped 8 tests for 7 months, and one of them had already broken. CI stayed green the whole time. That is what a silent always-pass costs, and why a passing suite is not the same as a tested one. e2e-reviewer found it; the maintainers merged the fix.

Eight real merged PRs, not synthetic examples:

Repository	Merged PR	What it fixed
Cal.com	calcom/cal.diy#28486	False-passing Playwright assertions, no-op state checks, hard-coded waits → web-first assertions + condition waits
Storybook	storybookjs/storybook#34141	Unawaited Playwright actions and discarded `isVisible()` calls that made E2E checks silently weak
Element Web	element-hq/element-web#32801	Always-passing assertions, unawaited checks, `toBeAttached()` misuse, debugging leftovers
code-server	coder/code-server#7845	An `it.only` leak that silently skipped 8 Heart unit tests for 7 months (one had since broken), 4× matcher-less `expect()`, a dangling locator, and 16× one-shot `page.isVisible()` reads → web-first assertions
Ghost	TryGhost/Ghost#28712	`expect(likeButton.isDisabled()).toBeTruthy()` ×3 — an un-awaited `isDisabled()` Promise is always truthy, so the comments-ui like-button debounce guards passed unconditionally → web-first `toBeDisabled()`/`not.toBeDisabled()`
SvelteKit	sveltejs/kit#16068	Unawaited `expect(page)` web-first assertions in the basics client tests — floating promises that never asserted → awaited
Strapi	strapi/strapi#26630	Discarded `isVisible()`/`isHidden()` reads that were the sole assertion of each visibility test → web-first `toBeVisible()`/`toBeHidden()`
bruno	usebruno/bruno#8317	Missing `await` on the sole WebSocket visibility assertion — a floating Promise that never ran → awaited so the check actually executes

Two more are maintainer-approved and awaiting merge (QwikDev/qwik#8727, module-federation/core#4826).
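The Ghost row in the table above rests on a plain JavaScript fact: a Promise object is truthy no matter what it resolves to. The sketch below shows that without any test framework; isDisabledStub is a hypothetical stand-in for an async state read such as isDisabled().

```typescript
// Hypothetical stand-in for an async state read: it resolves to false,
// meaning "the button is NOT disabled".
function isDisabledStub(): Promise<boolean> {
  return Promise.resolve(false);
}

// Without await, the value under test is the Promise object itself.
const unawaited = isDisabledStub();
const passesRegardless = Boolean(unawaited);

console.log(passesRegardless); // true, even though the resolved value is false

async function main(): Promise<void> {
  // With await, the value under test is the real state.
  const actual = await isDisabledStub();
  console.log(actual); // false
}
void main();
```

A truthiness assertion on the un-awaited value therefore passes whether the button is disabled or not, which is why the fix moves to an awaited web-first assertion.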

Note

Benchmark (secondary evidence). Across 100 bot-reviewed PRs in 77 repositories, e2e-reviewer had the best recall (78/110, 71%) with zero false positives, and uniquely caught 47 silent always-pass issues that the linters and the AI reviewers missed. The original judge shared a model family with the reviewer (an affinity-bias risk), so the contestable unique catches were re-judged by an independent cross-model judge (OpenAI gpt-5.5 via Codex), which agreed on 13 of 15 (87%) — the headline holds directionally rather than collapsing under a different judge. Read it as directional, not a leaderboard. Full method and limitations: AI-reviewer benchmark.

Beyond those merged PRs, the skill was iterated and validated against 100+ open-source Playwright and Cypress test suites (many 1k+ stars) in a local testbed — zero GitHub side effects, no forks or PRs opened during research. Real findings from those scans drove concrete rule changes: the 4.4 cycle-count rule, the 4.2 PR-culture cross-check, the Phase 2 retry-wrapper skip, the legacy cypress/integration/**/*.js glob coverage, and the awaited-locator (expect(await locator)) variant of the missing-await check all came from observed agent behavior and real anti-patterns surfaced across those runs. See upstream contributions for the full track record and roadmap (merged, in-review, and queued PRs, with before/after lessons on the merged ones).

Recognized problem, not a niche opinion

The merged PRs are the empirical case: real silent always-pass tests that shipped green until the skill found them and a maintainer agreed the fix was warranted. The class of bug they fix is also well recognized by the frameworks themselves and by the testing literature:

Playwright's own best-practices doc warns that with an un-awaited assertion "the test won't wait a single second, it will just check the locator is there and return immediately." (playwright.dev/docs/best-practices)
eslint-plugin-playwright ships a no-conditional-expect rule precisely because, with an assertion inside conditional code, "tests can end up passing but not actually test anything." (rule docs)
Tests that pass without truly asserting are a named test smell going back to van Deursen et al., Refactoring Test Code (2001), and catalogued since as "tests without assertions" and "tautological assertions" that can never fail.
Untrusted suites are widespread and corrosive: Google reported "almost 16% of our tests have some level of flakiness," and that "it is human nature to ignore alarms when there is a history of false signals." (Google Testing Blog)

Playwright and Cypress together draw tens of millions of weekly npm installs, so the surface for these bugs is large. (We deliberately do not cite the popular "a bug costs 100x more in production" figure — its provenance is disputed; the point stands without it.)

Skill 1: `playwright-test-generator` — Test Generation

Generates Playwright E2E tests from scratch for any project. Starts from coverage gap analysis, explores the live app via agent-browser tools, designs scenarios with your approval, and auto-reviews generated tests with e2e-reviewer.

When to Use

You have a page or feature with no E2E coverage
You want to bootstrap a test suite for an existing app
You need to quickly add tests before a release

Usage

Generate playwright tests
Generate playwright tests for the login page
Write e2e tests for the settings page
Add playwright coverage for checkout flow

Pipeline

Detect environment — config, baseURL, test dir, POM structure, existing conventions doc
Coverage gap analysis — user picks target (skipped when target given as argument)
Live browser exploration — via agent-browser tools (no hallucinated selectors); accessible-name reality check for label-less inputs
Scenario design + approval gate — shows plan and locator table before any code
Code generation — POM + spec or flat spec, auto-detected from project conventions; writes must be route-stubbed (see Network Determinism in code-rules.md)
Conventions & seed scaffolding (first run on a project) — appends a project-adapted E2E section to AGENTS.md and designates a seed spec, so future AI-generated tests (Claude Code, Codex, Playwright Agents) stay consistent
YAGNI audit + e2e-reviewer — removes unused locators, catches P0 issues before first run
TS compile + test run — 3 auto-fix attempts on failure (heal-by-intent locator re-resolution), then hands off to playwright-debugger

Skill 2: `e2e-reviewer` — Quality Review

Catches issues in E2E tests that pass CI but fail to catch real regressions.

When to Use

Your tests always pass but bugs still slip through to production
Tests pass CI but you suspect they miss real regressions
Your test suite is fragile — tests break on every UI change
You want to audit test quality before a release or code review
You're reviewing Playwright or Cypress specs

Usage

Review my E2E tests
Audit the spec files in tests/
Find weak tests in my test suite
My tests always pass but miss bugs
Tests pass CI but miss regressions
My tests are fragile and break on every UI change
We have coverage but bugs still slip through

24 Patterns Detected — Grouped by Severity

P0 — Must Fix (silent always-pass)

Tests pass when the feature is broken. No real verification is happening.

#	Pattern	Before	After
1	Name-assertion mismatch	Name says "status" but only checks `toBeVisible()`	Add assertion for status content, or rename to match actual check
2	Missing Then	Cancel action, verify text restored — but input still visible?	Verify both restored state and dismissed state
3	Error swallowing	`try/catch` in spec, `.catch(() => {})` in POM	Let errors fail; remove silent catch from POM methods
3b	Cypress `uncaught:exception` suppression	`cy.on('uncaught:exception', () => false)` blanket-swallows app errors	Scope handler to specific known errors; re-throw unknown errors
4	Always-passing assertion	`toBeGreaterThanOrEqual(0)`; `toBeAttached()` with no comment; `expect(await el.isVisible()).toBe(true)` (one-shot); `expect(await el.textContent()).toBe(x)` (one-shot); `expect(locator).toBeTruthy()` (Locator always truthy); `{ timeout: 0 }` on assertions (disables retry)	`toBeGreaterThan(0)`; `toBeVisible()`; web-first assertions with auto-retry
5	Bypass patterns (5a P0, 5b P1)	`if (await el.isVisible()) { expect(...) }`; `{ force: true }` without comment	Always assert; move env checks to `beforeEach`; add `// JUSTIFIED:` to force:true
7	Focused test leak	`test.only(...)` committed — CI runs one test, silently skips the rest	Delete `.only`; use `--grep` or `--spec` for local focus
8	Missing assertion	`await page.locator('.x');` (discarded); `await el.isVisible();` (boolean thrown away)	Add `await expect(locator).toBeVisible()` or delete the line
12	Missing auth setup	Protected-route spec navigates to `/dashboard` with no login/`storageState`/auth fixture	Add `beforeEach` login, configure `storageState`, or use auth fixture — otherwise test passes against the login page
15	Missing `await` on `expect()`	`expect(page.locator('.toast')).toBeVisible()` returns an unobserved Promise	Add `await` so the assertion actually runs
16	Missing `await` on action	`page.locator('#submit').click()` may not execute before the next line	Add `await` so the action completes
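As an illustration of why the conditional bypass (#5a) counts as a silent always-pass, the following framework-free sketch counts how many assertions actually run. The Banner type, the expectEqual helper, and both test functions are hypothetical names introduced for this example.

```typescript
// Tiny stand-in for an assertion library, so the sketch has no dependencies.
function expectEqual<T>(actual: T, expected: T): void {
  if (actual !== expected) {
    throw new Error(`expected ${String(expected)}, got ${String(actual)}`);
  }
}

interface Banner {
  visible: boolean;
  text: string;
}

// Smell (#5a): the assertion is skipped whenever the element is missing.
function conditionalTest(banner: Banner): number {
  let assertionsRun = 0;
  if (banner.visible) {
    expectEqual(banner.text, "Saved");
    assertionsRun += 1;
  }
  return assertionsRun;
}

// Fix: always assert, so a missing banner fails the test.
function unconditionalTest(banner: Banner): void {
  expectEqual(banner.visible, true);
  expectEqual(banner.text, "Saved");
}

// Simulate a broken feature: the banner never appears.
const broken: Banner = { visible: false, text: "" };

console.log(conditionalTest(broken)); // 0: no assertion ran, yet nothing threw

let failed = false;
try {
  unconditionalTest(broken);
} catch (_error) {
  failed = true;
}
console.log(failed); // true: the unconditional version catches the regression
```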

P1 — Should Fix (poor diagnostics / wastes CI time)

Tests work but mislead developers, waste CI time, or set up future regressions.

#	Pattern	Before	After
6	Raw DOM queries	`document.querySelector` in `evaluate()`	Use framework locator/query APIs (`locator` / `cy.get`)
9	Hard-coded sleep	`waitForTimeout(2000)` / `cy.wait(2000)` / `waitForLoadState('networkidle')`	Rely on framework auto-wait; use condition-based waits
10	Flaky test patterns	`items.nth(2)` without comment; `test.describe.serial()`	Use `data-testid` or role selectors; replace serial with self-contained tests
13	Inconsistent POM usage	POM imported but spec uses raw `page.fill`/`page.click` for POM-owned actions	Route all interactions through the POM so UI changes update in one place
14	Hardcoded credentials	`loginPage.login('demo-admin', '<literal-password>')` in test code	Use `process.env.TEST_USER`, Playwright config secrets, or test data fixtures
17	Direct `page.click(selector)` API	`page.click('#submit')` / `page.fill('#input', 'text')` skips the Locator layer	Use `page.locator(selector).click()` for auto-wait and better error messages
18	`expect.soft()` overuse	All assertions in a test are `expect.soft()` — test never fails early	Ensure at least one hard `expect()` gates each test; use `soft` only for independent details
19	Module-level mutable state in test code	`let testNotebookSequence = 0;` at column 0 in a test utility — collides across parallel workers and survives retries	Drop the counter; derive uniqueness from `Date.now()` + `Math.random().toString(36).slice(2, 8)`, or move state into `test.beforeEach`
20	Unmocked real-backend writes	Signup/checkout spec submits real mutations — every CI run creates real accounts/orders	Stub write/credential endpoints with `page.route()` / `cy.intercept()`; one designated real-backend smoke spec max
22	Optimistic UI without call proof	Like-toggle test asserts `aria-pressed` flip — UI updates optimistically, passes with the POST deleted	Pair UI assertion with `page.waitForRequest()` (armed before the click) or a route-hit flag
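For pattern #19, the suggested replacement for a module-level counter can be sketched as a small helper. The name uniqueName and its injectable now/random parameters are assumptions added for this example so the output can be checked deterministically; only the Date.now() plus Math.random().toString(36).slice(2, 8) recipe comes from the table.

```typescript
// Derives a unique name per call without any module-level mutable state.
// `now` and `random` are injectable only so the result can be tested.
function uniqueName(
  prefix: string,
  now: () => number = Date.now,
  random: () => number = Math.random,
): string {
  // Base-36 digits after the "0." prefix, at most six characters.
  const suffix = random().toString(36).slice(2, 8);
  return `${prefix}-${now()}-${suffix}`;
}

console.log(uniqueName("notebook"));
```

Because nothing is stored between calls, parallel workers and retries cannot collide on a shared counter; uniqueness rests on the timestamp plus the random suffix, which is probabilistic rather than guaranteed.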

P2 — Nice to Fix (maintenance / robustness)

Weak but not wrong — addressed when refactoring.

#	Pattern	Before	After
11	YAGNI + Zombie Specs	`clickEdit()` never called; empty wrapper class; single-use Util; entire spec duplicated by another	Delete unused members; inline single-use Util methods; delete zombie spec files
21	Manually-captured session-file dependency	`storageState: 'auth/member.json'` produced only by a manual capture script — absent on CI, silently expires	Regenerate session programmatically (API-login helper or `setup` project); manual files only as a cache with a programmatic fallback
23	Fixture ignores render guards	Liked-tab fixture seeds `liked: false`; the card component `return null`s every item — empty UI looks like infra flake	Read the item component's early returns/filters before seeding; seed fields to pass every guard for the view under test

What a linter structurally cannot catch

A linter checks that an assertion is well-formed. It cannot check that the test proves what its name claims. That gap, between a test's stated intent and what it actually verifies, is the core of what e2e-reviewer looks for, and it is invisible to any per-file AST or grep rule. A test named "should show an error when the name is duplicate" can pass with an assertion that never touches the error, and the syntax is flawless. Deciding whether it is a real test requires reading the test's name, the action it performs, and the surrounding code together, which is a level above where a single-file rule operates.

e2e-reviewer runs eslint-plugin-playwright / eslint-plugin-cypress as its first tier, so the mechanical rules (#6, #7, #9, #15, #16, #5a, #5b) are already covered by the de-facto-standard plugins. The reason to add e2e-reviewer on top is the smells no AST or grep rule can reach, because confirming them requires reading code the rule never sees — other functions, the component, the CI config, the test's own intent:

Smell	Why lint cannot decide it
`#1` Name-assertion mismatch	Needs to compare the test's name/intent against what it actually asserts. Syntactically the assertion is fine.
`#3` / `#3b` Error swallowing & blanket `cy.on('uncaught:exception', () => false)`	Valid syntax; only intent reveals it disables failure. A single-line regex missed 51 multi-line instances in one suite.
`#4f` Locator-as-truthy (`expect(locator).toBeTruthy()` / `.toBeDefined()` / `.not.toBeNull()`)	Reads as a normal assertion. You must know a Locator is never falsy to see it always passes.
`#4` One-shot reads (`expect(await el.isVisible()).toBe(true)`)	A valid `expect`; only knowing it is a non-retrying point-in-time read marks it as an anti-pattern.
`#12` Missing auth setup	Requires cross-file reasoning over config, fixtures, and `storageState` to know the route is unauthenticated.
`#20` / `#22` Unmocked writes / optimistic-UI without call proof	Requires knowing an endpoint mutates, or that the UI updates optimistically with no network assertion behind it.
`#11` / `#23` Zombie specs / fixture ignores render guards	Cross-file: duplicate-spec detection, or reading a component's early `return null` before trusting a seed.
The hard case	A `try/catch` wrapping a function that never throws, asserting only inside `catch` (real case: `addEdge` in xyflow's `graph-utils.cy.ts`). Confirming it means reading the function body in another file — impossible for grep or any single-file AST rule.

This is the part that needs judgment, not a pattern match. e2e-reviewer reads the surrounding code and CI config to verify each candidate before it becomes a finding — the candidates-not-verdicts discipline above — which is also why every finding ships with a band-aid-aware fix rather than a raw match.

References

Playwright best practices · Cypress best practices · Testing Library guiding principles

Skill 3: `playwright-debugger` — Playwright Failure Debugger

Diagnoses Playwright test failures from a playwright-report/ directory — whether failures happened locally or in CI. Classifies root causes and provides concrete fixes.

When to Use

You have a playwright-report/ directory (local or downloaded from CI) with failures to understand
Tests pass locally but fail in CI
You're dealing with flaky or intermittent test failures
You get TimeoutError or locator not found without a clear cause

Usage

Debug these failing tests
Why did these tests fail?
Tests pass locally but fail in CI

Note: Provide the report as a local path. Download CI artifacts manually from GitHub Actions and pass the directory path — automatic artifact fetching is not supported.

15 Root Cause Categories

#	Category	Signals
F1	Flaky / Timing	`TimeoutError`, passes on retry
F2	Selector Broken	`locator not found`, strict mode violation
F3	Network Dependency	`net::ERR_*`, unexpected API response
F4	Assertion Mismatch	`Expected X to equal Y`, subject-inversion
F5	Missing Then	Action completed but wrong state remains
F6	Condition Branch Missing	Element conditionally present, assertion always runs
F7	Test Isolation Failure	Passes alone, fails in suite
F8	Environment Mismatch	CI vs local only; viewport, OS, timezone
F9	Data Dependency	Missing seed data, hardcoded IDs
F10	Auth / Session	Session expired, role-based UI not rendered
F11	Async Order Assumption	`Promise.all` order, parallel race
F12	POM / Locator Drift	DOM structure changed, POM not updated
F13	Error Swallowing	`.catch(() => {})` hiding actual failure
F14	Animation Race	Content not yet rendered, or a transient element removed before it is observed
F15	Hydration Race	Action succeeds but has no effect — SSR page not yet hydrated; fails at the next assertion

Debug Workflow

Extract — parse results.json for failed tests, error messages, duration
Classify — map each failure to F1–F15 using error signals (most failures resolved here)
Trace — if still unclear, extract trace.zip and inspect step-by-step: failed actions, DOM snapshots, network errors, JS console errors
Fix — concrete code suggestion per failure, P0/P1/P2 priority
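The Classify step above can be pictured as a lookup from error signals to categories. The sketch below covers only three rows of the table (F1 to F3); the Failure shape, the rule order, and the "unclassified" fallback are assumptions for illustration, not the skill's actual logic.

```typescript
// Hypothetical input shape: one failed test reduced to its error text
// and whether a retry passed.
interface Failure {
  message: string;
  passedOnRetry: boolean;
}

function classify(failure: Failure): string {
  const { message, passedOnRetry } = failure;
  // F3: network-level errors surface as net::ERR_* codes.
  if (message.includes("net::ERR_")) return "F3 Network Dependency";
  // F2: the selector no longer resolves, or resolves to several elements.
  if (/strict mode violation|locator not found/i.test(message)) return "F2 Selector Broken";
  // F1: a timeout that goes away on retry points at timing, not logic.
  if (message.includes("TimeoutError") && passedOnRetry) return "F1 Flaky / Timing";
  // Anything else would move on to the Trace step.
  return "unclassified";
}

console.log(classify({ message: "TimeoutError: waiting for locator", passedOnRetry: true }));
console.log(classify({ message: "page.goto: net::ERR_CONNECTION_REFUSED", passedOnRetry: false }));
```

Ordering matters in a scheme like this: a timeout that does not pass on retry is left unclassified here on purpose, since the table ties F1 to both signals.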

Skill 4: `cypress-debugger` — Cypress Failure Debugger

Diagnoses Cypress test failures from mochawesome or JUnit report files. Classifies root causes and provides concrete fixes.

When to Use

You have a cypress/reports/ directory (local or downloaded from CI) with failures to understand
Cypress tests pass locally but fail in CI
You're dealing with flaky or intermittent Cypress failures
You get Timed out retrying or Expected to find element without a clear cause

Usage

Debug these failing Cypress tests
Why did these Cypress tests fail?
Analyze cypress/reports/
Cypress tests pass locally but fail in CI

15 Root Cause Categories

#	Category	Signals
F1	Flaky / Timing	`Timed out retrying`, passes on retry
F2	Selector Broken	`Expected to find element`, `cy.get() failed`
F3	Network Dependency	`cy.intercept()` not matched, `XHR failed`
F4	Assertion Mismatch	`expected X to equal Y`, `AssertionError`
F5	Missing Then	Action completed but wrong state remains
F6	Condition Branch Missing	Element conditionally present, assertion always runs
F7	Test Isolation Failure	Passes alone, fails in suite
F8	Environment Mismatch	CI vs local only; baseUrl, viewport, OS
F9	Data Dependency	Missing seed data, `cy.fixture()` mismatch
F10	Auth / Session	`cy.session()` expired, role-based UI not rendered
F11	Command Queue / Intercept Race	`cy.intercept` registered after request fires; `.then()` chain order swap; parallel `cy.request()` race against an unfinished `cy.visit()`
F12	Selector Drift	DOM changed, custom command or POM selector not updated
F13	Error Swallowing	`cy.on('uncaught:exception', () => false)` hiding failures
F14	Animation Race	Content not yet rendered, a transient element removed before observed, or CSS transition not complete
F15	Hydration Race	First click after `cy.visit()` succeeds but has no effect — SSR page not yet hydrated; fails at the next assertion

Debug Workflow

Extract — parse mochawesome.json or JUnit XML for failed tests, error messages, duration
Classify — map each failure to F1–F15 using error signals (most failures resolved here)
Screenshot/Video — if still unclear, inspect cypress/screenshots/ and cypress/videos/
Fix — concrete code suggestion per failure, P0/P1/P2 priority
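The Extract step for a mochawesome report can be sketched as a recursive walk over nested suites. The interfaces below are an assumed, minimal subset of the report shape (results containing suites, each with tests that carry a state and an err.message); check them against a real report before relying on them. The function name collectFailures is hypothetical.

```typescript
// Assumed minimal subset of a mochawesome JSON report.
interface ReportTest {
  fullTitle: string;
  state?: string;
  err?: { message?: string };
}
interface ReportSuite {
  tests?: ReportTest[];
  suites?: ReportSuite[];
}
interface Report {
  results?: ReportSuite[];
}
interface FailedTest {
  title: string;
  message: string;
}

function collectFailures(report: Report): FailedTest[] {
  const failures: FailedTest[] = [];
  const visit = (suite: ReportSuite): void => {
    for (const test of suite.tests ?? []) {
      if (test.state === "failed") {
        failures.push({ title: test.fullTitle, message: test.err?.message ?? "" });
      }
    }
    // Suites can nest, so recurse into children.
    for (const child of suite.suites ?? []) visit(child);
  };
  for (const root of report.results ?? []) visit(root);
  return failures;
}

// Inline sample data, invented for this example.
const sampleReport: Report = {
  results: [
    {
      suites: [
        {
          tests: [
            { fullTitle: "login shows dashboard", state: "passed" },
            {
              fullTitle: "login rejects bad password",
              state: "failed",
              err: { message: "Timed out retrying after 4000ms" },
            },
          ],
        },
      ],
    },
  ],
};

console.log(collectFailures(sampleReport));
```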

FAQ

What is e2e-skills?

e2e-skills is an open-source AI agent testing toolkit for Playwright and Cypress. It bundles four Agent Skills that generate end-to-end tests, review existing specs for silent always-pass anti-patterns, and debug flaky failures — running inside Claude Code, Codex, and other AGENTS.md-compatible AI coding agents.

How do I find Playwright or Cypress tests that pass but don't actually test anything?

Run the e2e-reviewer skill (or its standalone scanner, scan.sh) against your spec directory. It flags 24 anti-patterns grouped by severity (P0/P1/P2) — including missing await on assertions, one-shot isVisible() reads, matcher-less expect(), and committed .only leaks — that let a test stay green while the feature it covers is broken.

How is this different from eslint-plugin-playwright or eslint-plugin-cypress?

The eslint plugins are your every-commit baseline for syntactic rules, and the scanner runs them first (Tier 1) — so it does not replace them, it adds a layer on top. The layer is the smells a linter structurally cannot decide: a name-assertion mismatch, a try/catch around a function that never throws, an expect(locator).toBeTruthy() that is always true, a missing-auth route — each needs reading code the AST rule never sees (another function, the component, the CI config, the test's intent). e2e-reviewer reads that surrounding code to verify the finding and ships a band-aid-aware fix, where lint can only flag single-file syntax.

Isn't this just an AI code reviewer like CodeRabbit, Copilot, or Cursor BugBot?

Those are excellent general reviewers — several are free for open source and now run locally (CodeRabbit's CLI reviews staged changes in the terminal). The difference is specialization, not capability: a general reviewer reasons over whatever diff it is handed, while e2e-reviewer carries a curated, stable, severity-graded catalog of E2E silent always-pass anti-patterns (24 patterns with fixed IDs, plus 15 failure-debugging categories) and runs on demand against a whole spec directory, not only a PR diff. Use a general reviewer for everything; use this when E2E test trustworthiness is the thing you care about. For a real head-to-head on 100 reviewed PRs (with honest limitations), see the AI-reviewer benchmark.

Does it work with Cypress as well as Playwright?

Yes. Both are first-class. Test generation and the richest review coverage target Playwright; review and failure debugging also fully cover Cypress (mochawesome and JUnit reports).

Can it debug flaky tests that only fail in CI?

Yes. playwright-debugger and cypress-debugger read your report files (playwright-report/, cypress/reports/) and classify each failure into 15 root-cause categories — flaky timing, selector drift, test isolation, environment mismatch, hydration race, and more — with a concrete fix per failure.

How do I review AI-generated E2E tests?

Point e2e-reviewer at the generated specs. AI-written tests frequently contain confident-looking but silent always-pass assertions; the reviewer surfaces them with before/after fixes before they reach your main branch.

Which AI coding agents are supported?

Claude Code (plugin marketplace or the skills CLI), Codex, and any agent the skills CLI supports via AGENTS.md (55+ hosts). Install once, use everywhere.

Does it support test frameworks other than Playwright and Cypress?

No — Playwright and Cypress only, by design. See framework scope for the rationale.

Roadmap

Planned, not yet shipped (these describe direction, not current behavior):

Cross-model consistency. Different AI agents each write specs in their own style, so a suite built with several models drifts into a patchwork no single convention holds together. The plan: infer your project's conventions (POM shape, locator strategy, fixture and structure patterns), ask you only where the codebase is genuinely ambiguous, and persist the answers so every model conforms afterward. Crucially, the recorded conventions stay a default the agent can deviate from with a stated reason, not a hard rule, so a better approach for a specific test is never blocked — and a justified deviation becomes a prompt to evolve the convention. This is the part a linter structurally cannot do: it enforces fixed rules; it cannot learn and conform to your conventions.
Deterministic detection layer. Move the per-file, type-decidable smells (locator-as-truthy, floating assertions) from prompt-and-heuristic onto a type-aware AST pass, so detection is reproducible and the LLM is reserved for the judgment calls a single-file rule cannot make. The clearly lint-able rules would be contributed upstream to eslint-plugin-playwright rather than re-implemented.
