690 changes: 690 additions & 0 deletions docs/superpowers/plans/2026-05-20-test-summary-output.md


105 changes: 105 additions & 0 deletions docs/superpowers/specs/2026-05-20-test-summary-output.md
@@ -0,0 +1,105 @@
# twd-cli Test Summary Output: Design Spec

**Date:** 2026-05-20
**Status:** Proposed

## Purpose

Make the final output of `twd-cli run` self-describing: at a glance, a developer (or an AI agent piping the output through `grep`) should be able to tell **how many tests passed, how many failed, how many were skipped**, without parsing per-test lines or running the suite again.

Today the run ends with a mock-validation summary like:

```
Mocks validated: 128 | Errors: 7 | Warnings: 0 | Skipped: 80
```

That line is about *mocks*, not *tests*; there is no equivalent line for test results. Users reading the tail of the log have to scroll back and count `✓ should ...` lines by eye, and they may mistake the yellow `✗ … mock "fetchCart"` lines (warn-mode contract failures) for failing tests, since the glyph is the same and the position similar.

## Problem (real session)

While running a long suite headless via `npm run test:ci`, the consuming agent re-ran the suite ~5 times trying to confirm "did all tests pass?" because:

1. No final `Tests: N passed, M failed, K skipped` line exists.
2. The yellow `✗` glyph used for *mock contract validation failures* looks identical to a failed test marker.
3. ANSI color codes broke naive `grep "✓ should"` patterns, so attempts to count from the log returned 0.

Each re-run took ~1:23, so across ~5 re-runs the cost of "I can't tell if it passed" was ~7 minutes of wall time.

## Scope

**In scope:**
- A final, single-line test summary printed after all tests complete.
- Visual disambiguation between *test result* lines and *mock contract validation* lines.
- A machine-friendly summary line (stable format, easy to grep without ANSI gymnastics).

**Out of scope:**
- Changing the per-test output format itself.
- Reworking the mock-validation summary line (the line that exists today is fine; it just needs to not be the *only* summary).
- A `--summary` / quiet reporter mode β€” deferred to a follow-up.
- JUnit XML / JSON reporter output β€” deferred to a follow-up.

## Proposed Solution

### 1. Add a final test summary line

After all tests finish (and after the mock-validation summary), print:

```
Tests: 74 passed, 0 failed, 0 skipped (74 total) in 1:23.193
```

Format requirements:
- One line.
- Stable label `Tests:` at the start so it's grep-friendly.
- Colors only on the count digits (green for passed, red for failed if > 0, yellow for skipped if > 0). The label `Tests:` and the words `passed` / `failed` / `skipped` stay uncolored so `grep "^Tests:"` works regardless of ANSI handling.
- Duration in the same `m:ss.SSS` format the runner shows today.
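
A minimal sketch of the coloring rule, assuming raw SGR escape codes like the ones `src/testSummary.js` uses. `paint`, `stripAnsi` and `buildTestsLine` are hypothetical names, and this sketch derives `total` from the three counts rather than from the length of `testStatus`:

```javascript
// Hypothetical helpers illustrating the format rules; not the twd-cli API.
const paint = (sgr, value) => `\x1b[${sgr}m${value}\x1b[0m`;
const stripAnsi = (s) => s.replace(/\x1b\[[0-9;]*m/g, "");

function buildTestsLine({ passed, failed, skipped, duration }) {
  const total = passed + failed + skipped;
  // Only the digits are colored: passed is always green,
  // failed/skipped turn red/yellow only when the count is above zero.
  const p = paint(32, passed);
  const f = failed > 0 ? paint(31, failed) : String(failed);
  const s = skipped > 0 ? paint(33, skipped) : String(skipped);
  // The label and the words stay plain, so /^Tests:/ matches the raw string.
  return `Tests: ${p} passed, ${f} failed, ${s} skipped (${total} total) in ${duration}`;
}

const raw = buildTestsLine({ passed: 74, failed: 0, skipped: 0, duration: "1:23.193" });
console.log(stripAnsi(raw));
// Tests: 74 passed, 0 failed, 0 skipped (74 total) in 1:23.193
```

Keeping zero counts uncolored means a clean run shows color only on the passed count.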

**Duration source.** Today `src/index.js` uses `console.time('Total Test Time')` / `console.timeEnd(...)` to print `Total Test Time: 1:23.193` as its own line. That call's output is not capturable as a value. Replace it with a manual `Date.now()` delta captured around the same span (start before `page.goto`, end after `runner.runAll()` returns), formatted to the same `m:ss.SSS` string. The standalone `Total Test Time:` line is removed; the duration appears only on the `Tests:` line. This keeps the log to one canonical timing line.
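
The replacement can be sketched as follows. `timeSpan` and `formatElapsed` are hypothetical names; the formatter follows the `m:ss.SSS` rule and assumes an integer millisecond count, which a `Date.now()` delta is:

```javascript
// Format a millisecond count as m:ss.SSS (minutes are not zero-padded).
function formatElapsed(ms) {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  const millis = ms % 1000;
  return `${minutes}:${String(seconds).padStart(2, "0")}.${String(millis).padStart(3, "0")}`;
}

// Capture the span as a plain number instead of relying on console.timeEnd,
// whose output goes straight to the console and cannot be reused.
async function timeSpan(work) {
  const startedAt = Date.now();
  const result = await work();
  return { result, durationMs: Date.now() - startedAt };
}

console.log(formatElapsed(83_193)); // 1:23.193
```

In the runner, `work` would wrap the span from `page.goto(...)` through `runner.runAll()`, and the returned `durationMs` would feed the `Tests:` line.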

When there are failures, also print a `Failed tests:` block with just the test names (no stack traces β€” those already appear inline above), so the developer can see the names at the end of the log without scrolling.
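
A sketch of that block, assuming (as the `runTests` test fixture does) that `testStatus` entries carry `{ id, status }` and `handlers` entries carry `{ id, name }`. `failedTestsBlock` is a hypothetical name; it falls back to the id when no handler matches:

```javascript
function failedTestsBlock(testStatus, handlers) {
  const nameById = new Map(handlers.map((h) => [h.id, h.name]));
  // Keep suite order: filter, don't sort.
  const failures = testStatus.filter((t) => t.status === "fail");
  if (failures.length === 0) return null; // clean run: print nothing
  const lines = failures.map((t) => `  ✗ ${nameById.get(t.id) ?? t.id}`);
  return ["Failed tests:", ...lines].join("\n");
}

console.log(
  failedTestsBlock(
    [
      { id: "1", status: "pass" },
      { id: "2", status: "fail" },
      { id: "4", status: "fail" },
    ],
    [{ id: "1", name: "should render" }, { id: "2", name: "should submit form" }],
  ),
);
// Failed tests:
//   ✗ should submit form
//   ✗ 4
```

Returning `null` on a clean run lets the caller skip printing the header entirely.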

### 2. Disambiguate mock-validation lines from test result lines

The current mock contract output (`src/contractReport.js`) uses `✓` for passing mocks, `✗` for failing ones, and `⚠` for warnings. The `✗` glyph collides visually with the `✗` used for failed tests in the suite tree printed by `reportResults` (`twd-js/runner-ci`). Color helps for warn-mode contract failures (yellow) but not for error-mode ones (red, the same as test failures), and color is fragile under `grep` and CI log viewers.

**Decision:** add a `MOCK ` prefix to every glyph-led line that comes out of `contractReport.js`. The existing glyph assignments stay (`✓` pass, `✗` fail, `⚠` warning): they are correct *within* the contract report, and the prefix is what distinguishes contract lines from test-result lines.

Example before:
```
✗ GET /v1/carts/{cart_id} (200) — mock "fetchCart" — in "Checkout New — Redis ID Flow > ..."
```

Example after:
```
MOCK ✗ GET /v1/carts/{cart_id} (200) — mock "fetchCart" — in "Checkout New — Redis ID Flow > ..."
```

Apply the prefix uniformly to all four line kinds the report can emit: pass (`✓`), fail (`✗`), warning (`⚠`), and skipped (`ℹ`). Indentation already exists; the prefix sits between the indentation and the glyph.
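
The rule can be expressed as a small line transform. `addMockPrefix` and `isMockLine` are hypothetical helpers for illustration (the change to `contractReport.js` writes the prefix directly into each template string); the glyph set is the four listed above:

```javascript
// Glyph-led lines: optional indentation, then one of the four report glyphs.
const GLYPH_LINE = /^(\s*)([✓✗⚠ℹ])/u;

// Insert "MOCK " between the indentation and the glyph; other lines are untouched.
function addMockPrefix(line) {
  return line.replace(GLYPH_LINE, "$1MOCK $2");
}

// A contract line can then be told apart from a test-result line by text alone.
function isMockLine(line) {
  return /^\s*MOCK [✓✗⚠ℹ]/u.test(line);
}

const after = addMockPrefix("  ✗ GET /v1/carts/{cart_id} (200)");
console.log(after); //   MOCK ✗ GET /v1/carts/{cart_id} (200)
console.log(isMockLine(after), isMockLine("  ✗ should submit form")); // true false
```

Detail lines such as `→ response.id: ...` and warning messages do not start with a glyph, so they pass through unchanged.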

## Exit Code Behavior

No change. Exit code already reflects test failures plus `mode: "error"` contract failures (`src/index.js:101,119`).

**Interplay with the `Tests:` line.** The new `Tests:` summary counts test outcomes *only* (pass/fail/skip from `testStatus`). A run can legitimately exit non-zero while `Tests:` reads `0 failed`: that means every test passed but at least one mock failed contract validation in `error` mode. The mock summary line (`Mocks validated: … | Errors: N | …`) and the contract report block above it are the canonical place to see contract failures; the `Tests:` line is not retroactively edited to fold them in.
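
A sketch of that rule, assuming the runner keeps one boolean for error-mode contract failures (the name `hasContractErrors` mirrors the flag in `contractReport.js`) and uses exit code 1 for any failure; the spec only says "non-zero", and `exitCodeFor` is hypothetical:

```javascript
function exitCodeFor({ testStatus, hasContractErrors }) {
  const hasTestFailures = testStatus.some((t) => t.status === "fail");
  // Assumed convention: 1 for any failure, 0 otherwise.
  return hasTestFailures || hasContractErrors ? 1 : 0;
}

// Every test passed (so Tests: reads "0 failed"), but an error-mode mock failed its contract.
const testStatus = [{ id: "1", status: "pass" }, { id: "2", status: "pass" }];
const failedCount = testStatus.filter((t) => t.status === "fail").length;
console.log(failedCount, exitCodeFor({ testStatus, hasContractErrors: true })); // 0 1
```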

## Testing Strategy

- Unit test the summary formatter directly: given a `testStatus` array with a known mix (e.g. 3 pass, 1 fail, 1 skip) and a duration value, assert the `Tests:` line matches the expected format. Keep this layer pure (no Puppeteer) so the format is easy to lock down.
- Unit test the failed-tests block: given a `testStatus` array with two failures and a `handlers` array, assert both names appear under `Failed tests:` in the order the suite produced them.
- Extend the existing `contractReport.test.js` to assert that every glyph-led line starts with `MOCK ` (after any leading whitespace). Cover all four line kinds: pass, fail, warning, skipped.
- Verify that `grep "^Tests:"` against a raw run (ANSI included) returns exactly one line, i.e. the label is not wrapped in escape sequences. (The count digits themselves may carry color codes; the label must not.)
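
The last check can be sketched as a line count over a captured raw log. The log below is invented for illustration, and `countSummaryLines` is a hypothetical helper; the second case shows why the label must stay plain, since an escape sequence in front of `Tests:` defeats the anchored match:

```javascript
// Count lines that begin with the literal label, with ANSI codes left in place.
function countSummaryLines(rawLog) {
  return rawLog.split("\n").filter((line) => /^Tests:/.test(line)).length;
}

const plainLabel = [
  "\x1b[32m✓\x1b[0m should render",
  "Mocks validated: 3 | Errors: 0 | Warnings: 0 | Skipped: 1",
  "",
  "Tests: \x1b[32m3\x1b[0m passed, 0 failed, 0 skipped (3 total) in 0:04.512",
].join("\n");

// Same log, but with the label itself wrapped in a bold escape sequence.
const coloredLabel = plainLabel.replace("Tests:", "\x1b[1mTests:\x1b[0m");

console.log(countSummaryLines(plainLabel)); // 1
console.log(countSummaryLines(coloredLabel)); // 0
```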

## Benefits

- **Faster developer feedback:** one line at the end answers "did it pass?" with no scrolling and no counting.
- **AI-agent friendly:** stable, grep-able summary line. Avoids re-running long suites just to confirm a result.
- **Less confusion between mocks and tests:** the `MOCK ` prefix removes the "is that a test failure or a mock warning?" question.

## Notes / Open Questions

- Should the failed-test block at the end include the file path + line number for each failure, or just the test name? (Stack traces already appear inline above.) Default for the implementation plan: **just the test name**, mirroring what the per-test line shows. Revisit if it proves too thin.

## Follow-up Work (Out of Scope Here)

- **`--summary` / quiet reporter.** A mode that suppresses per-request mock log lines (which dominate output for large suites) and prints only RUN/PASS/FAIL per test, the `Tests:` line, the mock-validation summary line, and the contract report path. Likely shaped as a `twd.config.json` field (`reporter: "summary"`) for consistency with how other twd-cli behavior is configured, not a CLI flag.
- **`--json` reporter** for CI dashboards. The summary-line work in this spec makes this trivial later.
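
For illustration only: a JSON summary could reuse the same counts as the `Tests:` line. The field names below are hypothetical (the spec defers the reporter's shape), and `summaryAsJson` is not part of twd-cli:

```javascript
function summaryAsJson({ testStatus, handlers, durationMs }) {
  const count = (status) => testStatus.filter((t) => t.status === status).length;
  const nameById = new Map(handlers.map((h) => [h.id, h.name]));
  return JSON.stringify({
    passed: count("pass"),
    failed: count("fail"),
    skipped: count("skip"),
    total: testStatus.length,
    durationMs,
    // Same name lookup (with id fallback) as the Failed tests: block.
    failedTests: testStatus
      .filter((t) => t.status === "fail")
      .map((t) => nameById.get(t.id) ?? t.id),
  });
}

console.log(
  summaryAsJson({
    testStatus: [{ id: "1", status: "pass" }, { id: "2", status: "fail" }, { id: "3", status: "skip" }],
    handlers: [{ id: "2", name: "should submit form" }],
    durationMs: 83_193,
  }),
);
// {"passed":1,"failed":1,"skipped":1,"total":3,"durationMs":83193,"failedTests":["should submit form"]}
```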
8 changes: 4 additions & 4 deletions src/contractReport.js
@@ -47,7 +47,7 @@ export function printContractReport(output) {

if (!result.validation.valid) {
errorCount += result.validation.errors.length;
- console.log(failColor(` ✗ ${result.method} ${result.matchedPath} (${result.status}) — ${formatMockLabel(result)}`));
console.log(failColor(` MOCK ✗ ${result.method} ${result.matchedPath} (${result.status}) — ${formatMockLabel(result)}`));
for (const err of result.validation.errors) {
console.log(detailColor(` → ${err.path}: ${err.message}`));
}
@@ -56,12 +56,12 @@ export function printContractReport(output) {
hasContractErrors = true;
}
} else if (result.validation.warnings.length === 0) {
- console.log(green(` ✓ ${result.method} ${result.matchedPath} (${result.status}) — ${formatMockLabel(result)}`));
console.log(green(` MOCK ✓ ${result.method} ${result.matchedPath} (${result.status}) — ${formatMockLabel(result)}`));
}

for (const warning of result.validation.warnings) {
warningCount++;
- console.log(yellow(` ⚠ ${result.method} ${result.matchedPath} (${result.status}) — ${formatMockLabel(result)}`));
console.log(yellow(` MOCK ⚠ ${result.method} ${result.matchedPath} (${result.status}) — ${formatMockLabel(result)}`));
console.log(yellow(` ${warning.message}`));
console.log('');
}
@@ -71,7 +71,7 @@ export function printContractReport(output) {
if (skipped.length > 0) {
console.log(dim('Skipped:'));
for (const skip of skipped) {
- console.log(dim(` ℹ "${skip.alias}" — ${skip.url}`));
console.log(dim(` MOCK ℹ "${skip.alias}" — ${skip.url}`));
console.log(dim(` ${skip.reason === 'urlRegex mock' ? 'Regex URL pattern' : 'No matching path in any spec'}`));
}
console.log('');
7 changes: 7 additions & 0 deletions src/formatDuration.js
@@ -0,0 +1,7 @@
export function formatDuration(ms) {
const totalSeconds = Math.floor(ms / 1000);
const minutes = Math.floor(totalSeconds / 60);
const seconds = totalSeconds % 60;
const millis = ms % 1000;
return `${minutes}:${String(seconds).padStart(2, '0')}.${String(millis).padStart(3, '0')}`;
}
15 changes: 13 additions & 2 deletions src/index.js
@@ -7,6 +7,7 @@ import { loadContracts, validateMocks } from './contracts.js';
import { printContractReport } from './contractReport.js';
import { generateContractMarkdown } from './contractMarkdown.js';
import { buildTestPath } from './buildTestPath.js';
import { formatTestSummary, formatFailedTestsBlock } from './testSummary.js';

export async function runTests() {
let browser;
@@ -29,7 +30,6 @@ export async function runTests() {
});

const page = await browser.newPage();
- console.time('Total Test Time');

// Register mock collector for contract validation
const collectedMocks = new Map();
@@ -46,6 +46,7 @@ export async function runTests() {
}

// Navigate to your development server
const startedAt = Date.now();
console.log(`Navigating to ${config.url} ...`);
await page.goto(config.url);

@@ -80,6 +81,8 @@ export async function runTests() {
return { handlers: Array.from(handlers.values()), testStatus };
}, config.retryCount);

const durationMs = Date.now() - startedAt;

console.log(`Tests to report: ${testStatus.length}`);

// Display results in console
@@ -99,7 +102,6 @@ export async function runTests() {

// Exit with appropriate code
let hasFailures = testStatus.some(test => test.status === 'fail');
- console.timeEnd('Total Test Time');

// Enrich collected mocks with full test path names
for (const [, mock] of collectedMocks) {
@@ -158,6 +160,15 @@ export async function runTests() {
await browser.close();
console.log('Browser closed.');

console.log('');
console.log(formatTestSummary({ testStatus, durationMs }));
const failedBlock = formatFailedTestsBlock({ testStatus, handlers });
if (failedBlock) {
for (const line of failedBlock.split('\n')) {
console.log(line);
}
}

return hasFailures;

} catch (error) {
32 changes: 32 additions & 0 deletions src/testSummary.js
@@ -0,0 +1,32 @@
import { formatDuration } from './formatDuration.js';

const green = (s) => `\x1b[32m${s}\x1b[0m`;
const red = (s) => `\x1b[31m${s}\x1b[0m`;
const yellow = (s) => `\x1b[33m${s}\x1b[0m`;

export function formatTestSummary({ testStatus, durationMs }) {
const passed = testStatus.filter((t) => t.status === 'pass').length;
const failed = testStatus.filter((t) => t.status === 'fail').length;
const skipped = testStatus.filter((t) => t.status === 'skip').length;
const total = testStatus.length;

const passedStr = `${green(passed)} passed`;
const failedStr = `${failed > 0 ? red(failed) : '0'} failed`;
const skippedStr = `${skipped > 0 ? yellow(skipped) : '0'} skipped`;

return `Tests: ${passedStr}, ${failedStr}, ${skippedStr} (${total} total) in ${formatDuration(durationMs)}`;
}

export function formatFailedTestsBlock({ testStatus, handlers }) {
const failures = testStatus.filter((t) => t.status === 'fail');
if (failures.length === 0) return null;

const handlersById = new Map(handlers.map((h) => [h.id, h]));
const lines = ['Failed tests:'];
for (const failure of failures) {
const handler = handlersById.get(failure.id);
const name = handler ? handler.name : failure.id;
lines.push(` ${red('✗')} ${name}`);
}
return lines.join('\n');
}
60 changes: 60 additions & 0 deletions tests/contractReport.test.js
@@ -210,6 +210,66 @@ describe('printContractReport', () => {
expect(logs).toContain('mock "getPets" — in "Cart > should load items"');
});

it('prefixes every glyph-led line with MOCK ', () => {
const output = {
results: [
// pass
{
alias: 'getPets',
url: '/api/v1/pets',
method: 'GET',
status: 200,
specSource: './openapi.json',
matchedPath: '/v1/pets',
mode: 'warn',
validation: { valid: true, errors: [], warnings: [] },
},
// fail
{
alias: 'createPet',
url: '/api/v1/pets',
method: 'POST',
status: 201,
specSource: './openapi.json',
matchedPath: '/v1/pets',
mode: 'warn',
validation: {
valid: false,
errors: [{ path: 'response.id', message: 'expected integer, got string', keyword: 'type' }],
warnings: [],
},
},
// warning
{
alias: 'serverError',
url: '/api/v1/pets',
method: 'GET',
status: 500,
specSource: './openapi.json',
matchedPath: '/v1/pets',
mode: 'warn',
validation: {
valid: true,
errors: [],
warnings: [{ type: 'UNMATCHED_STATUS', message: 'Status 500 not documented' }],
},
},
],
skipped: [
{ alias: 'untracked', url: '/whatever', reason: 'No matching path in any spec' },
],
};

printContractReport(output);

const lines = consoleSpy.mock.calls.map((c) => stripAnsi(c[0]));
const glyphLines = lines.filter((l) => /^\s*(MOCK\s+)?[✓✗⚠ℹ]/.test(l));
expect(glyphLines.length).toBeGreaterThanOrEqual(4);
for (const line of glyphLines) {
expect(line).toMatch(/^\s*MOCK [✓✗⚠ℹ]/);
}
});

it('prints occurrence suffix when occurrence > 1', () => {
const output = {
results: [
28 changes: 28 additions & 0 deletions tests/formatDuration.test.js
@@ -0,0 +1,28 @@
import { describe, it, expect } from 'vitest';
import { formatDuration } from '../src/formatDuration.js';

describe('formatDuration', () => {
it('formats zero as 0:00.000', () => {
expect(formatDuration(0)).toBe('0:00.000');
});

it('formats sub-second durations with leading zero minutes/seconds', () => {
expect(formatDuration(123)).toBe('0:00.123');
});

it('formats single-digit seconds with a leading zero', () => {
expect(formatDuration(5_678)).toBe('0:05.678');
});

it('formats the spec example (83.193s) as 1:23.193', () => {
expect(formatDuration(83_193)).toBe('1:23.193');
});

it('formats a long duration past 10 minutes', () => {
expect(formatDuration(754_567)).toBe('12:34.567');
});

it('pads milliseconds to three digits', () => {
expect(formatDuration(60_007)).toBe('1:00.007');
});
});
31 changes: 29 additions & 2 deletions tests/runTests.test.js
@@ -58,8 +58,6 @@ describe("runTests", () => {
vi.clearAllMocks();
vi.mocked(loadConfig).mockReturnValue({ ...defaultMockConfig });
consoleSpy = vi.spyOn(console, 'log').mockImplementation(() => {});
- vi.spyOn(console, 'time').mockImplementation(() => {});
- vi.spyOn(console, 'timeEnd').mockImplementation(() => {});
});

afterEach(() => {
@@ -244,4 +242,33 @@ describe("runTests", () => {
expect(entries[0].alias).toBe('getPhoto');
expect(entries[0].occurrence).toBe(1);
});

it("should print the Tests: summary line and Failed tests block", async () => {
const testStatus = [
{ id: '1', status: 'pass' },
{ id: '2', status: 'fail', error: 'boom' },
{ id: '3', status: 'skip' },
];
const handlers = [
{ id: '1', name: 'should render', type: 'test' },
{ id: '2', name: 'should submit form', type: 'test' },
{ id: '3', name: 'should show error', type: 'test' },
];
const page = createMockPage({ handlers, testStatus });
const browser = createMockBrowser(page);
vi.mocked(puppeteer.launch).mockResolvedValue(browser);

await runTests();

const stripAnsi = (s) => s.replace(/\x1b\[[0-9;]*m/g, '');
const logs = consoleSpy.mock.calls.map((c) => stripAnsi(String(c[0])));

const summaryLine = logs.find((l) => l.startsWith('Tests:'));
expect(summaryLine).toBeDefined();
expect(summaryLine).toMatch(/^Tests: 1 passed, 1 failed, 1 skipped \(3 total\) in \d+:\d{2}\.\d{3}$/);

const failedHeader = logs.find((l) => l === 'Failed tests:');
expect(failedHeader).toBeDefined();
expect(logs.some((l) => l.includes('should submit form'))).toBe(true);
});
});