
Add CI smoke/integration tests and fake provider for testability #170

Open
michaelerobertsjr wants to merge 6 commits into TinyAGI:main from michaelerobertsjr:feature/ci-integration-smoke-tests

@michaelerobertsjr
Contributor

  • add GitHub CI workflow and release smoke-test gate on Node 20
  • add tsx-based integration/smoke test suites with shared test helpers
  • add internal fake provider and queue retry/dead event emission for deterministic testing
  • allow install dir override via TINYCLAW_INSTALL_DIR

Summary

This PR adds deterministic automated coverage for TinyClaw’s core runtime and release path. It introduces offline integration and smoke tests around the real queue processor, API, SSE stream, SQLite DB, and installed CLI, then gates CI and releases on those checks so regressions are caught before shipping.

Changes

  • Added a new GitHub Actions CI workflow with:
    • Ubuntu integration tests
    • Ubuntu/macOS smoke tests
    • Node 20 across verification jobs
  • Updated the release workflow to require Ubuntu/macOS smoke-test success before publishing the release bundle
  • Added tsx + node:test based test scripts in package.json
  • Added shared test helpers for isolated temp homes, seeded settings, spawned queue processors, HTTP polling, SSE parsing, and direct SQLite assertions
  • Added integration coverage for:
    • core message flow
    • agent routing
    • queue state transitions
    • dead-letter retry
    • SSE event ordering
    • persistence across restart
  • Added a CLI smoke test covering install script behavior, installed wrapper help output, agent list, and minimal fake-provider runtime flow
  • Added an internal fake provider with deterministic success/failure/delay controls via:
    • TINYCLAW_FAKE_PROVIDER_MODE
    • TINYCLAW_FAKE_PROVIDER_DELAY_MS
  • Changed provider invocation failures to participate in queue retry/dead-letter handling instead of generating fallback responses
  • Added retry/dead event emission for observability:
    • message_retry_scheduled
    • message_dead
  • Updated scripts/install.sh to honor TINYCLAW_INSTALL_DIR
  • Updated tinyclaw.sh to support help, -h, and --help
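As a rough illustration of how the two fake-provider variables could drive deterministic behavior, here is a TypeScript sketch. Only the variable names TINYCLAW_FAKE_PROVIDER_MODE and TINYCLAW_FAKE_PROVIDER_DELAY_MS come from this PR; the mode values (success, fail), the function names, and the echo response are assumptions, not the code in src/lib/invoke.ts.

```typescript
// Hypothetical fake provider. Only the two env var names come from the PR;
// the mode values and this function shape are illustrative assumptions.
type Env = Record<string, string | undefined>;
type FakeMode = "success" | "fail";

function readFakeConfig(env: Env): { mode: FakeMode; delayMs: number } {
  const mode = env.TINYCLAW_FAKE_PROVIDER_MODE ?? "success";
  if (mode !== "success" && mode !== "fail") {
    throw new Error(`Unknown TINYCLAW_FAKE_PROVIDER_MODE: ${mode}`);
  }
  const rawDelay = env.TINYCLAW_FAKE_PROVIDER_DELAY_MS ?? "0";
  const delayMs = Number(rawDelay);
  if (!Number.isFinite(delayMs) || delayMs < 0) {
    throw new Error(`Invalid TINYCLAW_FAKE_PROVIDER_DELAY_MS: ${rawDelay}`);
  }
  return { mode, delayMs };
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Deterministic stand-in for a real model call.
async function invokeFakeProvider(message: string, env: Env = process.env): Promise<string> {
  const { mode, delayMs } = readFakeConfig(env);
  await sleep(delayMs); // fixed latency, useful for ordering and timeout tests
  if (mode === "fail") {
    // Throw instead of returning a fallback reply so the queue's
    // retry/dead-letter path is exercised.
    throw new Error("fake provider: forced failure");
  }
  return `fake response: ${message}`; // predictable text for exact assertions
}
```

In a test, setting TINYCLAW_FAKE_PROVIDER_MODE=fail on the spawned processor would then push a message through the retry path without any network access.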

Type of Change

  • Bug fix
  • New feature
  • Enhancement to existing feature
  • Breaking change
  • Documentation update
  • Refactor / code cleanup

Testing

  • Ran npm run test:integration
  • Ran npm run test:smoke
  • Ran npm run build

Checklist

  • I have tested these changes locally
  • My changes don't introduce new warnings or errors
  • I have updated documentation if needed

@greptile-apps
Copy link

greptile-apps bot commented Mar 8, 2026

Greptile Summary

This PR adds a comprehensive automated-testing layer to TinyClaw: a GitHub Actions CI workflow, a restructured three-stage release pipeline (build → verify-smoke → publish), tsx-based integration and smoke test suites backed by an internal fake provider, and supporting helper infrastructure (isolated fixtures, SSE client, HTTP helpers, port-retry logic). It also addresses several prior feedback items — the fake-provider production guard, MAX_RETRIES export, null/undefined sentinels in waitFor, and TOCTOU port retries — and makes the queue processor participate properly in retry/dead-letter observability.
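The retry/dead-letter flow summarized above can be sketched as follows. MAX_RETRIES, the retryCount + status result, and the message_retry_scheduled / message_dead event names come from this PR; the value 3, the rule that a message goes dead once its retry count reaches MAX_RETRIES, the in-memory Map standing in for SQLite, and the emit callback are all illustrative assumptions.

```typescript
// Hypothetical sketch of the retry/dead-letter decision, not the code in src/lib/db.ts.
const MAX_RETRIES = 3; // assumed value; the PR exports the real constant from db.ts

type MessageStatus = "pending" | "dead";

interface FailMessageResult {
  retryCount: number;
  status: MessageStatus;
}

type QueueEvent =
  | { type: "message_retry_scheduled"; messageId: string; retryCount: number; error: string }
  | { type: "message_dead"; messageId: string; retryCount: number; error: string };

// Record one failed attempt (a Map stands in for the SQLite row) and decide
// whether the message goes back to pending or to the dead-letter state.
function failMessage(retryCounts: Map<string, number>, messageId: string): FailMessageResult {
  const retryCount = (retryCounts.get(messageId) ?? 0) + 1;
  retryCounts.set(messageId, retryCount);
  const status: MessageStatus = retryCount >= MAX_RETRIES ? "dead" : "pending";
  return { retryCount, status };
}

// Outer catch of the processor: turn the DB result into an observable event.
function onProviderError(
  retryCounts: Map<string, number>,
  messageId: string,
  err: Error,
  emit: (event: QueueEvent) => void,
): FailMessageResult {
  const result = failMessage(retryCounts, messageId);
  const type = result.status === "dead" ? "message_dead" : "message_retry_scheduled";
  emit({ type, messageId, retryCount: result.retryCount, error: err.message });
  return result;
}
```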

Key findings:

  • Tests shipped in the production release tarball: cp -r tests/ "$BUNDLE_DIR/" (release.yml line 86) bundles the entire test harness into the artifact attached to GitHub Releases. End users who download the release receive test helpers, fake provider fixtures, and integration tests they don't need. Additionally, because npm prune --omit=dev strips tsx before bundling, a user who extracts the tarball and runs npm run test:smoke directly will hit a missing-binary error. Consider uploading tests/ as a separate CI-only artifact and overlaying it inside verify-smoke rather than including it in the release bundle.
  • verify-smoke job has no timeout-minutes — the new verify-smoke job in release.yml spawns real queue processors and HTTP-polling loops without any timeout cap. A hang will consume the full 6-hour GitHub Actions default and block every release. A timeout-minutes: 20 at the job level mirrors the guard already in the integration job of ci.yml.
  • Missing cache: 'npm' in verify-smoke Node.js setup — every other job passes cache: 'npm' to actions/setup-node; verify-smoke omits it, adding a full npm download on every release run.

Confidence Score: 3/5

  • Safe to merge after addressing the tests-in-release-bundle issue and adding a timeout to verify-smoke; the core runtime and test logic are solid.
  • The test infrastructure, fake provider, queue-processor observability changes, and all prior-thread fixes are well-implemented and correct. The score is held back by two issues in release.yml: test files being bundled into the production tarball (ships test infrastructure to end users and breaks npm run test:smoke from the raw tarball since tsx is pruned), and the new verify-smoke job having no timeout cap, which could hold release runners for up to 6 hours on a hang.
  • .github/workflows/release.yml — the tests/ bundle inclusion and the missing timeout-minutes/cache on verify-smoke should be addressed before merging.

Important Files Changed

  • .github/workflows/release.yml: Release pipeline refactored into three jobs (build-and-release → verify-smoke → publish-release); tests/ is bundled into the production tarball (ships to end users), and the new verify-smoke job lacks both timeout-minutes and cache: 'npm'.
  • .github/workflows/ci.yml: New CI workflow running integration tests on Ubuntu and smoke tests on Ubuntu/macOS; integration job has timeout-minutes: 15 but the smoke job is missing an equivalent cap.
  • src/queue-processor.ts: Provider invocation failures now emit chain_step_done (with error field) before re-throwing, and the outer catch emits message_retry_scheduled/message_dead SSE events and cleans up dead team-branch conversations via withConversationLock; resolvedAgentId is initialised from dbMsg.agent before routing to ensure correctness if an early error hits the outer catch.
  • src/lib/invoke.ts: Fake provider added with a runtime guard that rejects use outside NODE_ENV=test or TINYCLAW_ALLOW_FAKE_PROVIDER, preventing silent misuse in production.
  • src/lib/db.ts: MAX_RETRIES exported for test assertions; failMessage now returns a FailMessageResult (retryCount + status) and emits message:enqueued for retried messages; retryDeadMessage also emits the same event on success.
  • tests/helpers/processor.ts: Spawns dist/queue-processor.js with NODE_ENV=test and a bounded 5-attempt retry loop for EADDRINUSE port races; stop() sends SIGTERM with a 5-second SIGKILL fallback.
  • tests/smoke/cli-smoke.test.ts: End-to-end smoke test: runs install.sh via TINYCLAW_INSTALL_DIR, asserts --help and agent list output, then exercises a real fake-provider message round-trip through the queue processor.
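For the EADDRINUSE retry in tests/helpers/processor.ts, a minimal sketch of the idea is below. The name getFreePort matches the helper exported in this PR, but both function bodies are assumptions, and startWithPortRetry is a hypothetical name; `start` stands in for spawning dist/queue-processor.js on the given port and rejecting with an error whose code is EADDRINUSE when the bind fails.

```typescript
import * as net from "node:net";

// Ask the OS for an unused ephemeral port by listening on port 0, then release it.
function getFreePort(): Promise<number> {
  return new Promise((resolve, reject) => {
    const server = net.createServer();
    server.on("error", reject);
    server.listen(0, "127.0.0.1", () => {
      const address = server.address();
      if (address === null || typeof address === "string") {
        server.close();
        reject(new Error("could not determine the allocated port"));
        return;
      }
      const { port } = address;
      server.close(() => resolve(port));
    });
  });
}

// The port is free when getFreePort() returns but can be claimed before the
// processor binds it (a TOCTOU race), so retry with a fresh port on EADDRINUSE.
async function startWithPortRetry<T>(
  start: (port: number) => Promise<T>,
  maxAttempts = 5,
): Promise<T> {
  let lastError: unknown = new Error("startWithPortRetry: no attempts made");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const port = await getFreePort();
    try {
      return await start(port);
    } catch (err) {
      const code = typeof err === "object" && err !== null ? (err as { code?: unknown }).code : undefined;
      if (code !== "EADDRINUSE") throw err; // only port collisions are retried
      lastError = err;
    }
  }
  throw lastError;
}
```

Keeping the retry predicate narrow means a processor that fails for any other reason fails the test immediately instead of being restarted five times.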

Sequence Diagram

sequenceDiagram
    participant Dev as Developer
    participant CI as ci.yml
    participant Rel as release.yml (build-and-release)
    participant VS as release.yml (verify-smoke)
    participant PR as release.yml (publish-release)
    participant GH as GitHub Releases

    Dev->>CI: push / pull_request
    CI->>CI: integration tests (ubuntu, Node 20)
    CI->>CI: smoke tests (ubuntu + macos, Node 20)

    Dev->>Rel: push vX.Y.Z tag
    Rel->>Rel: verify tag == package.json version
    Rel->>Rel: npm ci + build + prune dev deps
    Rel->>Rel: bundle tarball (src, dist, tests, node_modules…)
    Rel->>VS: upload tinyclaw-bundle artifact

    VS->>VS: download + extract bundle
    VS->>VS: npm ci --include=dev (ubuntu + macos)
    VS->>VS: npm run test:smoke

    VS-->>PR: smoke pass (tag builds only)
    PR->>PR: checkout + download bundle
    PR->>PR: extract release notes from annotated tag
    PR->>GH: create GitHub Release + attach tarball

Last reviewed commit: a4c19e6


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8cc482a4aa

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@michaelerobertsjr
Contributor Author

@greptileai

Mike Roberts and others added 4 commits March 8, 2026 10:57
- add GitHub CI workflow and release smoke-test gate on Node 20
- add tsx-based integration/smoke test suites with shared test helpers
- add internal fake provider and queue retry/dead event emission for deterministic testing
- allow install dir override via `TINYCLAW_INSTALL_DIR`
- Restrict fake provider to test mode unless explicitly allowed via env var
- Emit chain step completion events on invoke errors and finish team conversations when a branch dead-letters
- Add integration coverage for dead team branches, align retry assertion to shared MAX_RETRIES, and clarify install.sh PATH note for custom install dir
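A guard with the behavior described in this commit could look like the following sketch. NODE_ENV=test and TINYCLAW_ALLOW_FAKE_PROVIDER come from the PR; treating "1" and "true" as the values that enable the override, and the function name assertFakeProviderAllowed, are assumptions.

```typescript
type Env = Record<string, string | undefined>;

// Refuse to run the fake provider outside tests unless explicitly allowed,
// so a misconfigured production install fails loudly instead of faking replies.
function assertFakeProviderAllowed(env: Env = process.env): void {
  const isTest = env.NODE_ENV === "test";
  const flag = env.TINYCLAW_ALLOW_FAKE_PROVIDER;
  const explicitlyAllowed = flag === "1" || flag === "true"; // assumed enabling values
  if (!isTest && !explicitlyAllowed) {
    throw new Error("fake provider is disabled: set NODE_ENV=test or TINYCLAW_ALLOW_FAKE_PROVIDER=1");
  }
}
```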
Adds an explicit timeout-minutes at the job level (e.g., 15–20 minutes)

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
- export `getFreePort` from fixture helper for reuse
- retry queue processor startup up to 5 times when `EADDRINUSE` occurs, reallocating API port
- treat `null` like `undefined` in `waitFor` so polling only resolves on real values
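The waitFor change can be illustrated with a small polling helper. The name matches the test helper mentioned in the commit, but the signature, the defaults (5 s timeout, 50 ms interval), and the implementation are assumptions. The key point is that both null and undefined mean "not yet", while falsy but real values such as 0 or an empty string still resolve.

```typescript
// Hypothetical polling helper: call `probe` until it yields a real value.
// null is treated like undefined, so a lookup that returns null
// ("row not written yet") keeps polling instead of resolving early.
async function waitFor<T>(
  probe: () => T | null | undefined | Promise<T | null | undefined>,
  { timeoutMs = 5000, intervalMs = 50 }: { timeoutMs?: number; intervalMs?: number } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await probe();
    if (value !== null && value !== undefined) return value;
    if (Date.now() >= deadline) {
      throw new Error(`waitFor timed out after ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In a test this would typically wrap a poll of the API or a direct SQLite lookup for a message's response.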
michaelerobertsjr force-pushed the feature/ci-integration-smoke-tests branch from 5d5e522 to 33ce4e5 on March 8, 2026 17:58
Mike Roberts added 2 commits March 8, 2026 11:03
- Remove duplicate `integration` job stub in CI workflow
- Scope release workflow permissions per job (`read` for smoke, `write` for release)
- Fail install if wrapper symlink is missing/non-executable and report PATH precedence issues
- Add dedicated integration job in `ci.yml` and run smoke matrix separately
- Rework `release.yml` to build once, verify smoke from bundle artifacts, then publish on tags
- Preserve resolved routed agent ID in dead-branch events and pending-branch cleanup
- Surface SSE stream read errors in tests and only suppress expected `AbortError` on close
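The SSE error handling in the last bullet can be sketched with Node 20's built-in fetch and AbortController. The parser below handles only the event: and data: fields and skips comment lines; the names parseSseBuffer, readSse, and SseEvent are hypothetical, not the helpers in tests/helpers.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Split buffered SSE text into complete (blank-line separated) events and
// return any trailing partial block so the caller keeps buffering it.
function parseSseBuffer(buffer: string): { events: SseEvent[]; rest: string } {
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = blocks.pop() ?? "";
  const events: SseEvent[] = [];
  for (const block of blocks) {
    let event = "message"; // SSE default event type
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith(":")) continue; // comment / keep-alive
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1);
      if (field === "event") event = value;
      else if (field === "data") dataLines.push(value);
    }
    if (dataLines.length > 0) events.push({ event, data: dataLines.join("\n") });
  }
  return { events, rest };
}

// Read events until the stream ends or `signal` aborts. Only the AbortError
// caused by our own close is swallowed; any other error is re-thrown so the
// test fails instead of silently missing events.
async function readSse(url: string, onEvent: (e: SseEvent) => void, signal: AbortSignal): Promise<void> {
  try {
    const res = await fetch(url, { headers: { Accept: "text/event-stream" }, signal });
    if (!res.ok || !res.body) throw new Error(`SSE request failed: HTTP ${res.status}`);
    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let buffer = "";
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });
      const parsed = parseSseBuffer(buffer);
      buffer = parsed.rest;
      for (const e of parsed.events) onEvent(e);
    }
  } catch (err) {
    const name = typeof err === "object" && err !== null ? (err as { name?: unknown }).name : undefined;
    if (signal.aborted && name === "AbortError") return; // expected on close
    throw err;
  }
}
```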
@michaelerobertsjr
Contributor Author

@greptileai can you take another look?

cp -r lib/ "$BUNDLE_DIR/"
cp -r docs/ "$BUNDLE_DIR/"
cp -r .agents/ "$BUNDLE_DIR/"
cp -r tests/ "$BUNDLE_DIR/"

Test files included in the production release tarball

tests/ is being copied into the bundle, so every end-user who downloads the official GitHub release tarball also receives the full test harness (integration helpers, fake providers, SSE fixtures, etc.).

More critically, by the time this bundle step runs, npm prune --omit=dev has already stripped tsx from node_modules/. Any user who extracts the release tarball and tries to run npm run test:smoke or npm run test:integration directly will get an error because tsx is absent from the production node_modules.

The cleanest fix is to keep the release tarball test-free and instead upload a dedicated test-only artifact for the verify-smoke job:

# In build-and-release — create a separate test artifact, don't include tests/ in the main bundle
- name: Upload test sources artifact
  uses: actions/upload-artifact@v4
  with:
    name: tinyclaw-tests
    path: |
      tests/
      package.json
      package-lock.json
      tsconfig.json

# Remove cp -r tests/ from the bundle tarball step

Then in verify-smoke, download both tinyclaw-bundle and tinyclaw-tests, overlay the tests directory on the extracted bundle, run npm ci --include=dev, and execute the smoke tests. This keeps the release artifact clean while the CI pipeline still has everything it needs.

Comment on lines +128 to +162
verify-smoke:
  needs: build-and-release
  permissions:
    contents: read
  runs-on: ${{ matrix.os }}
  strategy:
    matrix:
      os: [ubuntu-latest, macos-latest]

  steps:
    - name: Download bundle artifact
      uses: actions/download-artifact@v4
      with:
        name: tinyclaw-bundle

    - name: Extract bundle
      shell: bash
      run: |
        EXTRACT_DIR="$(mktemp -d)"
        tar -xzf tinyclaw-bundle.tar.gz -C "$EXTRACT_DIR"
        echo "BUNDLE_DIR=$EXTRACT_DIR/tinyclaw" >> "$GITHUB_ENV"

    - name: Setup Node.js
      uses: actions/setup-node@v4
      with:
        node-version: '20'

    - name: Install dependencies
      working-directory: ${{ env.BUNDLE_DIR }}
      run: |
        PUPPETEER_SKIP_DOWNLOAD=true npm ci --include=dev

    - name: Run smoke tests
      working-directory: ${{ env.BUNDLE_DIR }}
      run: npm run test:smoke

verify-smoke job missing timeout-minutes

The verify-smoke job spawns real queue processors, runs HTTP polling loops, and waits on SSE streams — the same class of operations that prompted the timeout-minutes: 15 that was added to the integration job in ci.yml. Without an explicit cap, a hung smoke test on Ubuntu or macOS will hold the runner for the full GitHub Actions default of 6 hours before failing, blocking any tag release.

Suggested change
verify-smoke:
  needs: build-and-release
  timeout-minutes: 20
  permissions:
    contents: read
  runs-on: ${{ matrix.os }}
  strategy:
    matrix:
      os: [ubuntu-latest, macos-latest]

Comment on lines +150 to +153
- name: Setup Node.js
  uses: actions/setup-node@v4
  with:
    node-version: '20'

Missing cache: 'npm' in verify-smoke Node.js setup

Every other job in both workflows (build-and-release, integration, smoke) passes cache: 'npm' to actions/setup-node. Without it, verify-smoke downloads and installs the full npm dependency tree from the network on every run, adding unnecessary latency to every release verification.

Suggested change
- name: Setup Node.js
  uses: actions/setup-node@v4
  with:
    node-version: '20'
    cache: 'npm'
