CI: add smoke suite and gate unit/integration workflows #5

Merged
shaypal5 merged 8 commits into main from codex/ci-workflows-smoke-lint-tests
Mar 4, 2026

@shaypal5 shaypal5 commented Mar 3, 2026

Summary

  • add a dedicated tests/smoke suite with fast checks for CLI/config/seen-store
  • update smoke-tests workflow to run the smoke suite directly
  • enforce smoke as a strict prerequisite for unit-tests and integration-tests
  • keep pre-commit, lint, coverage, unit, and integration as separate workflows

Validation

  • ran locally: PYTHONPATH=src pytest -q tests/smoke (3 passed)

Notes

  • unit and integration workflows now trigger only from successful smoke-tests runs on the same head SHA/branch
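
The gating rule in the note above can be sketched as a small predicate over a `workflow_run` event payload. This is only an illustration, not the workflow YAML from this PR: the function name and the `SMOKE_WORKFLOW_NAME` constant are assumptions, and the payload is assumed to carry the standard `name`, `conclusion`, `head_sha`, and `head_branch` fields.

```python
from typing import Any, Mapping

# Assumed workflow name, taken from the PR summary; adjust to the real `name:` value.
SMOKE_WORKFLOW_NAME = "smoke-tests"


def should_run_after_smoke(
    event: Mapping[str, Any], expected_sha: str, expected_branch: str
) -> bool:
    """Return True only for a successful smoke run on the expected commit and branch."""
    run = event.get("workflow_run") or {}
    return (
        run.get("name") == SMOKE_WORKFLOW_NAME
        and run.get("conclusion") == "success"
        and run.get("head_sha") == expected_sha
        and run.get("head_branch") == expected_branch
    )
```

A failed or cancelled smoke run, or a run for a different SHA or branch, makes the predicate return False, so the heavier suites would be skipped.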

Copilot AI left a comment

Pull request overview

Adds a lightweight smoke test suite and restructures CI so smoke tests run first and (attempt to) gate heavier test workflows behind a successful smoke run.

Changes:

  • Introduces tests/smoke with fast checks covering CLI load/version, config defaults, and seen-store persistence.
  • Adds/updates GitHub Actions workflows for smoke, unit, integration, lint, pre-commit, and coverage.
  • Updates dev dependencies to include pytest-cov and pre-commit.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/smoke/test_smoke_suite.py | New fast smoke tests for CLI/config/SeenStore. |
| pyproject.toml | Adds dev tools needed by CI (pytest-cov, pre-commit). |
| .pre-commit-config.yaml | Defines pre-commit hooks (basic hygiene + ruff/format). |
| .pre-commit-ci.yaml | Configures pre-commit.ci service behavior (autoupdate schedule, no autofix PRs). |
| .github/workflows/smoke-tests.yml | Runs the smoke suite on PRs and main pushes. |
| .github/workflows/unit-tests.yml | Runs unit tests on workflow_run after smoke completion. |
| .github/workflows/integration-tests.yml | Runs integration tests on workflow_run after smoke completion. |
| .github/workflows/lint.yml | Adds ruff format/lint + mypy workflow. |
| .github/workflows/pre-commit-ci.yml | Runs pre-commit hooks in GitHub Actions. |
| .github/workflows/codecoverage.yml | Runs unit+integration with coverage and uploads XML artifact. |

Comment thread .github/workflows/unit-tests.yml Outdated
Comment thread .github/workflows/integration-tests.yml Outdated
Comment thread .github/workflows/unit-tests.yml Outdated
Comment thread .github/workflows/integration-tests.yml Outdated
@shaypal5 shaypal5 merged commit aaf08b2 into main Mar 4, 2026
6 checks passed
@shaypal5 shaypal5 deleted the codex/ci-workflows-smoke-lint-tests branch March 4, 2026 12:28
shaypal5 added a commit that referenced this pull request May 21, 2026
Fixes all 16 issues raised in the post-merge review:

Critical:
- [#1] Orchestrator now checks config.enabled / config.mode at the top of
  evaluate_thin/evaluate_thick: mode=OFF or enabled=False returns a
  _noop_pass immediately without running any stage or writing telemetry;
  SHADOW mode downgrades drop→pass in _conclude while preserving
  stopped_at_stage for recall analysis; ENFORCE respects drops.
- [#2] Stage objects are only instantiated when config.stages.X.enabled is
  True; disabled stages are stored as None, preventing model-load cost for
  stages like C (embedding) and D (SLM) that aren't in use.
- [#3] Added @runtime_checkable StageEvaluator Protocol in models.py with
  uniform evaluate(candidate, pass_kind, body=None) signature; all four
  stage stubs (A–D) updated to that signature so the orchestrator calls
  them uniformly.
- [#4] Removed duplicate ThinOrThick alias from cascade.py; PassKind from
  models.py is the single source of truth.
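
A minimal sketch of the mode handling described in [#1], under stated assumptions: `Mode`, `Decision`, and `evaluate` are hypothetical stand-ins for the real `PrefilterMode`, `PrefilterDecision`, and orchestrator methods, the stage run is reduced to a callable, and the no-op result is assumed to carry no `stopped_at_stage`.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Callable, Optional


class Mode(str, Enum):  # hypothetical stand-in for PrefilterMode
    OFF = "off"
    SHADOW = "shadow"
    ENFORCE = "enforce"


@dataclass(frozen=True)
class Decision:  # hypothetical, reduced stand-in for PrefilterDecision
    verdict: str                     # "pass" or "drop"
    stopped_at_stage: Optional[str]  # stage that stopped the cascade, if any


def evaluate(mode: Mode, enabled: bool, run_stages: Callable[[], Decision]) -> Decision:
    # OFF or disabled: return a no-op pass without running any stage.
    if not enabled or mode is Mode.OFF:
        return Decision(verdict="pass", stopped_at_stage=None)
    raw = run_stages()
    # SHADOW: downgrade drop to pass, but keep stopped_at_stage for recall analysis.
    if mode is Mode.SHADOW and raw.verdict == "drop":
        return replace(raw, verdict="pass")
    # ENFORCE: respect the raw verdict.
    return raw
```

Passing the stage run as a callable makes the early return easy to verify in a test: in OFF mode the callable is never invoked.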

Major:
- [#5] StoppedAt = StageName | Literal["passed_all"] — no longer a
  copy-paste of the four stage letters.
- [#6] PrefilterDecision.decided_at changed from str to datetime; telemetry
  writer converts to .isoformat() at the serialisation boundary; _path_for
  uses .strftime() directly on the datetime.
- [#7] StageScore.__post_init__ validates p_negative and threshold are both
  in [0.0, 1.0], raising ValueError for out-of-range values.
- [#8] Stage A re-run in evaluate_thick documented with an explicit Note in
  the docstring; thin-result passthrough deferred to a later PR.
- [#9] Test fixtures now use typed aliases (StageName, Verdict, StoppedAt,
  PassKind) — all type: ignore[arg-type] comments removed from helpers.
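
The bounds check from [#7] could look roughly like the following. Only `p_negative`, `threshold`, and the `ValueError` come from the commit message; the `stage` field and the error wording are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageScore:
    stage: str         # assumed field, not confirmed by the commit message
    p_negative: float
    threshold: float

    def __post_init__(self) -> None:
        # Both values are probabilities, so anything outside [0.0, 1.0] is rejected.
        # NaN also fails the chained comparison and therefore raises.
        for name in ("p_negative", "threshold"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0.0, 1.0], got {value!r}")
```

Validating in `__post_init__` works with `frozen=True` because the check only reads the fields and never assigns to them.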

Minor:
- [#10] flush() removed from PrefilterDecisionWriter.
- [#11] _path_for no longer has a try/except — datetime param makes it
  unnecessary.
- [#12] "short" removed from _hash_config docstring.
- [#13] test_frozen uses pytest.raises(FrozenInstanceError) instead of
  try/except/else antipattern.
- [#14] PrefilterStatePaths converted from pydantic BaseModel to
  @dataclasses.dataclass(frozen=True) — consistent with StageScore /
  PrefilterDecision.
- [#15] __init__.py now exports CandidateView, StageEvaluator,
  PrefilterStatePaths, resolve_prefilter_state_paths, PassKind, Verdict.
- [#16] cli.py summary command no longer hardcodes agents/news/local.yaml;
  prints an actionable error and exits 1 when --config is not supplied.

Tests: 54 → 78 (+24), all passing.
New coverage: StageScore bounds validation (5 tests), StageEvaluator
protocol conformance for all four stages (5 tests), type-alias smoke checks
(4 tests), OFF-mode no-telemetry (2 tests), disabled-flag suppression (1),
shadow/enforce telemetry (2), shadow downgrade with monkeypatched stage (1),
enforce drop with monkeypatched stage (1), disabled-stages-not-instantiated
(1), decided_at-is-datetime (2).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
shaypal5 added a commit that referenced this pull request May 21, 2026
* feat(prefilter): LPF-PR-01 — prefilter package foundation, models, config, telemetry, no-op cascade

Introduces the `denbust.prefilter` package (10 modules, 54 unit tests, 0 ruff/mypy errors):

- `models.py`: `CandidateView` runtime-checkable Protocol, `StageScore` and
  `PrefilterDecision` frozen dataclasses with Literal-typed `StageName`, `PassKind`,
  `Verdict`, and `StoppedAt` fields.
- `config.py`: `PrefilterMode(StrEnum)` (off/shadow/enforce), per-stage configs
  (`StageAConfig`–`StageDConfig`), `PrefilterStagesConfig`, `PrefilterRefreshConfig`,
  and `PrefilterConfig` root with `~`-expansion model_validator.
- `state_paths.py`: `PrefilterStatePaths` pydantic model + `resolve_prefilter_state_paths()`
  anchoring artefacts under `<state_root>/<dataset>/<job>/prefilter/`.
- `telemetry.py`: `PrefilterDecisionWriter` appending decisions to date-sharded
  `<decisions_dir>/YYYY-MM-DD.jsonl` files.
- `cascade.py`: `CascadeOrchestrator` with `evaluate_thin()` / `evaluate_thick()` — always
  returns `verdict="pass"` stub; records every decision via the writer.
- `stage_a.py`–`stage_d.py`: stub `evaluate()` methods returning `None` so the cascade
  always passes through; full implementations land in LPF-PR-03 through LPF-PR-07.
- `cli.py`: `denbust prefilter summary` Typer command stub.
- `__init__.py`: re-exports `CascadeOrchestrator`, `PrefilterConfig`, `PrefilterMode`,
  `PrefilterDecision`, `StageScore`.
- `src/denbust/config.py`: adds `prefilter: PrefilterConfig` field to the root `Config`.
- `src/denbust/cli.py`: registers `prefilter_app` under `denbust prefilter`.
- `README.md`: retitles the cascade section to reflect active implementation.
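
As an illustration of the date-sharded telemetry layout described for `telemetry.py`, here is a minimal sketch. The function names and the record shape are hypothetical, and `decided_at` is assumed to be a `datetime` that is converted to ISO format only when the line is serialised.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Any, Mapping


def path_for(decisions_dir: Path, decided_at: datetime) -> Path:
    # One shard per calendar day: <decisions_dir>/YYYY-MM-DD.jsonl
    return decisions_dir / f"{decided_at.strftime('%Y-%m-%d')}.jsonl"


def append_decision(
    decisions_dir: Path, record: Mapping[str, Any], decided_at: datetime
) -> Path:
    """Append one decision as a JSON line to the shard for its date."""
    decisions_dir.mkdir(parents=True, exist_ok=True)
    path = path_for(decisions_dir, decided_at)
    payload = {**record, "decided_at": decided_at.isoformat()}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(payload, ensure_ascii=False) + "\n")
    return path
```

Opening the file in append mode for each record keeps the writer stateless, at the cost of one open/close per decision.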

Cascade ships disabled (`mode: off`); no pipeline insertion in this PR.
54 unit tests covering protocol conformance, config validation, YAML round-trips,
state-path resolution, JSONL telemetry, and cascade no-op behaviour.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(plan): mark LPF-PR-01 done, update last-merged-PR reference (#158)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(prefilter): address code-review issues from PR #158 self-review

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
shaypal5 added a commit that referenced this pull request Jun 11, 2026
…203)

PR 1 of 3 for search prioritization/batching. Adds the accounting + safety
layer that prevents the silent Exa/Brave 402 that exhausted credits mid-run.

- search_budget.py: SearchBudgetLedger (append-only JSONL under discovery
  state), per-engine/per-month spend rollup, affordable_query_count guard.
  Pricing: Brave $0.005/q, Exa $0.007/q, Google CSE $0.005/q.
- Each engine discovery fn (brave/exa/google_cse) now:
  * runs the budget guard on the planned queries — when a monthly cap is set
    and partly spent, truncates to the affordable count (highest-priority kinds
    first via apply_query_budget) instead of overspending;
  * records only the LIVE (non-cached) requests it issued to the ledger, so the
    per-query checkpoint cache (free re-runs) is not billed.
- Engine routing: Brave is cheaper, so the same dollar cap pushes more queries
  through Brave; per-engine caps express the preference.
- Config: discovery.engines.<engine>.monthly_budget_usd (default None = no cap,
  but spend is still recorded).
- CLI: `denbust search-budget` shows month-to-date queries + $ vs cap per engine.
- apply_query_budget made public (reused by the guard).
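
The budget guard described above can be sketched as follows. The per-query prices are the ones quoted in the commit message; the function signature is an assumption rather than the actual `affordable_query_count` API, and `Decimal` is used here to avoid floating-point rounding in the division.

```python
from decimal import Decimal
from typing import Optional

# Prices per query in USD, as quoted in the commit message.
PRICE_PER_QUERY_USD = {
    "brave": Decimal("0.005"),
    "exa": Decimal("0.007"),
    "google_cse": Decimal("0.005"),
}


def affordable_query_count(
    engine: str,
    planned: int,
    spent_usd: Decimal,
    monthly_budget_usd: Optional[Decimal],
) -> int:
    """Return how many of the planned queries fit in the remaining monthly budget."""
    if monthly_budget_usd is None:
        return planned  # no cap configured: nothing is truncated
    remaining = monthly_budget_usd - spent_usd
    if remaining <= 0:
        return 0
    affordable = int(remaining // PRICE_PER_QUERY_USD[engine])
    return min(planned, affordable)
```

With $0.25 left in a month, this allows 50 Brave queries but only 35 Exa queries, which is the routing effect the commit message describes for a shared dollar cap.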

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
shaypal5 added a commit that referenced this pull request Jun 11, 2026
) (#204)

PR 2 of 3. When a budget cap truncates a discovery run, the kept queries are now
ordered least-recently-run first (never-run, then oldest mtime) within each
priority tier, so successive capped runs refresh DIFFERENT slices of the query
pool instead of re-issuing the same head every run — maximising fresh coverage
per dollar.

- engine_checkpoint.query_last_run_at(cache_dir, engine, query): the per-query
  checkpoint file mtime doubles as the last-live-run timestamp.
- queries.select_run_queries(queries, max_queries, last_run_at=...): priority
  cap with optional recency tiebreak. apply_query_budget is now a thin
  priority-only wrapper (last_run_at=None).
- build_discovery_queries gains a last_run_at callback (threaded only by the
  live engine runs; backfill/tests pass None → unchanged behaviour).
- _guard_search_budget and each engine fn pass a recency callback bound to the
  engine's checkpoint dir, so both the config #2 cap and the #5 $-budget guard
  rotate.
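
A rough sketch of the priority cap with a recency tiebreak described above. The `Query` type, its numeric `priority` field (lower means more important), and the callback taking the query text are assumptions; only the `select_run_queries(queries, max_queries, last_run_at=...)` shape comes from the commit message.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Query:  # hypothetical query record
    text: str
    priority: int  # assumption: lower number = higher priority


def select_run_queries(
    queries: Sequence[Query],
    max_queries: int,
    last_run_at: Optional[Callable[[str], Optional[float]]] = None,
) -> List[Query]:
    """Keep at most max_queries, by priority, then least-recently-run first."""

    def sort_key(q: Query) -> Tuple[int, int, float]:
        ts = last_run_at(q.text) if last_run_at is not None else None
        # Never-run queries (ts is None) sort before any query with a timestamp;
        # among run queries, the oldest timestamp comes first.
        return (q.priority, 0, 0.0) if ts is None else (q.priority, 1, ts)

    # sorted() is stable, so ties keep their original order.
    return sorted(queries, key=sort_key)[: max(0, max_queries)]
```

With `last_run_at=None` every query gets the same tiebreak key, so the function reduces to a plain priority cap, which matches the thin-wrapper behaviour described for `apply_query_budget`.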

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>