CI: add smoke suite and gate unit/integration workflows by shaypal5 · Pull Request #5 · DataHackIL/tfht_enforce_idx

shaypal5 · 2026-03-03T14:29:59Z

Summary

- add a dedicated `tests/smoke` suite with fast checks for CLI/config/seen-store
- update the smoke-tests workflow to run the smoke suite directly
- enforce smoke as a strict prerequisite for unit-tests and integration-tests
- keep pre-commit, lint, coverage, unit, and integration as separate workflows

Validation

- ran locally: `PYTHONPATH=src pytest -q tests/smoke` (3 passed)

Notes

- unit and integration workflows now trigger only from successful smoke-tests runs on the same head SHA/branch
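The gating rule in the note above can be sketched in Python for illustration. In the repository this logic would live in the workflow YAML; the function name `gated_checkout_sha` and the workflow name `"smoke-tests"` are assumptions, while `conclusion`, `head_sha`, and `name` are fields of the GitHub `workflow_run` event payload.

```python
from typing import Optional


def gated_checkout_sha(event: dict, gate_workflow: str = "smoke-tests") -> Optional[str]:
    """Return the commit SHA the gated job should test, or None to skip.

    `event` mimics a GitHub `workflow_run` webhook payload. The helper itself
    is hypothetical and only illustrates the gating decision.
    """
    run = event.get("workflow_run", {})
    # Only react to the workflow that acts as the gate.
    if run.get("name") != gate_workflow:
        return None
    # A failed, cancelled, or skipped smoke run keeps the gate closed.
    if run.get("conclusion") != "success":
        return None
    # Test exactly the commit the smoke run validated.
    return run.get("head_sha")


ok_event = {
    "workflow_run": {
        "name": "smoke-tests",
        "conclusion": "success",
        "head_sha": "df9e081",
        "head_branch": "codex/ci-workflows-smoke-lint-tests",
    }
}
failed_event = {"workflow_run": {**ok_event["workflow_run"], "conclusion": "failure"}}
```

Returning the SHA rather than a boolean keeps the "same head SHA" requirement explicit: the gated job checks out whatever the smoke run actually tested.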

Copilot

Pull request overview

Adds a lightweight smoke test suite and restructures CI so smoke tests run first and (attempt to) gate heavier test workflows behind a successful smoke run.

Changes:

- Introduces `tests/smoke` with fast checks covering CLI load/version, config defaults, and seen-store persistence.
- Adds/updates GitHub Actions workflows for smoke, unit, integration, lint, pre-commit, and coverage.
- Updates dev dependencies to include pytest-cov and pre-commit.
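As a rough illustration of what a seen-store persistence smoke check could look like, here is a minimal sketch. The `SeenStore` class below is a hypothetical stand-in defined only for this example (the project's real class and its API are not shown on this page), and `tmp_path` is pytest's built-in temporary-directory fixture.

```python
import json
from pathlib import Path


class SeenStore:
    """Hypothetical stand-in: remembers URLs in a JSON file on disk."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Load previously saved URLs if the file already exists.
        self._seen = set(json.loads(path.read_text())) if path.exists() else set()

    def mark_seen(self, url: str) -> None:
        self._seen.add(url)

    def is_seen(self, url: str) -> bool:
        return url in self._seen

    def save(self) -> None:
        self.path.write_text(json.dumps(sorted(self._seen)))


def test_seen_store_persists(tmp_path: Path) -> None:
    """Smoke check: a saved URL is still known after reloading from disk."""
    path = tmp_path / "seen.json"
    store = SeenStore(path)
    store.mark_seen("https://example.com/a")
    store.save()

    reloaded = SeenStore(path)
    assert reloaded.is_seen("https://example.com/a")
    assert not reloaded.is_seen("https://example.com/b")
```

A check like this touches only the filesystem and finishes in milliseconds, which is what lets the smoke suite act as a cheap gate in front of the heavier suites.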

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.


| File | Description |
| --- | --- |
| `tests/smoke/test_smoke_suite.py` | New fast smoke tests for CLI/config/SeenStore. |
| `pyproject.toml` | Adds dev tools needed by CI (pytest-cov, pre-commit). |
| `.pre-commit-config.yaml` | Defines pre-commit hooks (basic hygiene + ruff/format). |
| `.pre-commit-ci.yaml` | Configures pre-commit.ci service behavior (autoupdate schedule, no autofix PRs). |
| `.github/workflows/smoke-tests.yml` | Runs the smoke suite on PRs and main pushes. |
| `.github/workflows/unit-tests.yml` | Runs unit tests on `workflow_run` after smoke completion. |
| `.github/workflows/integration-tests.yml` | Runs integration tests on `workflow_run` after smoke completion. |
| `.github/workflows/lint.yml` | Adds ruff format/lint + mypy workflow. |
| `.github/workflows/pre-commit-ci.yml` | Runs pre-commit hooks in GitHub Actions. |
| `.github/workflows/codecoverage.yml` | Runs unit+integration with coverage and uploads XML artifact. |



* feat(prefilter): LPF-PR-01: prefilter package foundation, models, config, telemetry, no-op cascade

Introduces the `denbust.prefilter` package (10 modules, 54 unit tests, 0 ruff/mypy errors):

- `models.py`: `CandidateView` runtime-checkable Protocol, `StageScore` and `PrefilterDecision` frozen dataclasses with Literal-typed `StageName`, `PassKind`, `Verdict`, and `StoppedAt` fields.
- `config.py`: `PrefilterMode(StrEnum)` (off/shadow/enforce), per-stage configs (`StageAConfig`–`StageDConfig`), `PrefilterStagesConfig`, `PrefilterRefreshConfig`, and `PrefilterConfig` root with `~`-expansion model_validator.
- `state_paths.py`: `PrefilterStatePaths` pydantic model + `resolve_prefilter_state_paths()` anchoring artefacts under `<state_root>/<dataset>/<job>/prefilter/`.
- `telemetry.py`: `PrefilterDecisionWriter` appending decisions to date-sharded `<decisions_dir>/YYYY-MM-DD.jsonl` files.
- `cascade.py`: `CascadeOrchestrator` with `evaluate_thin()` / `evaluate_thick()`; always returns a `verdict="pass"` stub and records every decision via the writer.
- `stage_a.py`–`stage_d.py`: stub `evaluate()` methods returning `None` so the cascade always passes through; full implementations land in LPF-PR-03 through LPF-PR-07.
- `cli.py`: `denbust prefilter summary` Typer command stub.
- `__init__.py`: re-exports `CascadeOrchestrator`, `PrefilterConfig`, `PrefilterMode`, `PrefilterDecision`, `StageScore`.
- `src/denbust/config.py`: adds `prefilter: PrefilterConfig` field to the root `Config`.
- `src/denbust/cli.py`: registers `prefilter_app` under `denbust prefilter`.
- `README.md`: retitles the cascade section to reflect active implementation.

Cascade ships disabled (`mode: off`); no pipeline insertion in this PR. 54 unit tests covering protocol conformance, config validation, YAML round-trips, state-path resolution, JSONL telemetry, and cascade no-op behaviour.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(plan): mark LPF-PR-01 done, update last-merged-PR reference (#158)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(prefilter): address code-review issues from PR #158 self-review

Fixes all 16 issues raised in the post-merge review:

Critical:
- [#1] Orchestrator now checks config.enabled / config.mode at the top of evaluate_thin/evaluate_thick: mode=OFF or enabled=False returns a _noop_pass immediately without running any stage or writing telemetry; SHADOW mode downgrades drop→pass in _conclude while preserving stopped_at_stage for recall analysis; ENFORCE respects drops.
- [#2] Stage objects are only instantiated when config.stages.X.enabled is True; disabled stages are stored as None, preventing model-load cost for stages like C (embedding) and D (SLM) that aren't in use.
- [#3] Added @runtime_checkable StageEvaluator Protocol in models.py with uniform evaluate(candidate, pass_kind, body=None) signature; all four stage stubs (A–D) updated to that signature so the orchestrator calls them uniformly.
- [#4] Removed duplicate ThinOrThick alias from cascade.py; PassKind from models.py is the single source of truth.

Major:
- [#5] StoppedAt = StageName | Literal["passed_all"], no longer a copy-paste of the four stage letters.
- [#6] PrefilterDecision.decided_at changed from str to datetime; telemetry writer converts to .isoformat() at the serialisation boundary; _path_for uses .strftime() directly on the datetime.
- [#7] StageScore.__post_init__ validates p_negative and threshold are both in [0.0, 1.0], raising ValueError for out-of-range values.
- [#8] Stage A re-run in evaluate_thick documented with an explicit Note in the docstring; thin-result passthrough deferred to a later PR.
- [#9] Test fixtures now use typed aliases (StageName, Verdict, StoppedAt, PassKind); all type: ignore[arg-type] comments removed from helpers.

Minor:
- [#10] flush() removed from PrefilterDecisionWriter.
- [#11] _path_for no longer has a try/except; the datetime param makes it unnecessary.
- [#12] "short" removed from _hash_config docstring.
- [#13] test_frozen uses pytest.raises(FrozenInstanceError) instead of try/except/else antipattern.
- [#14] PrefilterStatePaths converted from pydantic BaseModel to @dataclasses.dataclass(frozen=True), consistent with StageScore / PrefilterDecision.
- [#15] __init__.py now exports CandidateView, StageEvaluator, PrefilterStatePaths, resolve_prefilter_state_paths, PassKind, Verdict.
- [#16] cli.py summary command no longer hardcodes agents/news/local.yaml; prints an actionable error and exits 1 when --config is not supplied.

Tests: 54 → 78 (+24), all passing. New coverage: StageScore bounds validation (5 tests), StageEvaluator protocol conformance for all four stages (5 tests), type-alias smoke checks (4 tests), OFF-mode no-telemetry (2 tests), disabled-flag suppression (1), shadow/enforce telemetry (2), shadow downgrade with monkeypatched stage (1), enforce drop with monkeypatched stage (1), disabled-stages-not-instantiated (1), decided_at-is-datetime (2).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
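Two of the fixes above, the `StageScore` bounds check ([#7]) and the mode handling ([#1]), can be sketched as follows. This is a minimal illustration, not the package's code: the `stage` field, the `conclude` helper, and the use of `str, Enum` instead of `StrEnum` (to run on older Python versions) are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class PrefilterMode(str, Enum):
    """Stand-in for the commit's PrefilterMode(StrEnum)."""

    OFF = "off"
    SHADOW = "shadow"
    ENFORCE = "enforce"


@dataclass(frozen=True)
class StageScore:
    stage: str  # assumed field; the commit only names the two below
    p_negative: float
    threshold: float

    def __post_init__(self) -> None:
        # Reject probabilities and thresholds outside [0.0, 1.0] (NaN included).
        for name in ("p_negative", "threshold"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0.0, 1.0], got {value!r}")


def conclude(mode: PrefilterMode, raw_verdict: str) -> str:
    """Hypothetical helper mapping a stage verdict to the final verdict."""
    if mode is PrefilterMode.OFF:
        return "pass"  # cascade disabled: nothing is ever dropped
    if mode is PrefilterMode.SHADOW and raw_verdict == "drop":
        return "pass"  # shadow mode observes drops but does not act on them
    return raw_verdict  # enforce mode respects the drop
```

In shadow mode the raw verdict would still be recorded separately (the commit keeps `stopped_at_stage` for recall analysis); the sketch only covers the downgrade itself.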

…203)

PR 1 of 3 for search prioritization/batching. Adds the accounting + safety layer that prevents the silent Exa/Brave 402 that exhausted credits mid-run.

- search_budget.py: SearchBudgetLedger (append-only JSONL under discovery state), per-engine/per-month spend rollup, affordable_query_count guard. Pricing: Brave $0.005/q, Exa $0.007/q, Google CSE $0.005/q.
- Each engine discovery fn (brave/exa/google_cse) now:
  * runs the budget guard on the planned queries; when a monthly cap is set and partly spent, truncates to the affordable count (highest-priority kinds first via apply_query_budget) instead of overspending;
  * records only the LIVE (non-cached) requests it issued to the ledger, so the per-query checkpoint cache (free re-runs) is not billed.
- Engine routing: Brave is cheaper, so the same dollar cap pushes more queries through Brave; per-engine caps express the preference.
- Config: discovery.engines.<engine>.monthly_budget_usd (default None = no cap, but spend is still recorded).
- CLI: `denbust search-budget` shows month-to-date queries + $ vs cap per engine.
- apply_query_budget made public (reused by the guard).

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
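The budget guard described above comes down to dividing the remaining monthly budget by the per-query price. A minimal sketch, assuming a signature for `affordable_query_count` that the commit message does not show; the prices are the ones quoted in the commit, and `Decimal` is used here to avoid floating-point rounding at the boundary.

```python
from decimal import Decimal
from typing import Optional

# Per-query prices quoted in the commit message.
PRICE_PER_QUERY_USD = {
    "brave": Decimal("0.005"),
    "exa": Decimal("0.007"),
    "google_cse": Decimal("0.005"),
}


def affordable_query_count(
    engine: str,
    planned: int,
    spent_usd: Decimal,
    monthly_budget_usd: Optional[Decimal],
) -> int:
    """How many of `planned` queries fit in what is left of the monthly cap.

    The parameter list is an assumption made for this sketch.
    """
    if monthly_budget_usd is None:
        return planned  # no cap configured: run everything
    remaining = monthly_budget_usd - spent_usd
    if remaining <= 0:
        return 0  # cap already exhausted
    affordable = int(remaining // PRICE_PER_QUERY_USD[engine])
    return min(planned, affordable)
```

With a $1.00 cap and $0.93 already spent, the remaining $0.07 covers 10 Exa queries or 14 Brave queries, which is the routing effect the commit mentions: the same dollar cap pushes more queries through the cheaper engine.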

) (#204)

PR 2 of 3. When a budget cap truncates a discovery run, the kept queries are now ordered least-recently-run first (never-run, then oldest mtime) within each priority tier, so successive capped runs refresh DIFFERENT slices of the query pool instead of re-issuing the same head every run, maximising fresh coverage per dollar.

- engine_checkpoint.query_last_run_at(cache_dir, engine, query): the per-query checkpoint file mtime doubles as the last-live-run timestamp.
- queries.select_run_queries(queries, max_queries, last_run_at=...): priority cap with optional recency tiebreak. apply_query_budget is now a thin priority-only wrapper (last_run_at=None).
- build_discovery_queries gains a last_run_at callback (threaded only by the live engine runs; backfill/tests pass None → unchanged behaviour).
- _guard_search_budget and each engine fn pass a recency callback bound to the engine's checkpoint dir, so both the config #2 cap and the #5 $-budget guard rotate.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
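The rotation described above can be sketched as a stable sort followed by a cut. This is an illustration under assumptions: queries are modelled as `(priority_rank, text)` tuples where a lower rank means higher priority, and the signature only loosely mirrors the `select_run_queries(queries, max_queries, last_run_at=...)` named in the commit.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Query = Tuple[int, str]  # (priority_rank, query_text); lower rank runs first
LastRunAt = Callable[[str], Optional[float]]  # None means "never run live"


def select_run_queries(
    queries: Sequence[Query],
    max_queries: Optional[int],
    last_run_at: Optional[LastRunAt] = None,
) -> List[Query]:
    """Keep at most `max_queries`, by priority, then least-recently-run first."""

    def sort_key(query: Query) -> Tuple[int, int, float]:
        rank, text = query
        if last_run_at is None:
            return (rank, 0, 0.0)  # priority-only: ties keep input order
        timestamp = last_run_at(text)
        if timestamp is None:
            return (rank, 0, 0.0)  # never-run queries lead their tier
        return (rank, 1, timestamp)  # then oldest timestamp first

    ordered = sorted(queries, key=sort_key)  # sorted() is stable
    return ordered if max_queries is None else ordered[:max_queries]


pool = [(0, "a"), (0, "b"), (1, "c"), (0, "d")]
last_run = {"a": 200.0, "b": 100.0}.get  # "c" and "d" were never run
```

With a cap of 2, the priority-only call keeps `a` and `b` on every run, while the recency-aware call keeps `d` (never run) and `b` (oldest), so a later capped run would pick up a different slice of the pool.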

shaypal5 added 2 commits March 3, 2026 16:11

Add separate CI workflows with smoke-gated tests

30c0c37

ci: add smoke suite and gate unit/integration workflows

df9e081

shaypal5 requested a review from Copilot March 3, 2026 14:54

Copilot started reviewing on behalf of shaypal5 March 3, 2026 14:55

ci: fix lint import order and ignore fixture html whitespace

74efd0b

Copilot AI reviewed Mar 3, 2026


Comment thread .github/workflows/unit-tests.yml Outdated

Comment thread .github/workflows/integration-tests.yml Outdated

Comment thread .github/workflows/unit-tests.yml Outdated

Comment thread .github/workflows/integration-tests.yml Outdated

shaypal5 added 3 commits March 3, 2026 17:01

ci: skip eof fixer for fixture html files

eefbd5d

chore: remove trailing whitespace for pre-commit

9285503

test: make integration fixture date window stable

a74ad8b

shaypal5 added the configuration, enhancement, ci, and tests labels Mar 4, 2026

shaypal5 added 2 commits March 4, 2026 12:25

ci: harden workflow_run test jobs for concurrency and fork safety

457d128

Consolidate CI into single ci-test workflow

eacb5ce

shaypal5 merged commit aaf08b2 into main Mar 4, 2026
6 checks passed

shaypal5 deleted the codex/ci-workflows-smoke-lint-tests branch March 4, 2026 12:28

shaypal5 mentioned this pull request Jun 11, 2026

feat(discovery): yield-weighted query prioritization (#3) #205

Merged

2 tasks

shaypal5 merged 8 commits into main from codex/ci-workflows-smoke-lint-tests
