fix(discovery): add missing discovery_runs.skipped_query_count column#223
Merged
Conversation
DiscoveryRun.skipped_query_count (the rolling-window search backstop's deliberately-skipped-query counter) is written by DiscoveryStore.write_run() via a full model_dump() POST to discovery_runs, but no migration ever added the column. Every run-metadata write therefore failed with "column skipped_query_count does not exist", blocking candidate persistence on the daily-review runs (issues #150/#152/#156/#184/#186). Add supabase/migrations/20260616_discovery_runs_skipped_query_count.sql, a regression test for the column, and a guard test asserting every DiscoveryRun field maps to a discovery_runs column so this class of drift is caught in CI. This also clears the schema break that would otherwise fail the UNIFY-PR-06 go-live dispatch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Adds the missing skipped_query_count column to the Supabase public.discovery_runs table so DiscoveryStore.write_run() can persist the full DiscoveryRun.model_dump() payload without PostgREST schema errors, and adds unit tests to prevent future model↔schema drift.
Changes:
- Add an idempotent Supabase migration to create
discovery_runs.skipped_query_countwithNOT NULL DEFAULT 0. - Add a regression test for the new column and a schema-guard test intended to ensure every
DiscoveryRunfield maps to adiscovery_runscolumn. - Update
.agent-plan.mdto mark the schema fix as completed.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/unit/test_discovery_migration.py | Adds regression/guard tests for discovery-run schema compatibility. |
| supabase/migrations/20260616_discovery_runs_skipped_query_count.sql | Adds the missing skipped_query_count column to public.discovery_runs. |
| .agent-plan.md | Documents completion of the discovery-runs schema fix work item. |
Comment on lines
+35
to
+43
| def test_discovery_runs_has_a_column_for_every_model_field() -> None: | ||
| """Every DiscoveryRun scalar field must map to a discovery_runs column across | ||
| the migrations, so write_run()'s model_dump() never posts an unknown column.""" | ||
| migrations = "\n".join( | ||
| p.read_text(encoding="utf-8") for p in sorted(Path("supabase/migrations").glob("*.sql")) | ||
| ) | ||
| # `errors` is the jsonb column; the rest are scalar columns on discovery_runs. | ||
| for field in DiscoveryRun.model_fields: | ||
| assert field in migrations, f"discovery_runs is missing a column for {field!r}" |
This comment has been minimized.
This comment has been minimized.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #223 +/- ##
=======================================
Coverage 92.84% 92.84%
=======================================
Files 84 84
Lines 12337 12337
=======================================
Hits 11454 11454
Misses 883 883 🚀 New features to boost your workflow:
|
This comment has been minimized.
This comment has been minimized.
Self-review found the regression test was unsound: it substring-matched each DiscoveryRun field against ALL migrations concatenated, but discovery models share field names (status, errors, dataset_name, ...) with columns on other tables — so a column missing from discovery_runs could be masked by a same-named column elsewhere, and the guard would pass while production threw PGRST204. - Replace it with a per-table column extractor (scoped to each table's CREATE TABLE block + ALTER ... ADD COLUMN) and assert model_fields is a subset of the table's columns for ALL five persisted discovery models (DiscoveryRun, PersistentCandidate, BackfillBatch, ScrapeAttempt, CandidateProvenance), with an explicit (empty) per-model exception set. Drift in any of them now fails CI. Verified the other four tables are already in sync. - Add a deploy note to the migration: PGRST204 is a schema-cache miss, so after `supabase db push` PostgREST must reload its schema cache. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
Comment on lines
+11
to
+12
| Supabase run-metadata write and blocking candidate persistence on daily-review runs; a guard test | ||
| now asserts every `DiscoveryRun` field maps to a column. Clears the schema break on the |
| every run-metadata write failed with “column skipped_query_count does not exist”, blocking | ||
| candidate persistence on the daily-review runs. New migration | ||
| `supabase/migrations/20260616_discovery_runs_skipped_query_count.sql` plus a regression test and a | ||
| guard test asserting every `DiscoveryRun` field maps to a `discovery_runs` column. Clears the |
|
pr-agent-context report: This run includes unresolved review comments on PR #223.
For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.
After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.
# Copilot Comments
## COPILOT-1
Location: tests/unit/test_discovery_migration.py
URL: https://github.com/DataHackIL/tfht_enforce_idx/pull/223#discussion_r3418832097
Status: outdated
Root author: copilot-pull-request-reviewer
Comment:
The guard test can produce false positives because it asserts each `DiscoveryRun` field name appears anywhere across *all* migrations. Several `DiscoveryRun` fields (e.g. `candidate_count`) also appear in other tables’ migrations, so the test could pass even if `public.discovery_runs` is missing a column. Restrict the search to SQL statements that define/alter `public.discovery_runs` and then match whole-word column names there.
## COPILOT-2
Location: .agent-plan.md:12
URL: https://github.com/DataHackIL/tfht_enforce_idx/pull/223#discussion_r3419608479
Root author: copilot-pull-request-reviewer
Comment:
This entry says the new guard test asserts every `DiscoveryRun` field maps to a column, but the added unit test actually checks schema parity for *all persisted discovery models* (DiscoveryRun, PersistentCandidate, BackfillBatch, ScrapeAttempt, CandidateProvenance). Update the wording so the plan remains accurate mainline truth.
## COPILOT-3
Location: .agent-plan.md:361
URL: https://github.com/DataHackIL/tfht_enforce_idx/pull/223#discussion_r3419608518
Root author: copilot-pull-request-reviewer
Comment:
This line narrows the guard test to `DiscoveryRun`/`discovery_runs`, but the implemented test guards *all* persisted discovery models against model<->table drift. Adjust wording so the ledger entry matches what the test enforces.Run metadata: |
This was referenced Jun 16, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
The daily-review bot has filed the same failure five times — #150, #152, #156, #184, #186 — all verbatim PGRST204: "Could not find the 'skipped_query_count' column of 'discovery_runs' in the schema cache." This is the root-cause fix.
DiscoveryRungained askipped_query_countfield (the rolling-window search backstop fromUNIFY-PR-05counts queries it deliberately skips — discovery/models.py:269) and the pipeline writes it (pipeline.py:1417). The Supabase-backedDiscoveryStore.write_run()posts the fullmodel_dump()to thediscovery_runstable via PostgREST (discovery/storage.py:471), but no migration ever added the column — thediscovery_runstable (20260410…foundation.sql) hasquery_count/candidate_count/… but notskipped_query_count. So every run-metadata upsert 400s and aborts source-candidate persistence.It's also on the
UNIFY-PR-06go-live critical path: that PR re-enables the scheduleddiscover/daily-reviewworkflows, which persist to Supabase — so the go-live dispatch would reproduce this exact failure.What
supabase/migrations/20260616_discovery_runs_skipped_query_count.sql—add column if not exists skipped_query_count integer not null default 0onpublic.discovery_runs(idempotent; mirrors the existingquery_countdefinition), with a deploy note that PGRST204 is a schema-cache miss (reload afterdb push).test_every_persisted_model_field_maps_to_a_table_column). The first cut of this test was unsound — it substring-matched field names against all migrations concatenated, and discovery models share field names (status,errors,dataset_name, …) with columns on other tables, so a column missing fromdiscovery_runscould be masked by a same-named column elsewhere. The replacement parses columns per table (CREATE TABLE block +ALTER … ADD COLUMN) and assertsmodel_fields ⊆ columnsfor all five persisted discovery models (DiscoveryRun, PersistentCandidate, BackfillBatch, ScrapeAttempt, CandidateProvenance). Drift in any of them now fails CI. The other four tables were audited and are in sync.skipped_query_countcolumn itself.Verification
ruff format --check ./ruff check ./mypy src/— cleanpytest -q tests/unit -k "discovery or migration or storage"— greendiscovery_runs, the new test reports it missing (the old substring test would have passed).Deploy (must happen for the runtime error to stop)
Code alone can't add the column. After merge:
supabase db push(projectltafdcsnvppvqyoclkod).notify pgrst, 'reload schema';Refs #150, #152, #156, #184, #186.