fix(discovery): add missing discovery_runs.skipped_query_count column by shaypal5 · Pull Request #223 · DataHackIL/tfht_enforce_idx

shaypal5 · 2026-06-16T07:20:23Z

Why

The daily-review bot has filed the same failure five times — #150, #152, #156, #184, #186 — all verbatim PGRST204: "Could not find the 'skipped_query_count' column of 'discovery_runs' in the schema cache." This is the root-cause fix.

DiscoveryRun gained a skipped_query_count field (the rolling-window search backstop from UNIFY-PR-05 counts queries it deliberately skips — discovery/models.py:269) and the pipeline writes it (pipeline.py:1417). The Supabase-backed DiscoveryStore.write_run() posts the full model_dump() to the discovery_runs table via PostgREST (discovery/storage.py:471), but no migration ever added the column — the discovery_runs table (20260410…foundation.sql) has query_count/candidate_count/… but not skipped_query_count. So every run-metadata upsert 400s and aborts source-candidate persistence.

It's also on the UNIFY-PR-06 go-live critical path: that PR re-enables the scheduled discover/daily-review workflows, which persist to Supabase — so the go-live dispatch would reproduce this exact failure.

What

supabase/migrations/20260616_discovery_runs_skipped_query_count.sql — add column if not exists skipped_query_count integer not null default 0 on public.discovery_runs (idempotent; mirrors the existing query_count definition), with a deploy note that PGRST204 is a schema-cache miss (reload after db push).
A sound, all-model schema-parity test (test_every_persisted_model_field_maps_to_a_table_column). The first cut of this test was unsound — it substring-matched field names against all migrations concatenated, and discovery models share field names (status, errors, dataset_name, …) with columns on other tables, so a column missing from discovery_runs could be masked by a same-named column elsewhere. The replacement parses columns per table (CREATE TABLE block + ALTER … ADD COLUMN) and asserts model_fields ⊆ columns for all five persisted discovery models (DiscoveryRun, PersistentCandidate, BackfillBatch, ScrapeAttempt, CandidateProvenance). Drift in any of them now fails CI. The other four tables were audited and are in sync.
Plus a targeted regression test for the skipped_query_count column itself.

Verification

ruff format --check . / ruff check . / mypy src/ — clean
pytest -q tests/unit -k "discovery or migration or storage" — green
Soundness proven: with the column scoped out of discovery_runs, the new test reports it missing (the old substring test would have passed).

Deploy (must happen for the runtime error to stop)

Code alone can't add the column. After merge:

Apply the migration to the live project: supabase db push (project ltafdcsnvppvqyoclkod).
If the 400 persists, reload PostgREST's schema cache: notify pgrst, 'reload schema';
Then close [daily-ai-review] Supabase schema error: missing 'skipped_query_count' column #150, [daily-ai-review] Database schema error: missing 'skipped_query_count' column #152, [daily-ai-review] Database schema mismatch preventing candidate persistence #156, [daily-ai-review] Database schema error - missing 'skipped_query_count' column #184, [daily-ai-review] Database schema error preventing source candidate persistence #186 — they remain open until the fix is deployed, not merged.

Refs #150, #152, #156, #184, #186.

DiscoveryRun.skipped_query_count (the rolling-window search backstop's deliberately-skipped-query counter) is written by DiscoveryStore.write_run() via a full model_dump() POST to discovery_runs, but no migration ever added the column. Every run-metadata write therefore failed with "column skipped_query_count does not exist", blocking candidate persistence on the daily-review runs (issues #150/#152/#156/#184/#186). Add supabase/migrations/20260616_discovery_runs_skipped_query_count.sql, a regression test for the column, and a guard test asserting every DiscoveryRun field maps to a discovery_runs column so this class of drift is caught in CI. This also clears the schema break that would otherwise fail the UNIFY-PR-06 go-live dispatch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot

Pull request overview

Adds the missing skipped_query_count column to the Supabase public.discovery_runs table so DiscoveryStore.write_run() can persist the full DiscoveryRun.model_dump() payload without PostgREST schema errors, and adds unit tests to prevent future model↔schema drift.

Changes:

Add an idempotent Supabase migration to create discovery_runs.skipped_query_count with NOT NULL DEFAULT 0.
Add a regression test for the new column and a schema-guard test intended to ensure every DiscoveryRun field maps to a discovery_runs column.
Update .agent-plan.md to mark the schema fix as completed.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
tests/unit/test_discovery_migration.py	Adds regression/guard tests for discovery-run schema compatibility.
supabase/migrations/20260616_discovery_runs_skipped_query_count.sql	Adds the missing `skipped_query_count` column to `public.discovery_runs`.
.agent-plan.md	Documents completion of the discovery-runs schema fix work item.

+def test_discovery_runs_has_a_column_for_every_model_field() -> None:
+    """Every DiscoveryRun scalar field must map to a discovery_runs column across
+    the migrations, so write_run()'s model_dump() never posts an unknown column."""
+    migrations = "\n".join(
+        p.read_text(encoding="utf-8") for p in sorted(Path("supabase/migrations").glob("*.sql"))
+    )
+    # `errors` is the jsonb column; the rest are scalar columns on discovery_runs.
+    for field in DiscoveryRun.model_fields:
+        assert field in migrations, f"discovery_runs is missing a column for {field!r}"


codecov · 2026-06-16T07:26:30Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.84%. Comparing base (4897b92) to head (18bca77).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #223   +/-   ##
=======================================
  Coverage   92.84%   92.84%           
=======================================
  Files          84       84           
  Lines       12337    12337           
=======================================
  Hits        11454    11454           
  Misses        883      883

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Self-review found the regression test was unsound: it substring-matched each DiscoveryRun field against ALL migrations concatenated, but discovery models share field names (status, errors, dataset_name, ...) with columns on other tables — so a column missing from discovery_runs could be masked by a same-named column elsewhere, and the guard would pass while production threw PGRST204. - Replace it with a per-table column extractor (scoped to each table's CREATE TABLE block + ALTER ... ADD COLUMN) and assert model_fields is a subset of the table's columns for ALL five persisted discovery models (DiscoveryRun, PersistentCandidate, BackfillBatch, ScrapeAttempt, CandidateProvenance), with an explicit (empty) per-model exception set. Drift in any of them now fails CI. Verified the other four tables are already in sync. - Add a deploy note to the migration: PGRST204 is a schema-cache miss, so after `supabase db push` PostgREST must reload its schema cache. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

+  Supabase run-metadata write and blocking candidate persistence on daily-review runs; a guard test
+  now asserts every `DiscoveryRun` field maps to a column. Clears the schema break on the


+  every run-metadata write failed with “column skipped_query_count does not exist”, blocking
+  candidate persistence on the daily-review runs. New migration
+  `supabase/migrations/20260616_discovery_runs_skipped_query_count.sql` plus a regression test and a
+  guard test asserting every `DiscoveryRun` field maps to a `discovery_runs` column. Clears the


github-actions · 2026-06-16T09:38:54Z

pr-agent-context report:

This run includes unresolved review comments on PR #223.

For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.

After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.

# Copilot Comments

## COPILOT-1
Location: tests/unit/test_discovery_migration.py
URL: https://github.com/DataHackIL/tfht_enforce_idx/pull/223#discussion_r3418832097
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    The guard test can produce false positives because it asserts each `DiscoveryRun` field name appears anywhere across *all* migrations. Several `DiscoveryRun` fields (e.g. `candidate_count`) also appear in other tables’ migrations, so the test could pass even if `public.discovery_runs` is missing a column. Restrict the search to SQL statements that define/alter `public.discovery_runs` and then match whole-word column names there.

## COPILOT-2
Location: .agent-plan.md:12
URL: https://github.com/DataHackIL/tfht_enforce_idx/pull/223#discussion_r3419608479
Root author: copilot-pull-request-reviewer

Comment:
    This entry says the new guard test asserts every `DiscoveryRun` field maps to a column, but the added unit test actually checks schema parity for *all persisted discovery models* (DiscoveryRun, PersistentCandidate, BackfillBatch, ScrapeAttempt, CandidateProvenance). Update the wording so the plan remains accurate mainline truth.

## COPILOT-3
Location: .agent-plan.md:361
URL: https://github.com/DataHackIL/tfht_enforce_idx/pull/223#discussion_r3419608518
Root author: copilot-pull-request-reviewer

Comment:
    This line narrows the guard test to `DiscoveryRun`/`discovery_runs`, but the implemented test guards *all* persisted discovery models against model<->table drift. Adjust wording so the ledger entry matches what the test enforces.

Run metadata:

Tool ref: v4.0.19
Tool version: 4.0.19
Trigger: commit pushed
Workflow run: 27608215679 attempt 1
Comment timestamp: 2026-06-16T09:38:04.505548+00:00
PR head commit: 18bca77c1844fe522af90c037e8e50177fa7649e

Copilot AI review requested due to automatic review settings June 16, 2026 07:20

shaypal5 added this to the Local↔CI Unification milestone Jun 16, 2026

shaypal5 added bug Something isn't working discovery Discovery-layer and candidate-retention work labels Jun 16, 2026

docs(plan): record PR #223 for the discovery_runs schema fix

b5e1963

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot started reviewing on behalf of shaypal5 June 16, 2026 07:20 View session

Copilot AI reviewed Jun 16, 2026

View reviewed changes

This comment has been minimized.

Sign in to view

Copilot AI review requested due to automatic review settings June 16, 2026 09:31

Copilot started reviewing on behalf of shaypal5 June 16, 2026 09:31 View session

docs(plan): refs (not closes) for the deploy-gated schema issues

18bca77

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

This comment has been minimized.

Sign in to view

Copilot AI reviewed Jun 16, 2026

View reviewed changes

shaypal5 merged commit 33bf2ac into main Jun 16, 2026
11 checks passed

shaypal5 deleted the codex/discovery-runs-skipped-query-count branch June 16, 2026 09:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(discovery): add missing discovery_runs.skipped_query_count column#223

fix(discovery): add missing discovery_runs.skipped_query_count column#223
shaypal5 merged 4 commits into
mainfrom
codex/discovery-runs-skipped-query-count

shaypal5 commented Jun 16, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

This comment has been minimized.

codecov Bot commented Jun 16, 2026 •

edited

Loading

Uh oh!

This comment has been minimized.

This comment has been minimized.

Copilot AI left a comment

Uh oh!

github-actions Bot commented Jun 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		Supabase run-metadata write and blocking candidate persistence on daily-review runs; a guard test
		now asserts every `DiscoveryRun` field maps to a column. Clears the schema break on the

Conversation

shaypal5 commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

What

Verification

Deploy (must happen for the runtime error to stop)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

This comment has been minimized.

codecov Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

This comment has been minimized.

This comment has been minimized.

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

github-actions Bot commented Jun 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

shaypal5 commented Jun 16, 2026 •

edited

Loading

codecov Bot commented Jun 16, 2026 •

edited

Loading