Skip to content

fix(discovery): add missing discovery_runs.skipped_query_count column#223

Merged
shaypal5 merged 4 commits into
mainfrom
codex/discovery-runs-skipped-query-count
Jun 16, 2026
Merged

fix(discovery): add missing discovery_runs.skipped_query_count column#223
shaypal5 merged 4 commits into
mainfrom
codex/discovery-runs-skipped-query-count

Conversation

@shaypal5

@shaypal5 shaypal5 commented Jun 16, 2026

Copy link
Copy Markdown
Member

Why

The daily-review bot has filed the same failure five times — #150, #152, #156, #184, #186 — all verbatim PGRST204: "Could not find the 'skipped_query_count' column of 'discovery_runs' in the schema cache." This is the root-cause fix.

DiscoveryRun gained a skipped_query_count field (the rolling-window search backstop from UNIFY-PR-05 counts queries it deliberately skips — discovery/models.py:269) and the pipeline writes it (pipeline.py:1417). The Supabase-backed DiscoveryStore.write_run() posts the full model_dump() to the discovery_runs table via PostgREST (discovery/storage.py:471), but no migration ever added the column — the discovery_runs table (20260410…foundation.sql) has query_count/candidate_count/… but not skipped_query_count. So every run-metadata upsert 400s and aborts source-candidate persistence.

It's also on the UNIFY-PR-06 go-live critical path: that PR re-enables the scheduled discover/daily-review workflows, which persist to Supabase — so the go-live dispatch would reproduce this exact failure.

What

  • supabase/migrations/20260616_discovery_runs_skipped_query_count.sqladd column if not exists skipped_query_count integer not null default 0 on public.discovery_runs (idempotent; mirrors the existing query_count definition), with a deploy note that PGRST204 is a schema-cache miss (reload after db push).
  • A sound, all-model schema-parity test (test_every_persisted_model_field_maps_to_a_table_column). The first cut of this test was unsound — it substring-matched field names against all migrations concatenated, and discovery models share field names (status, errors, dataset_name, …) with columns on other tables, so a column missing from discovery_runs could be masked by a same-named column elsewhere. The replacement parses columns per table (CREATE TABLE block + ALTER … ADD COLUMN) and asserts model_fields ⊆ columns for all five persisted discovery models (DiscoveryRun, PersistentCandidate, BackfillBatch, ScrapeAttempt, CandidateProvenance). Drift in any of them now fails CI. The other four tables were audited and are in sync.
  • Plus a targeted regression test for the skipped_query_count column itself.

Verification

  • ruff format --check . / ruff check . / mypy src/ — clean
  • pytest -q tests/unit -k "discovery or migration or storage" — green
  • Soundness proven: with the column scoped out of discovery_runs, the new test reports it missing (the old substring test would have passed).

Deploy (must happen for the runtime error to stop)

Code alone can't add the column. After merge:

  1. Apply the migration to the live project: supabase db push (project ltafdcsnvppvqyoclkod).
  2. If the 400 persists, reload PostgREST's schema cache: notify pgrst, 'reload schema';
  3. Then close [daily-ai-review] Supabase schema error: missing 'skipped_query_count' column #150, [daily-ai-review] Database schema error: missing 'skipped_query_count' column #152, [daily-ai-review] Database schema mismatch preventing candidate persistence #156, [daily-ai-review] Database schema error - missing 'skipped_query_count' column #184, [daily-ai-review] Database schema error preventing source candidate persistence #186 — they remain open until the fix is deployed, not merged.

Refs #150, #152, #156, #184, #186.

DiscoveryRun.skipped_query_count (the rolling-window search backstop's
deliberately-skipped-query counter) is written by DiscoveryStore.write_run()
via a full model_dump() POST to discovery_runs, but no migration ever added
the column. Every run-metadata write therefore failed with "column
skipped_query_count does not exist", blocking candidate persistence on the
daily-review runs (issues #150/#152/#156/#184/#186).

Add supabase/migrations/20260616_discovery_runs_skipped_query_count.sql, a
regression test for the column, and a guard test asserting every DiscoveryRun
field maps to a discovery_runs column so this class of drift is caught in CI.

This also clears the schema break that would otherwise fail the UNIFY-PR-06
go-live dispatch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 16, 2026 07:20
@shaypal5 shaypal5 added this to the Local↔CI Unification milestone Jun 16, 2026
@shaypal5 shaypal5 added bug Something isn't working discovery Discovery-layer and candidate-retention work labels Jun 16, 2026
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds the missing skipped_query_count column to the Supabase public.discovery_runs table so DiscoveryStore.write_run() can persist the full DiscoveryRun.model_dump() payload without PostgREST schema errors, and adds unit tests to prevent future model↔schema drift.

Changes:

  • Add an idempotent Supabase migration to create discovery_runs.skipped_query_count with NOT NULL DEFAULT 0.
  • Add a regression test for the new column and a schema-guard test intended to ensure every DiscoveryRun field maps to a discovery_runs column.
  • Update .agent-plan.md to mark the schema fix as completed.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
tests/unit/test_discovery_migration.py Adds regression/guard tests for discovery-run schema compatibility.
supabase/migrations/20260616_discovery_runs_skipped_query_count.sql Adds the missing skipped_query_count column to public.discovery_runs.
.agent-plan.md Documents completion of the discovery-runs schema fix work item.

Comment thread tests/unit/test_discovery_migration.py Outdated
Comment on lines +35 to +43
def test_discovery_runs_has_a_column_for_every_model_field() -> None:
"""Every DiscoveryRun scalar field must map to a discovery_runs column across
the migrations, so write_run()'s model_dump() never posts an unknown column."""
migrations = "\n".join(
p.read_text(encoding="utf-8") for p in sorted(Path("supabase/migrations").glob("*.sql"))
)
# `errors` is the jsonb column; the rest are scalar columns on discovery_runs.
for field in DiscoveryRun.model_fields:
assert field in migrations, f"discovery_runs is missing a column for {field!r}"
@github-actions

This comment has been minimized.

@codecov

codecov Bot commented Jun 16, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.84%. Comparing base (4897b92) to head (18bca77).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #223   +/-   ##
=======================================
  Coverage   92.84%   92.84%           
=======================================
  Files          84       84           
  Lines       12337    12337           
=======================================
  Hits        11454    11454           
  Misses        883      883           
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions

This comment has been minimized.

Self-review found the regression test was unsound: it substring-matched each
DiscoveryRun field against ALL migrations concatenated, but discovery models
share field names (status, errors, dataset_name, ...) with columns on other
tables — so a column missing from discovery_runs could be masked by a same-named
column elsewhere, and the guard would pass while production threw PGRST204.

- Replace it with a per-table column extractor (scoped to each table's CREATE
  TABLE block + ALTER ... ADD COLUMN) and assert model_fields is a subset of the
  table's columns for ALL five persisted discovery models (DiscoveryRun,
  PersistentCandidate, BackfillBatch, ScrapeAttempt, CandidateProvenance), with
  an explicit (empty) per-model exception set. Drift in any of them now fails CI.
  Verified the other four tables are already in sync.
- Add a deploy note to the migration: PGRST204 is a schema-cache miss, so after
  `supabase db push` PostgREST must reload its schema cache.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 16, 2026 09:31
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions

This comment has been minimized.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread .agent-plan.md
Comment on lines +11 to +12
Supabase run-metadata write and blocking candidate persistence on daily-review runs; a guard test
now asserts every `DiscoveryRun` field maps to a column. Clears the schema break on the
Comment thread .agent-plan.md
every run-metadata write failed with “column skipped_query_count does not exist”, blocking
candidate persistence on the daily-review runs. New migration
`supabase/migrations/20260616_discovery_runs_skipped_query_count.sql` plus a regression test and a
guard test asserting every `DiscoveryRun` field maps to a `discovery_runs` column. Clears the
@github-actions

Copy link
Copy Markdown

pr-agent-context report:

This run includes unresolved review comments on PR #223.

For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.

After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.

# Copilot Comments

## COPILOT-1
Location: tests/unit/test_discovery_migration.py
URL: https://github.com/DataHackIL/tfht_enforce_idx/pull/223#discussion_r3418832097
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    The guard test can produce false positives because it asserts each `DiscoveryRun` field name appears anywhere across *all* migrations. Several `DiscoveryRun` fields (e.g. `candidate_count`) also appear in other tables’ migrations, so the test could pass even if `public.discovery_runs` is missing a column. Restrict the search to SQL statements that define/alter `public.discovery_runs` and then match whole-word column names there.

## COPILOT-2
Location: .agent-plan.md:12
URL: https://github.com/DataHackIL/tfht_enforce_idx/pull/223#discussion_r3419608479
Root author: copilot-pull-request-reviewer

Comment:
    This entry says the new guard test asserts every `DiscoveryRun` field maps to a column, but the added unit test actually checks schema parity for *all persisted discovery models* (DiscoveryRun, PersistentCandidate, BackfillBatch, ScrapeAttempt, CandidateProvenance). Update the wording so the plan remains accurate mainline truth.

## COPILOT-3
Location: .agent-plan.md:361
URL: https://github.com/DataHackIL/tfht_enforce_idx/pull/223#discussion_r3419608518
Root author: copilot-pull-request-reviewer

Comment:
    This line narrows the guard test to `DiscoveryRun`/`discovery_runs`, but the implemented test guards *all* persisted discovery models against model<->table drift. Adjust wording so the ledger entry matches what the test enforces.

Run metadata:

Tool ref: v4.0.19
Tool version: 4.0.19
Trigger: commit pushed
Workflow run: 27608215679 attempt 1
Comment timestamp: 2026-06-16T09:38:04.505548+00:00
PR head commit: 18bca77c1844fe522af90c037e8e50177fa7649e

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working discovery Discovery-layer and candidate-retention work

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[daily-ai-review] Supabase schema error: missing 'skipped_query_count' column

2 participants