Merged
103 changes: 103 additions & 0 deletions docs/reference/cli.md
@@ -66,6 +66,7 @@ datasight [OPTIONS] COMMAND [ARGS]...
- `measures`: Surface likely measures and default aggregations.
- `quality`: Audit data quality - nulls, suspicious ranges, and date coverage.
- `tidy`: Detect untidy column shapes and reshape into long form.
- `grounding`: Detect and repair drift between grounding files and the live schema.
- `integrity`: Audit cross-table referential integrity - keys, orphans, and join risks.
- `distribution`: Profile value distributions - percentiles, outliers, and measure flags.
- `validate`: Run declarative validation rules against the database.
@@ -437,10 +438,16 @@ Runs each question from queries.yaml through the full LLM pipeline,
executes the generated SQL, and compares results against expected values.
Use this to validate correctness across different models and providers.

Before the LLM phase, `verify` runs a static schema-drift check that
flags references to columns or tables that no longer exist in the live
database. `--static-only` skips the LLM phase entirely;
`--skip-grounding-check` skips the static check.

Examples:

```
datasight verify
datasight verify --static-only
datasight verify --queries verification.yaml
datasight verify --model gpt-4o
```
@@ -470,6 +477,8 @@ datasight verify [OPTIONS]
| `--project-dir` | Project directory containing .env and queries.yaml. Default: `.`. |
| `--model` | Model name (overrides .env). |
| `--queries` | Path to queries YAML file (default: queries.yaml in project dir). |
| `--static-only` | Run only the cheap schema-drift check (no LLM, no query execution). Reports unresolved column/table references in queries.yaml, schema_description.md, and time_series.yaml against the live DB. |
| `--skip-grounding-check` | Skip the static drift check that normally runs before the LLM phase. |

### `datasight ask`

@@ -772,6 +781,100 @@ datasight tidy review [OPTIONS]
| `--replace-source` | Drop the source after a successful reshape and rename the long-form table to take the source's old name. Downstream code that referenced the source keeps working without edits. Requires '--as table' — a view's body references its source by name. |
| `--drop-source` | Drop the source after a successful reshape; the long form keeps its target name. Pick this when the new shape is the canonical one going forward and you don't need to preserve the source's name. Requires '--as table'. NOTE: previously this flag carried the semantics now moved to '--replace-source'; scripts depending on the old behavior should switch to '--replace-source'. |
| `--sample` | Send N sample rows per candidate to the configured LLM provider (default 0). Sample values get sent over the network — opt in only when the LLM seeing the values is acceptable. |
| `--model` | LLM model name to use for the propose-reshapes call and the post-apply grounding-repair call (overrides .env). Useful when different models suit each workload — see docs/use/concepts/choosing-an-llm.md. |
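The source-disposal flags compose with the rest of the options; a sketch of the two dispositions under the new semantics (the flag combinations are taken from the descriptions above, not from a tested invocation):

```shell
# Long form takes over the source's old name, so downstream queries
# keep working unchanged (--replace-source requires --as table).
datasight tidy review --as table --replace-source

# Long form keeps its target name; the source is simply dropped.
# Scripts that used --drop-source for the rename behavior should
# switch to --replace-source.
datasight tidy review --as table --drop-source
```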

### `datasight grounding`

Detect and repair drift between grounding files and the live schema.

Grounding files (`queries.yaml`, `schema_description.md`,
`time_series.yaml`) describe the database to the LLM. When the
schema changes (typically after `datasight tidy review`), these
files fall out of sync and the agent silently hallucinates against
columns that no longer exist.

- `check` reports drift without changing anything.
- `repair` asks the configured LLM to rewrite the stale files
  against the current schema, validates each proposed query, and
  writes atomically after you confirm the diff.

Examples:

```
datasight grounding check
datasight grounding repair
datasight grounding repair --model qwen3.6
datasight grounding repair --from-csv load_data.csv
datasight grounding repair --dry-run
```

```bash
datasight grounding [OPTIONS] COMMAND [ARGS]...
```

**Subcommands**

- `check`: Report stale references in grounding files against the live schema.
- `repair`: Run the LLM grounding repair against an existing drift.

#### `datasight grounding check`

Report stale references in grounding files against the live schema.

Static — no LLM, no query execution. Exits 0 when grounding is
clean, 1 when drift is detected. Use `datasight grounding
repair` to fix what this command finds.

Examples:

```
datasight grounding check
datasight grounding check --project-dir /path/to/project
```

```bash
datasight grounding check [OPTIONS]
```

**Parameters**

| Name | Details |
| --- | --- |
| `--project-dir` | Project directory containing .env and grounding files. Default: `.`. |
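Because `check` is static and exit-code driven, it can gate automation; a minimal pre-commit sketch (the hook wiring is illustrative, not part of datasight):

```shell
#!/bin/sh
# Block the commit when grounding files drifted from the live schema.
# `datasight grounding check` exits 0 when clean, 1 on drift (see above).
datasight grounding check || {
  echo "grounding files are stale; run: datasight grounding repair" >&2
  exit 1
}
```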

#### `datasight grounding repair`

Run the LLM grounding repair against an existing drift.

Reads the pre-tidy schema snapshot persisted by the most recent
apply (`.datasight/grounding_snapshot.json`). When no snapshot
is on file, `--from-csv` lets you supply the wide-form schema
by pointing at the source CSV(s).

Shows the unified diff and prompts for confirmation before writing.
Use `--dry-run` to skip the write entirely.

Examples:

```
datasight grounding repair
datasight grounding repair --model qwen3.6
datasight grounding repair --from-csv load_data.csv
datasight grounding repair --dry-run
```

```bash
datasight grounding repair [OPTIONS]
```

**Parameters**

| Name | Details |
| --- | --- |
| `--project-dir` | Project directory containing .env and grounding files. Default: `.`. |
| `--model` | LLM model name to use for the repair (overrides .env). Useful for retrying with a different model after a timeout. |
| `--from-csv` | Derive the pre-tidy schema from CSV headers when no snapshot is available. Pass once per source file (e.g. the wide-format input the apply consumed). Each CSV becomes a single table named after the file stem. Combinable with the snapshot — snapshot tables win on conflict. |
| `--dry-run` | Show drift + LLM proposal + diff, but don't write any files. |

### `datasight integrity`

2 changes: 1 addition & 1 deletion docs/reference/configuration.md
@@ -74,7 +74,7 @@ For help picking a provider, see [Choosing an LLM](../use/concepts/choosing-an-l

| Variable | Default | Description |
|----------|---------|-------------|
| `OLLAMA_MODEL` | `qwen2.5:7b` | Ollama model name (must support tool calling). `qwen2.5:7b` works well for CLI queries; for the web UI with visualizations, try `qwen2.5:14b`. |
| `OLLAMA_MODEL` | `qwen2.5:7b` | Ollama model name (must support tool calling). `qwen2.5:7b` is the safest cross-platform default (~2 GB resident, fits on 16 GB Macs). For Apple Silicon with 48 GB+ unified memory, `qwen3.6:35b-a3b-coding-mxfp8` gives richer answers at comparable decode speed. See [Choosing an AI provider](../use/concepts/choosing-an-llm.md#apple-silicon-mlx-native-models). |
| `OLLAMA_BASE_URL` | `http://localhost:11434/v1` | Ollama API endpoint |

### Database settings
69 changes: 68 additions & 1 deletion docs/use/concepts/choosing-an-llm.md
@@ -97,7 +97,7 @@ So a Llama 3.1 8B model fits in ~5 GB VRAM at 4-bit, a 70B model needs
|---|---|
| Apple Silicon with 16 GB unified memory | 7–8B models at 4-bit |
| Apple Silicon with 32 GB | 13B at 4-bit, or 8B at 8-bit |
| Apple Silicon with 64 GB+ | 34–70B at 4-bit |
| Apple Silicon with 64 GB+ | 34–70B at 4-bit, or sparse-MoE models like Qwen3.6 35B-A3B |
| NVIDIA laptop GPU, 8 GB VRAM | 7–8B at 4-bit |
| NVIDIA laptop GPU, 16 GB VRAM | 13B at 4-bit |

@@ -107,6 +107,73 @@ visualizations, step up to `qwen2.5:14b` — the 7B model struggles with
the more complex multi-step agent interactions required for chart
generation. Smaller models often struggle with realistic schemas.

### Apple Silicon: MLX-native models

If you're on Apple Silicon, models tagged `-mlx-*` use Apple's MLX
runtime and Metal compute. They typically decode 10–30% faster than the
equivalent GGUF model, but the *resident memory* can be much larger than
the weight size alone suggests because MLX allocates a large KV-cache
buffer for the model's default context window (often 256K tokens).
Measure before you commit to a model — the model card's parameter
count is not a reliable predictor of laptop fit.

Measured on a single benchmark dataset (5 questions, agent loop with
tool calls, Ollama server keep-alive at default 5 min) on a Mac with
unified memory:

| Model | Decode (tok/s) | Resident memory (incl. KV cache) | Answer style |
|---|---|---|---|
| `qwen2.5:7b` (q4_K_M, GGUF) | ~85 | **~2 GB** | Middle: substantive but can hit `max_tokens` |
| `gemma4:e2b-mlx-bf16` | ~95 | ~11 GB | Tersest: dumps data tables, minimal analysis |
| `qwen3.6:35b-a3b-coding-mxfp8` | ~90 | ~38 GB | Richest: includes slopes, R², regional context |

The headline surprise: **`gemma4:e2b-mlx-bf16` is not a low-memory
option**, despite the "e2b" (effective 2B) naming. Its weights are
small but the default 256K-token context allocation dominates resident
memory. Use it on 32 GB+ Macs only.
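Resident memory is easy to verify locally rather than trusting the model card; with Ollama, for example:

```shell
# Load the model with a trivial prompt, then report what's actually resident.
ollama run gemma4:e2b-mlx-bf16 "ok"
ollama ps   # lists loaded models with their memory use while kept alive
```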

The benchmark above measures `datasight ask` only. The other LLM-using
commands have very different shapes, and observed behavior on the
two qwen3.6 variants splits cleanly along those shapes:

| Workload | Calls a tool? | Output budget (tokens) | Best of the two |
|---|---|---|---|
| `datasight ask` | yes (multi-turn agent) | small per turn | either; coding MoE for richer prose |
| `datasight tidy review` (LLM advisor) | yes (single `propose_reshapes` call) | 4K | **general `qwen3.6`** |
| `datasight grounding repair` | no (long-form file rewrite) | 16K | **`qwen3.6:35b-a3b-coding-mxfp8`** |

The split is consistent with what code-specialized fine-tunes are known
to trade: better long-form structured generation (winning grounding
repair, where the prompt and the output are both large) at the cost of
weaker tool-call adherence (losing `tidy review`, where the model has
to emit a structured tool call instead of free text). Observed in
practice: the coding variant silently emitted zero proposals on
`tidy review`'s `propose_reshapes`, while the general variant timed out
on grounding repair against the same database.

**Practical setup**: pull both models. Use `qwen3.6` as your default
`OLLAMA_MODEL`, and override per-call where the coding variant wins:

```bash
datasight grounding repair --model qwen3.6:35b-a3b-coding-mxfp8
datasight tidy review --model qwen3.6 # explicit default; useful in scripts
```

Both `tidy review` and `grounding repair` accept `--model`, as do
`ask`, `verify`, and `run`.
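One way to wire the split in is via your project's `.env` (the variable name comes from the configuration reference; the choice of default is the recommendation above, not a requirement):

```shell
# .env — general variant as the everyday default; override per call
# with --model where the coding variant wins (e.g. grounding repair).
OLLAMA_MODEL=qwen3.6
```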

Apple Silicon recommendations by RAM tier:

| Unified memory | Recommended model | Why |
|---|---|---|
| 16 GB | `qwen2.5:7b` (GGUF) | Only option that fits with headroom for the OS, browser, and IDE. |
| 32 GB | `qwen2.5:7b` or `gemma4:e2b-mlx-bf16` | Either fits. Gemma is faster but its answers are tersest; pick based on whether you want interpretation or just raw data. |
| 48 GB+ | Both `qwen3.6` and `qwen3.6:35b-a3b-coding-mxfp8` (switch per command) | Sparse MoE (3B active params) — properly leverages Apple Silicon's unified memory + Metal. The two variants are complementary, not interchangeable: general for tool-use commands (`ask`, `tidy review`), coding for long-form generation (`grounding repair`, `ask` when you want richer prose). See workload table above. |

If you have an Apple Silicon machine but aren't sure which tag to use,
start with `qwen2.5:7b` (the cross-platform recommendation above). It
works on every backend and has the smallest memory footprint by far.

### On an HPC GPU node

If your HPC has GPU nodes, they typically unlock much larger models.
95 changes: 95 additions & 0 deletions docs/use/how-to/curate-with-tidy-review.md
@@ -419,6 +419,101 @@ whose values would duplicate or drop rows — the transaction rolls back
in step 2, before the source is touched. The error message names the
table and the count it expected.

## Repair grounding after a reshape

A successful Tidy review changes the database schema — long-form
column and table names replace the wide-form ones. That breaks any
reference to the old column names in your grounding files
(`queries.yaml`, `schema_description.md`, `time_series.yaml`), which
the LLM agent reads on every turn. Stale grounding silently teaches
the agent to hallucinate against columns that no longer exist.

### How drift is detected

Every Tidy review apply (CLI or web) does two things in addition to
the reshape itself:

1. Writes a snapshot of the pre-apply schema to
`.datasight/grounding_snapshot.json`. This is the "before" picture
the LLM needs to rewrite the grounding files in context.
2. Runs a fast static check against the new schema. The CLI
surfaces drift in an interactive prompt (see the `--apply-all`
sections above); the web UI shows an orange "Grounding may be
stale" banner with a **Repair grounding** button.
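The snapshot carries a `.json` extension; assuming it is plain JSON, you can sanity-check that step 1 actually ran before attempting a repair (the file's internal field layout is not documented here, so don't script against it):

```shell
# Confirm the pre-apply snapshot exists and parses (path from step 1 above).
python -m json.tool .datasight/grounding_snapshot.json > /dev/null \
  && echo "snapshot OK"
```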

### Run the repair

Web UI: click **Repair grounding** in the banner. The agent rewrites
the affected files, validates every SQL example against the live DB,
and applies the changes if validation passes.

CLI:

```bash
datasight grounding repair
```

This reads the snapshot, runs the LLM repair, prints a unified diff
of the proposed rewrites, and asks `Apply this diff? [y/N]` before
writing.

### Retry with a different model

Local LLMs sometimes time out on the repair (the prompt includes
your full grounding files plus both schemas, which can be large).
Retry with a different model — no need to re-run the reshape:

```bash
datasight grounding repair --model qwen3.6:35b-a3b-coding-mxfp8
```

`--model` overrides the configured `OLLAMA_MODEL` (or
`ANTHROPIC_MODEL`, etc.) for this one call.

The two LLM-using steps in the Tidy review flow have very different
shapes and reward different model variants — `tidy review`'s
proposal step is a tool call (favors general-purpose models),
`grounding repair` is a long-form file rewrite (favors
coding-specialized models). Both `tidy review` and `grounding
repair` accept `--model` so you can pick per call. See
[Choosing an LLM](../concepts/choosing-an-llm.md) for the per-workload
recommendation table.

### When the snapshot is missing

If you applied a reshape before snapshotting was wired in, or you
deleted `.datasight/grounding_snapshot.json`, the repair has nothing
to compare against. Pass `--from-csv` pointing at the original
wide-form source so the CLI can derive the pre-tidy schema from the
header row:

```bash
datasight grounding repair --from-csv generation_fuel_wide.csv
```

You can pass `--from-csv` multiple times for multi-file inputs;
each CSV becomes a single table named after the file stem.
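For example, an apply that consumed two wide-form CSVs would be repaired with (the second file name is hypothetical):

```shell
# Each --from-csv becomes one pre-tidy table named after its file stem:
# generation_fuel_wide and capacity_wide.
datasight grounding repair \
  --from-csv generation_fuel_wide.csv \
  --from-csv capacity_wide.csv
```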

### Preview without writing

Add `--dry-run` to skip the confirmation and the write. The diff
prints as usual, then the command exits — useful for inspecting the
LLM's proposal in CI or before committing the result:

```bash
datasight grounding repair --dry-run
```

### Check drift without repairing

```bash
datasight grounding check
```

Static, no LLM. Exits 0 when grounding is clean, 1 when drift
exists. Same logic as `datasight verify --static-only`, exposed
under a more discoverable name.
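The exit code makes this a natural CI guard; a sketch, assuming your runner can reach an LLM provider for the `--dry-run` preview (the surrounding step logic is illustrative):

```shell
# Fail the pipeline on drift, but leave the proposed repair in the logs first.
datasight grounding check || {
  datasight grounding repair --dry-run   # prints the diff, writes nothing
  exit 1
}
```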

## Recipes

### Curate one wide table from the web UI
11 changes: 8 additions & 3 deletions docs/use/how-to/install.md
@@ -112,9 +112,14 @@ Alternatively, paste the key directly into a project's `.env` file instead.
OLLAMA_MODEL=qwen2.5:7b
```

`qwen2.5:7b` works well for CLI queries (`datasight ask`). For the web UI with
chart generation, `qwen2.5:14b` handles the more complex interactions better.
See [Choosing an AI provider](../concepts/choosing-an-llm.md) for hardware sizing guidance.
`qwen2.5:7b` works well for CLI queries (`datasight ask`) and is the safest
cross-platform default — it uses ~2 GB resident on Apple Silicon, so it fits
even on 16 GB Macs. For the web UI with chart generation, `qwen2.5:14b` handles
the more complex interactions better. On Apple Silicon with 48 GB+ of unified
memory, `qwen3.6:35b-a3b-coding-mxfp8` produces noticeably richer answers with
comparable decode speed (sparse MoE). See
[Choosing an AI provider](../concepts/choosing-an-llm.md#apple-silicon-mlx-native-models)
for measured per-model memory footprints and Apple-Silicon-specific guidance.
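Whichever tag you choose, pull it before first use so the initial `datasight ask` isn't blocked on a multi-gigabyte download:

```shell
# Standard Ollama pulls; swap in the tag that matches your hardware tier.
ollama pull qwen2.5:7b
ollama pull qwen2.5:14b   # optional: for web-UI chart generation
```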

See the [Configuration reference](../../reference/configuration.md) for every supported
variable.
8 changes: 8 additions & 0 deletions frontend/src/App.svelte
@@ -26,6 +26,7 @@
import { sqlEditorStore } from "$lib/stores/sql_editor.svelte";
import { paletteStore } from "$lib/stores/palette.svelte";
import { tidyStore } from "$lib/stores/tidy.svelte";
import { groundingStore } from "$lib/stores/grounding.svelte";
import { exitExploreSession, getProjectStatus } from "$lib/api/projects";
import { loadSettings, loadLlmConfig } from "$lib/api/settings";
import { loadSchema, loadQueries, loadRecipes } from "$lib/api/schema";
@@ -114,6 +115,12 @@
loadMeasureCatalog(),
]);

// Refresh the always-on grounding pill. Runs after the schema
// load above so the pill reflects the schema the user now sees;
// the static check itself is timing-independent. Cheap call —
// no LLM.
groundingStore.check();

// Run pending starter if one was selected on landing page
if (fromLanding) {
await maybeRunPendingStarter();
@@ -189,6 +196,7 @@
dashboardStore.clear();
sqlEditorStore.clearAll();
sessionStore.reset();
groundingStore.reset();
dashboardStore.currentView = "chat";
exportMode = false;
exportExcludeIndices = new Set();