Commit 38daa6d
fix: data-diff orchestrator, DuckDB bun compat, noLimit, and skill (#615)
* feat: add data-parity cross-database table comparison
- Add DataParity engine integration via native Rust bindings
- Add data-diff tool for LLM agent (profile, joindiff, hashdiff, cascade, auto)
- Add ClickHouse driver support
- Add data-parity skill: profile-first workflow, algorithm selection guide,
CRITICAL warning that joindiff cannot run cross-database (always returns 0 diffs),
output style rules (facts only, no editorializing)
- Gitignore .altimate-code/ (credentials) and *.node (platform binaries)
* feat: add partition support to data_diff
Split large tables by a date or numeric column before diffing.
Each partition is diffed independently then results are aggregated.
New params:
- partition_column: column to split on (date or numeric)
- partition_granularity: day | week | month | year (for dates)
- partition_bucket_size: bucket width for numeric columns
New output field:
- partition_results: per-partition breakdown (identical / differ / error)
Dialect-aware SQL: Postgres, Snowflake, BigQuery, ClickHouse, MySQL.
Skill updated with partition guidance and examples.
* feat: add categorical partition mode (string, enum, boolean)
When partition_column is set without partition_granularity or
partition_bucket_size, groups by raw DISTINCT values. Works for
any non-date, non-numeric column: status, region, country, etc.
WHERE clause uses equality: col = 'value' with proper escaping.
* fix: correct outcome shape handling in extractStats and formatOutcome
Rust serializes ReladiffOutcome with serde tag 'mode', producing:
{mode: 'diff', diff_rows: [...], stats: {rows_table1, rows_table2, exclusive_table1, exclusive_table2, updated, unchanged}}
Previous code checked for {Match: {...}} / {Diff: {...}} shapes that
never matched, causing partitioned diff to report all partitions as
'identical' with 0 rows.
- extractStats(): check outcome.mode === 'diff', read from stats fields
- mergeOutcomes(): aggregate mode-based outcomes correctly
- summarize()/formatOutcome(): display mode-based shape with correct labels
* feat: rewrite data-parity skill with interactive, plan-first workflow
Key changes based on feedback:
- Always generate TODO plan before any tool is called
- Enforce data_diff tool usage (never manual EXCEPT/JOIN SQL)
- Add PK discovery + explicit user confirmation step
- Profile pass is now mandatory before row-level diff
- Ask user before expensive row-level diff on large tables:
- <100K rows: proceed automatically
- 100K-10M rows: ask with where_clause option
- >10M rows: offer window/partition/full choices
- Document partition modes (date/numeric/categorical) with examples
- Add warehouse_list as first step to confirm connections
* fix: auto-discover extra_columns and exclude audit/timestamp columns from data diff
The Rust engine only compares columns explicitly listed in extra_columns.
When omitted, it was silently reporting all key-matched rows as 'identical'
even when non-key values differed — a false positive bug.
Changes:
- Auto-discover columns from information_schema when extra_columns is omitted
and source is a plain table name (not a SQL query)
- Exclude audit/timestamp columns (updated_at, created_at, inserted_at,
modified_at, _fivetran_*, _airbyte_*, publisher_last_updated_*, etc.)
from comparison by default since they typically differ due to ETL timing
- Report excluded columns in tool output so users know what was skipped
- Fix misleading tool description that said 'Omit to compare all columns'
- Update SKILL.md with critical guidance on extra_columns behavior
* fix: add `noLimit` option to driver `execute()` to prevent silent result truncation
All drivers default to `LIMIT 1001` on SELECT queries and post-truncate to
1000 rows. This silently drops rows when the data-diff engine needs complete
result sets — a FULL OUTER JOIN returning >1000 diff rows would be truncated,
causing the engine to undercount differences.
- Add `ExecuteOptions { noLimit?: boolean }` to the `Connector` interface
- When `noLimit: true`, set `effectiveLimit = 0` (falsy) so the existing
LIMIT injection guard is skipped, and add `effectiveLimit > 0` to the
truncation check so rows aren't sliced to zero
- Update all 12 drivers: postgres, clickhouse, snowflake, bigquery, mysql,
redshift, databricks, duckdb, oracle, sqlserver, sqlite, mongodb
- Pass `{ noLimit: true }` from `data-diff.ts` `executeQuery()`
Interactive SQL callers are unaffected — they continue to get the default
1000-row limit. Only the data-diff pipeline opts out.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: detect auto-timestamp defaults from database catalog and confirm exclusions with user
Column exclusion now has two layers:
1. Name-pattern matching (existing) — updated_at, created_at, _fivetran_synced, etc.
2. Schema-level default detection (new) — queries column_default for NOW(),
CURRENT_TIMESTAMP, GETDATE(), SYSDATE, SYSTIMESTAMP, etc.
Covers PostgreSQL, MySQL, Snowflake, SQL Server, Oracle, ClickHouse, DuckDB,
SQLite, and Redshift in a single round-trip (no extra query).
The skill prompt now instructs the agent to present detected auto-timestamp
columns to the user and ask for confirmation before excluding them, since
migrations should preserve timestamps while ETL replication regenerates them.
* fix: address code review findings in data-diff orchestrator
- `buildColumnDiscoverySQL`: escape single quotes in all interpolated table
name parts to prevent SQL injection via crafted source/target names
- `dateTruncExpr`: add Oracle case (`TRUNC(col, 'UNIT')`) — Oracle does not
have `DATE_TRUNC`, date-partitioned diffs on Oracle tables previously failed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: address code review security and correctness findings
- Apply esc() to Oracle and SQLite paths in buildColumnDiscoverySQL
(SQL injection via table name was unpatched in these dialects)
- Quote identifiers in resolveTableSources to prevent injection via
table names containing semicolons or special characters
- Surface SQL execution errors before feeding empty rows to the engine
(silent false "match" when warehouse is unreachable is now an error)
- Fix Oracle TRUNC() format model map: 'WEEK' → 'IW' (ISO week)
('WEEK' throws ORA-01800 on all Oracle versions)
- Quote partition column identifier in buildPartitionWhereClause
* fix: resolve simulation suite failures — object stringification, error propagation, and test mock formats
- `altimate-core-column-lineage`: fix `[object Object]` in `column_dict` output when source entries are `{ source_table, source_column }` objects instead of strings
- `schema-inspect`: propagate `{ success: false, error }` dispatcher responses to `metadata.error` instead of silently returning empty schema
- `sql-analyze`: guard against null/undefined result from dispatcher to prevent "undefined" literal in output
- `lineage-check`: guard against null/undefined result from dispatcher to prevent "undefined" literal in output
- `simulation-suite.test.ts`: fix `sql-translate` mock format — data fields must be flat (not wrapped in `data: {}`), add `source_dialect`/`target_dialect` to mock so assertions pass
- `simulation-suite.test.ts`: fix `dbt-manifest` mock format — unwrap `data: {}` so `model_count` and `models` are accessible at top level
Simulation suite: 695/839 → 839/839 (100%)
* fix: use synchronous DuckDB constructor to avoid bun runtime timeout
Bun's runtime never fires native addon async callbacks, so the async
`new duckdb.Database(path, opts, callback)` form would hit the 2-second
timeout fallback on every connection attempt.
Switch to the synchronous constructor form `new duckdb.Database(path)` /
`new duckdb.Database(path, opts)` which throws on error and completes
immediately in both Node and bun runtimes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* revert: restore async DuckDB constructor — sync change was bogus
The async callback form with 2s fallback was already working correctly
at e3df5a4. The timeout was caused by a missing duckdb .node binary,
not a bun incompatibility.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Aditya Pandey <aditya.p@altimate.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>1 parent ae18795 commit 38daa6d
5 files changed
Lines changed: 51 additions & 5 deletions
File tree
- packages/opencode/src/altimate/tools
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
31 | 37 | | |
32 | 38 | | |
33 | 39 | | |
| |||
Lines changed: 10 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
47 | 47 | | |
48 | 48 | | |
49 | 49 | | |
50 | | - | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
51 | 60 | | |
52 | 61 | | |
53 | 62 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
20 | 20 | | |
21 | 21 | | |
22 | 22 | | |
23 | | - | |
| 23 | + | |
24 | 24 | | |
25 | 25 | | |
26 | 26 | | |
27 | 27 | | |
28 | 28 | | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
29 | 39 | | |
30 | 40 | | |
31 | 41 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
15 | 15 | | |
16 | 16 | | |
17 | 17 | | |
18 | | - | |
| 18 | + | |
19 | 19 | | |
20 | 20 | | |
21 | 21 | | |
22 | | - | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
23 | 34 | | |
24 | 35 | | |
25 | 36 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
26 | 26 | | |
27 | 27 | | |
28 | 28 | | |
29 | | - | |
| 29 | + | |
30 | 30 | | |
31 | 31 | | |
32 | 32 | | |
33 | 33 | | |
34 | 34 | | |
35 | 35 | | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
36 | 46 | | |
37 | 47 | | |
38 | 48 | | |
| |||
0 commit comments