chore(eql-types): inherit workspace lints#289
Closed
coderdan wants to merge 1268 commits into
Closed
Conversation
Rewrites `@>(eql_v2_encrypted, eql_v2.stevec_query)` and
`@>(eql_v2_encrypted, eql_v2.ste_vec_entry)` to reduce to a native
`jsonb @>` over `eql_v2.to_stevec_query(a)::jsonb`. The planner now
matches the inlined predicate structurally against a functional GIN
index on the same expression, so the typed `@>` is the canonical
recipe end-to-end — no need to expose `to_stevec_query(col)` in the
WHERE clause to engage the index.
`to_stevec_query` tightens its element normalization to keep only
`{s, hm, oc}` (drops `c`, `a`, `i`, `v`, anything else cipherstash-suite
might emit). This is the matching-relevant set per the XOR contract;
both haystack and needle sides normalize identically so jsonb @>
compares apples-to-apples.
XOR-correctness gain: the previous `hmac_256_terms` recipe silently
dropped oc-bearing sv elements (string / number leaves carry `oc`,
not `hm`), so containment queries via that index could never match
on string / number selectors. The new recipe is XOR-aware — one
functional GIN covers every selector regardless of which term it
carries.
Adds explicit XOR-aware regression tests in
`containment_with_index_tests`:
- `typed_contains_matches_hm_bearing_selector` — sanity check
- `typed_contains_matches_oc_bearing_selector` — the load-bearing
test for the XOR-correctness gap. Synthesises an sv with a single
oc-bearing element and verifies `@>(stevec_query)` matches it.
Would have silently returned zero rows under the
hmac_256_terms-based recipe.
- `typed_contains_mixed_sv_engages_both_selector_kinds` — both
hm and oc lookups against the same row
- `typed_contains_wrong_term_does_not_match` — negative case
- `functional_gin_on_to_stevec_query_engages_for_typed_contains` —
the load-bearing plan assertion: GIN on
`(eql_v2.to_stevec_query(col)::jsonb) jsonb_path_ops` is matched
structurally by the inlined typed `@>(stevec_query)` body, so
bare-form containment engages Bitmap Index Scan.
The needle is expected to be `{s, hm-or-oc}`-shaped per the
stevec_query contract; if callers pass through extracted entries
they should normalize first (the `contains_with_stevec_query_overload`
test demonstrates the pattern).
`eql_v2.hmac_256_terms(eql_v2_encrypted)` was added under PR #205 as the recommended GIN-indexable aggregate for field-level containment. It's structurally wrong under the XOR contract: it filters out every sv element lacking `hm`, which means every oc-bearing element (string / number leaves) is invisible to the index. Containment queries via the recipe could never match on string / number selectors. The gap was masked by fixture data that violated the XOR contract (several test selectors carried `hm` on string fields where cipherstash-suite would emit `oc`). No test exercised hmac_256_terms containment against a properly-emitted oc-bearing selector, so the breakage was invisible. Replacement: the typed `@>(eql_v2_encrypted, eql_v2.stevec_query)` overload (added earlier in this PR, body inlined to a native jsonb `@>` over `eql_v2.to_stevec_query(a)::jsonb` in the prior commit). The canonical recipe is now: CREATE INDEX <name> ON <table> USING gin ((eql_v2.to_stevec_query(<col>)::jsonb) jsonb_path_ops); SELECT * FROM <table> WHERE <col> @> '{"sv":[{"s":"<sel>","hm-or-oc":"<term>"}]}'::eql_v2.stevec_query; Coverage: `containment_with_index_tests::typed_contains_matches_oc_bearing_selector` (added in the prior commit) is the regression-prevention test for the gap — it synthesises a one-element sv with `oc` only and asserts the typed @> matches it. Would silently return zero rows under the hmac_256_terms recipe. Drops: - `src/jsonb/functions.sql` — function definition (lines 437-476) replaced by a comment explaining the removal and pointing at the replacement recipe. - `tasks/pin_search_path.sql` — allowlist entry. - `tasks/test/splinter.sh` — splinter allowlist row. - `tests/sqlx/tests/hmac_256_terms_tests.rs` — dedicated test file (5 tests). Coverage of the actual containment recipe lives in `containment_with_index_tests::typed_contains_*` instead. - `CHANGELOG.md` — `Added` entry for the function (pre-release, so net effect is "never existed in 2.3"). Doc updates: - `docs/upgrading/v2.3.md` U-004 — recipe (b) rewritten to use `to_stevec_query`+ jsonb_path_ops GIN, query in typed `@>` form. - `docs/reference/database-indexes.md` — same swap.
…nt test Address Copilot review on #223: 1. The eql_v2.stevec_query DOMAIN CHECK only forbade `c` on sv elements. It now also requires each element to carry a selector `s` and exactly one deterministic term (`hm` XOR `oc`), matching the ste_vec_entry emission contract and the SteVecQueryElement JSON schema. Without this, a selector-only needle (`{"sv":[{"s":"x"}]}`) cast cleanly and then matched every row through the bare `jsonb @>` body. 2. contains_operator_term_does_not_contain_full_value asserted `e @> needle AND NOT (e @> e)` — `NOT (e @> e)` is always false, so the query returned 0 rows regardless of containment correctness. Rewrite it to assert genuine directional containment between two eql_v2_encrypted values: a one-entry subset of `e` is contained by `e`, but does not contain `e` back. Add stevec_query DOMAIN reject tests for selector-only and both-terms elements.
feat(stevec): typed query path — arrow returns ste_vec_entry, XOR-aware equality, stevec_query containment
eql_v2_encrypted is a composite type and eql_v2.encrypted_operator_class is its DEFAULT btree opclass, so PostgreSQL invokes the opclass FUNCTION 1 comparator during ANALYZE (and for sort-based GROUP BY / DISTINCT). FUNCTION 1 was eql_v2.compare, which #211 made strict — it raises without a Block-ORE `ob` term (U-005). So ANALYZE raised on every hm-only encrypted column, and autovacuum runs ANALYZE routinely; this broke silently across managed installs. Give the opclass its own FUNCTION 1, eql_v2.encrypted_btree_compare: a total, non-raising 3-way comparator — Block ORE when present, else a total order on the hm term, else a deterministic payload tie-break. eql_v2.compare stays strict for the < / > range-operator path. This also fixes GROUP BY / DISTINCT on a bare encrypted column: with a raising opclass comparator the planner fell through to PostgreSQL's built-in record comparison (the composite type's implicit record_ops), which compares raw ciphertext — so two encryptions of the same plaintext (same hm, different c) failed to deduplicate. - src/operators/operator_class.sql: add encrypted_btree_compare, repoint FUNCTION 1; the strict eql_v2.compare is untouched. - operator_class_tests.rs: un-ignore index_behavior_with_different_data_types (broken by the strict comparator); add analyze_on_hmac_only_column_does_not_raise. - CHANGELOG: Fixed entry.
…ted-columns fix: non-raising btree comparator for eql_v2_encrypted (ANALYZE fix)
Promote the [Unreleased] section to [2.3.0] — 2026-05-20, add a fresh empty [Unreleased], and update the link references.
chore(changelog): cut 2.3.0
The Upgrade notes paragraph said "Six numbered notes ... (U-006)", but v2.3 has eight — U-007 (typed `->` selector lookup) and U-008 (typed `stevec_query` containment), both added by #223, were already referenced by the section's own Added entries. Also drop the mention of the removed fused `eql_v2.hmac_256(col, '<selector>')` recipe from the U-004 summary.
docs(changelog): correct the 2.3.0 upgrade-notes summary
Surfaces query-only medians across four row-count tiers (10k–10M) in the README, with a pointer to the cipherstash/benches repository for full methodology, per-scenario SQL, and EXPLAIN plans.
docs: add Performance section to README
Fixes #232. Under the v2.3 payload schema an `eql_v2_encrypted` column admits two mutually exclusive top-level shapes: `EncryptedPayload` (carries `c` at the root) and `SteVecPayload` (carries `sv` at the root with the root document ciphertext at `sv[0].c`). `_encrypted_check_c` previously required `val ? 'c'`, which rejected every insert of a populated SteVec payload — blocking the entire JSON-encryption path for callers on `@cipherstash/protect-ffi >= 0.22.0` (where `mode: standard` is the new default). The check now passes when either `c` or `sv` is present at the root; per-element ciphertext validity on `sv` entries continues to be enforced by the `eql_v2.ste_vec_entry` DOMAIN. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses review feedback from @coderdan on #233. The pre-existing `ste_vec` and `ste_vec_vast` fixture migrations already emit real v2.3 SteVec-shape payloads (root `sv`, no root `c`), but they created their tables without applying `eql_v2.add_encrypted_constraint` — so the migrations passed even under the pre-fix `_encrypted_check_c` that rejected every populated SteVec payload. Apply the production CHECK constraint at fixture-load time so the existing fixture data is the regression guard: with the pre-fix function, all 10 inserts in `003_install_ste_vec_data.sql` (and all 500 in `005_install_ste_vec_vast_data.sql`) raise `Encrypted column missing ciphertext (c) field`; with this PR's fix they load cleanly. Verified empirically — temporarily reverted the `_encrypted_check_c` body, re-ran 003 against the resulting bundle, and confirmed it raises on the first insert. Also addresses two CodeRabbit nits: - `CHANGELOG.md`: change the parenthetical link from #232 (the issue) to #233 (this PR), per the repo's changelog convention. - `tests/sqlx/tests/constraint_tests.rs`: switch the `check_encrypted_accepts_stevec_payload` insert from `format!`-built SQL to a parameterized `$1` bind, matching the rest of the file's pattern around dynamic payload values. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses follow-up review feedback from @coderdan on #233 ("I recommend adding the check for all fixture tables to prevent any regressions"). Extends the regression guard added in cf3d043 to the other fixture tables whose seed data is the production EQL payload shape: - `bench` (`tests/sqlx/migrations/007_install_bench_data.sql`) — loaded via `create_encrypted_json()` from `bench_data.sql`, every row carries the real envelope (`c`, `i`, `v=2`); constraint applied to all three encrypted columns. Verified by loading the 10K-row seed against the constraint locally. - `encrypted` in `tests/sqlx/fixtures/like_data.sql` — also seeded from `create_encrypted_json()`. The remaining fixture tables use intentionally-minimal synthetic payloads that lack required EQL envelope fields and therefore can't carry the production CHECK without rewriting the fixture data: - `ore` / `ore_text` — `{"ob": [...]}` only (no `v`, `i`, `c`), testing ORE block operator logic in isolation. - `agg_test` — v=1 payloads from the legacy aggregate-test SQL file. - `encrypted` in `encrypted_json.sql` / `array_data.sql` / `order_by_null_data.sql` — used by tests that add/remove the constraint dynamically or copy minimal-shape rows from `ore`. - `constrained` (`constraint_tables.sql`) — exists specifically to exercise Postgres-side UNIQUE/NOT-NULL/CHECK behaviour, including NULL inserts that would also collide with a production EQL CHECK. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ncrypted fix(constraints): accept SteVec payloads in _encrypted_check_c
The eql-2.3.1 tag and GitHub release were cut, but CHANGELOG.md was not promoted — the #233 SteVec CHECK fix was still under [Unreleased]. Rename [Unreleased] -> [2.3.1] — 2026-05-21, add a fresh empty [Unreleased], and update the link references (new [Unreleased] compare URL from eql-2.3.1, new [2.3.1] tag URL).
Add a verify-changelog job to the Release EQL workflow that fails the release run when CHANGELOG.md (at the tagged commit) has no `## [<version>]` section — i.e. the release was cut without first promoting [Unreleased]. This is what happened with eql-2.3.1. Skips pre-releases, which keep entries under [Unreleased] until the final cut. It's an independent job: artifact build/publish still proceed (the release is already published by then), but the workflow goes red so the omission is visible. Also fix the README Release EQL badge: it tracked `?branch=main`, but the workflow effectively never runs on main (it runs on `release` tag events). The last main run was a cancelled workflow_dispatch from 2025-11-26, so the badge showed "failing" despite every release succeeding. Switch to `?event=release`.
chore(changelog): promote [Unreleased] to [2.3.1]
Adds the encrypted-int4 fixture generator and benchmark dataset generation tooling: tasks/fixtures/ and tasks/bench.toml, the tests/benchmarks/ harness, the CipherStash Proxy docker-compose used to encrypt fixture data, the int4 fixture install migration, and bench-data test coverage. Registers the new task files in mise.toml and ignores local mise credential overrides.
Generator no longer shells out to a CipherStash Proxy container to encrypt fixture values. `cipherstash-client` 0.35 is now a direct dependency of the SQLx test crate and the new `fixtures::cipherstash` module owns ZeroKMS bootstrap (cached in a OnceCell), the index-name -> ColumnConfig mapping, and the per-value encrypt helper. The Rust fixture owns configuration and encryption; the DB only owns table structure. Removes: `tests/docker-compose.proxy.yml`, the `proxy:up`/`proxy:logs`/ `proxy:down` mise tasks, the `mise run proxy:up` step from `test:sqlx`, and the `restart_proxy_and_wait` / `insert_through_proxy` / `PROXY_PORT` plumbing in `driver.rs`. The working-table `payload` column is now plain `jsonb` (no `eql_v2_encrypted` composite) and no `add_search_config` rows are written to `eql_v2_configuration` during generation. `mise run test:sqlx` now needs `CS_CLIENT_ACCESS_KEY` (or `CS_CLIENT_ID` + `CS_CLIENT_KEY`) and `CS_WORKSPACE_CRN` in the process env so `AutoStrategy::detect()` / `EnvKeyProvider` can pick them up.
`cipherstash::encrypt_store` now takes a slice and returns a `Vec`, so a fixture run issues one `encrypt_eql` call regardless of value count instead of one per value. The 14-row `eql_v2_int4` fixture drops from 14 ZeroKMS round trips to one; the planned bench fixture (~10k rows) gets the same treatment. Empty input short-circuits before `cipher()` so a caller with nothing to encrypt does not pay the bootstrap cost. `driver::insert_direct` now does one batched encrypt then a per-row INSERT loop — the INSERT is invariant across rows so the SQL string is lifted out of the loop. The working table is local Postgres and the per-row execute cost is in microseconds, so batching the INSERTs is not worth the dynamic-SQL complexity. Test coverage closes the gap flagged in the PR review: - Cheap unit tests (no live ZeroKMS): `index_type_for` mapping for unique/ore/match plus the unknown-name error path; an empty-batch short-circuit test that proves `cipher()` is not reached when there is nothing to encrypt (visible because the test runs without CS_* env vars). - Live tests gated by `fixture-gen` + `#[ignore]`: single-value round trip, batch length + per-payload identifier check, and a distinct-plaintexts → distinct `hm` assertion that mirrors the fixture-tests' equality term check at the unit-test layer.
`test:sqlx` copies `release/cipherstash-encrypt.sql` into `tests/sqlx/migrations/001_install_eql.sql` so the EQL extension is applied to each per-test database. Without an explicit build dep a stale release artifact silently ships an old EQL extension — visible when regression-guard migrations (e.g. 003_install_ste_vec_data.sql) fail on a `_encrypted_check_c` shape the current source has already fixed. Declaring `depends = ["build"]` makes the release artifact fresh on every direct invocation of `test:sqlx`; the top-level `test.sh` already builds, so CI behaviour is unchanged.
The eql_v2_int4 fixture generator encrypts via cipherstash-client, whose EnvKeyProvider requires CS_CLIENT_ID + CS_CLIENT_KEY to load the client key. test:sqlx regenerates fixtures every run, so CI needs that pair — but f8dfd92 only wired CS_CLIENT_ACCESS_KEY + CS_WORKSPACE_CRN (ZeroKMS auth), leaving the generator failing with "CS_CLIENT_ID environment variable not set". Add the client-key pair to the test job env. Also correct comments in mise.toml, tasks/fixtures.toml, Cargo.toml, and FIXTURE_SCHEMA.md that framed the two credential pairs as alternatives: auth (access key + workspace CRN) and key material (client id + key) are distinct roles and both are required.
feat(fixtures): plug-in framework + eql_v2_int4 reference fixture
feat: eql-types — canonical EQL v3 payload types (Rust)
…outed ordered domains
…riant + resolution backstop
…t ordered domains)
…enerated GitHub collapses these in PR diffs and excludes them from language stats. They are CI-verified machine-generated artifacts (parity gate, cargo-expand snapshots, matrix-coverage inventory), never hand-edited, so collapsing keeps reviews focused on source. Display hint only: files stay tracked and diffable, and nothing about Git, CI, or the build changes.
The aggregate-typecheck dispatch branched on the variant IDENT at macro-expansion time, with empty arms for Ord/OrdOre/Search meaning "emit no test" and a fallback arm emitting the 42883/42725 rejection assertion. Every new ord-capable variant had to remember to add an empty arm or the test silently did the wrong thing (asserting min/max are rejected on a variant that actually declares them). Replace the ident-literal dispatch with a single arm that emits one min + one max test per variant whose body branches at RUNTIME on spec.supports_ord() (catalog-derived): ord-capable -> assert eql_v3.min/max(value) RESOLVES; non-ord -> assert the call is rejected with SQLSTATE 42883/42725. A new ord-capable variant now needs no macro change. This emits aggregate_typecheck min/max tests for the ord/ord_ore (and text's search) variants that previously emitted none, so the committed matrix_tests.txt / matrix_tests_text.txt inventory snapshots and the int4 cargo-expand snapshot are regenerated (purely additive: no existing test name removed). eq-only snapshot unchanged (new names contain _ord, stripped by the derive filter; eq-only types have no ord variant).
…erals
The index-engagement / scale-preference matrix combos hardcoded the
functional-index extractor as a string literal ("eql_v3.eq_term" /
"eql_v3.ord_term") restated per combo — a fact the catalog already owns
via Term::extractor_for_operator. The [eq, ord, search] arm in particular
hand-split `=` -> eq_term and `<` -> ord_term into separate combos with
the extractor written out each time.
Replace the per-combo $extractor literal with a runtime lookup:
* index/scale combos now derive the extractor from the combo's ops via
a new scalar_domains::combo_extractor(spec, ops) helper, which returns
the single eql_v3-qualified extractor serving every op in the combo
and ERRORS if the ops would need two extractors (one functional index
cannot serve both) or if an op is unsupported.
* the scale-default combo derives its extractor from the `=`-serving
term (spec.extractor_for_op("=")) — ord_term for an [Ore] _ord domain,
eq_term for a [Hm, Ore] text _ord domain.
This deletes the redundant extractor literals from all three scalar_matrix!
arms while keeping the combo TUPLES (which encode the name-bearing dom_name
+ op structure). The emitted test-name set is unchanged — matrix_tests*.txt
inventory snapshots stay byte-identical and the int4 cargo-expand snapshot
diff is body-only (no rustc_test_marker changes).
The text arm's `=`/ord split into distinct _eqidx dom_names is RETAINED:
text `=` routes through eq_term and `<` through ord_term, so one index
cannot serve both — combo_extractor asserts exactly this, and three new
catalog-resolution unit tests pin it (no DB needed).
Steps 2-3 of the planned refactor (deriving `@>`/`<@` blocker/match
EMISSION and unifying the caps arms) were deliberately NOT done: which
blocker/match tests EXIST is name-bearing and fixed at macro-expansion
time, while catalog terms are a runtime concept, so driving emission from
the catalog would change the pinned test-name set. Removing the extractor
literals (step 1) is the de-duplication that restated catalog truth
without touching names.
…riant Add a direct unit test for the has_search_token catalog classifier (text true; int4/date false) rather than only exercising it via codegen output, and a text_value_tests assertion that MatchScalar's haystack/needle/disjoint plaintexts are each present verbatim in fixture_values() — the invariant fetch_fixture_payload relies on, so a fixture change that drops one fails here instead of at query time.
Every user-facing PR appends under [Unreleased], so concurrent branches conflict on the same region constantly. merge=union keeps both sides' added lines; order/dedupe is tidied by hand at release time.
…istry The eql-types crate (from the dan/eql-types-crate merge on eql_v3) was written when text ordered domains were [Ore]-only and there was no text_search domain. This branch routes text equality through hm ([Hm, Ore] for text_ord/_ord_ore) and adds text_search [Hm, Ore, Bloom], so the rebase left v3::all() out of sync with eql-scalars::CATALOG (catalog_parity's inventory_exactly_covers_catalog failed on the missing text_search domain). - Add TextSearch (hm + ob + bf) and register it in v3::all(). - Add hm to TextOrd/TextOrdOre so the Rust types match the SQL domain CHECK (text routes = / <> through hm, unlike the [Ore]-only int domains); a TextOrd without hm could not deserialize a real text_ord payload. - Update the v3_conformance round-trip sweep with text-specific ord/search wire builders and a text_search arm. - Refresh a stale matrix.rs comment referencing the removed Variant::required_term().
feat(eql_v3): text equality via hm + combined text_search domain
Stacks on the Rust-only eql-types crate. Every v3 domain type, the term newtypes, and Identifier gain a TS derive with bindings to crates/eql-types/bindings/v3/ (28 files, checked in). Term newtypes export as named TS aliases (export type Hmac256 = string) that every domain binding imports. mise types:generate clean-regenerates the bindings; types:check regenerates and fails on any diff or untracked output, wired into the rust-crates CI job as a freshness gate.
…te in-place regen - Drop the `serde-json-impl` ts-rs feature: no v3 type carries a serde_json::Value/Map/Number field, so it only pulled serde_json into ts-rs's build graph for nothing. Bindings regenerate byte-identically without it (verified via `mise run types:check`). - README: note that a plain `cargo test` / `mise run test:crates` regenerates `bindings/` in place as a side effect — only `types:generate` isolates the write via the temp-dir swap.
feat: eql-types — TypeScript bindings (stacked on #236)
eql-types was added to the workspace `members` in #236, but its Cargo.toml omitted the `[lints] workspace = true` opt-in that eql-scalars, eql-codegen, and eql-tests-macros all carry — so it was the one member not inheriting the workspace deny-lints (dead_code, unused_imports). Add it for parity. The crate already compiles clean under those lints (verified via `mise run test:crates`).
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Follow-up to the eql-types stack (#236 / #268 / #269).
eql-typeswas added to the workspacememberslist in #236, but itsCargo.tomlnever picked up the[lints] workspace = trueopt-in that every other member carries (eql-scalars,eql-codegen,eql-tests-macros). So it was the one workspace crate not inheriting the shared deny-lints:This adds the opt-in so the crate is fully integrated like its siblings. No code changes were needed —
eql-typesalready compiles clean under those lints (mise run test:crates: fmt + clippy-D warnings+ test all pass).Independent of #269 (which is still open); the
[lints]section and #269's dependency additions touch different parts of the file, so they merge cleanly in either order.