Eql v3 by tobyhede · Pull Request #246 · cipherstash/encrypt-query-language

tobyhede · 2026-05-29T07:08:45Z

THIS IS THE EQL_V3 STACK.
VERY WIP.

coderabbitai · 2026-05-29T07:08:54Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d7507196-3891-4255-86f9-5f497df220c6

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


…, fix blocker doc path Hoist the src/v3 path strings into named constants in generate.rs (byte-identical output — parity holds). Remove the now-dead Term::returns() (codegen builds the extractor return type from CORE_SCHEMA + ctor) and its legacy eql_v2 test assertions. Repoint the shared blocker's doc comment to src/v3/scalars/<T>/.

The eql_v3 SEM index-term functions are a hand-port of the eql_v2 originals. The scalar matrix already exercises the array comparator's happy path end-to-end against real ciphertext fixtures, but several branches are structurally unreachable there and were tested only on the eql_v2 copies. Add a sibling family test module covering them:

- differential v2<->v3 parity on real `ob` fixtures (both sides routed through extractor -> composite -> compare_..._terms, so the schema prefix is the only variable), the strongest guard against a faithful-port slip;
- the 'Ciphertexts are different lengths' RAISE;
- NULL-term ordering branches the STRICT wrappers bypass;
- array NULL + empty/cardinality recursion base cases;
- has_* presence checks, the missing-`ob` RAISE, and the NULL-jsonb short-circuit.

Verified non-vacuous: a deliberately broken comparator fails T1/T2/T3 while the independent T4/T5 stay green.

…entity names Addresses CodeRabbit review: the artifact/test self-containment assertions only matched schema-qualified eql_v2. references; extend them to also reject eql_v2_<entity> names (eql_v2_encrypted, eql_v2_configuration) while still allowing prose mentions of eql_v2 in doc comments. Verified the v3 artifact and src/v3 contain zero eql_v2_ occurrences.
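As a rough illustration of the kind of check described above, here is a minimal sketch of a scanner that rejects schema-qualified `eql_v2.` references and `eql_v2_<entity>` names while tolerating prose mentions. The function name `find_v2_references` and the rule "a separator only counts when an identifier character follows it" are assumptions made for this example, not the repository's actual assertion.

```rust
/// Hypothetical helper: returns every `eql_v2` reference that would break
/// v3 self-containment.
///
/// Flags `eql_v2.<ident>` (schema-qualified) and `eql_v2_<ident>` (entity
/// names such as `eql_v2_encrypted`), but not a bare prose mention of `eql_v2`.
fn find_v2_references(sql: &str) -> Vec<&str> {
    const NEEDLE: &str = "eql_v2";

    fn is_ident(c: char) -> bool {
        c.is_ascii_alphanumeric() || c == '_'
    }

    let mut hits = Vec::new();
    let mut from = 0;

    while let Some(pos) = sql[from..].find(NEEDLE) {
        let start = from + pos;
        let after = start + NEEDLE.len();
        let mut rest = sql[after..].chars();
        let sep = rest.next();
        let next = rest.next();

        // A separator only counts when an identifier character follows it,
        // so "ported from eql_v2." at the end of a sentence is left alone.
        if matches!(sep, Some('.') | Some('_')) && next.map_or(false, is_ident) {
            let tail = &sql[after + 1..];
            let ident_len = tail.find(|c: char| !is_ident(c)).unwrap_or(tail.len());
            hits.push(&sql[start..after + 1 + ident_len]);
        }
        from = after;
    }
    hits
}
```

A test could then assert that the helper returns an empty list for the v3 artifact and for every file under src/v3.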

…e SCHEMA With eql_v3 fully self-contained, the encrypted-domain families and the SEM index-term types they call live in one schema, so there is no second schema to point the core types at. Replace the CORE_SCHEMA/DOMAIN_SCHEMA pair with one SCHEMA = "eql_v3" constant; the templates read it via the {{ schema }} global. Output is byte-identical (golden parity holds).

…ardening

- Cargo.toml: update stale fixture command to fixture:generate:all
- eql-functions.md: index examples use distinct _eq/_ord columns (a column carries a single domain, so eq_term/ord_term apply to different columns)
- scalars/mod.rs: header describes the scalar_types!(matrix_suites) layout
- codegen-parity.sh: ls *.sql -> portable find (no set -e abort on empty dir)
- inlinability.rs: narrow arity-1 hmac_256 match to the jsonb overload

@return

…ators The six comparison operator backing functions (_eq, _neq, _lt, _lte, _gt, _gte) were missing required @param and @return Doxygen tags, failing docs:validate:required-tags with 6 errors and 6 warnings. Tags follow the eql_v2 sibling src/ore_block_u64_8_256/operators.sql convention.

cargo fmt --check was failing CI (test:lint and test:crates jobs) on this file; apply rustfmt with no logic changes.

…t consts

Make the SQLx scalar-matrix harness type-agnostic ahead of the first non-integer scalar, without adding one. Three integer assumptions are lifted:

- `ScalarType::FIXTURE_VALUES` (a `const`) becomes `fn fixture_values()`, so a scalar whose values can't be const-constructed can return a borrow of a lazily-built `Vec` instead. Integer impls still hand back their `eql_scalars::<T>_VALUES` const.
- New `min_pivot()` / `max_pivot()` trait methods replace the matrix's direct `<T>::MIN` / `<T>::MAX` pivot references, so a scalar without an inherent `::MIN`/`::MAX` const can supply an explicit sentinel.
- The ORDER BY arms build their `WHERE` clause from `to_sql_literal(zero)` instead of a hardcoded `> 0`, so a non-integer plaintext column typechecks.

Behaviour-preserving for the existing `int4` / `int2` types: the integer `min_pivot`/`max_pivot` resolve to `Self::MIN`/`Self::MAX`, `to_sql_literal(0)` renders `0`, and the generated test names are unchanged. The int4 cargo-expand snapshot is regenerated to track the method-based bodies.
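The three lifted assumptions can be pictured with a simplified sketch of the trait. The method names come from the commit message; the exact signatures, the `INT4_VALUES` contents, and the `order_by_filter` helper are assumptions made for illustration only.

```rust
/// Hypothetical, simplified shape of the harness trait after the refactor.
trait ScalarType: Sized + 'static {
    /// Fixture plaintexts. A method rather than a `const`, so an impl may
    /// hand back a borrow of a lazily built `Vec`.
    fn fixture_values() -> &'static [Self];
    /// Explicit pivots, so a type without inherent MIN/MAX can supply sentinels.
    fn min_pivot() -> Self;
    fn max_pivot() -> Self;
    /// Renders the value as a SQL literal for generated WHERE clauses.
    fn to_sql_literal(&self) -> String;
}

// Illustrative values only; the real list lives in the eql_scalars crate.
const INT4_VALUES: &[i32] = &[i32::MIN, -1, 0, 1, i32::MAX];

impl ScalarType for i32 {
    fn fixture_values() -> &'static [Self] {
        INT4_VALUES
    }
    fn min_pivot() -> Self {
        Self::MIN
    }
    fn max_pivot() -> Self {
        Self::MAX
    }
    fn to_sql_literal(&self) -> String {
        self.to_string()
    }
}

/// Builds the filter from the type's own "zero" instead of a hardcoded `> 0`.
fn order_by_filter<T: ScalarType>(column: &str, zero: &T) -> String {
    format!("{column} > {}", zero.to_sql_literal())
}
```

For `i32` the rendered filter is unchanged, which matches the behaviour-preserving claim for the integer types.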

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

- scalar_domains.rs: document fixture_values() stable-order contract (callers compare element-wise and index positionally without sorting)
- eql-tests-macros: assert emitted min_pivot/max_pivot bodies instead of loose MIN/MAX substrings that also match the doc comment

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

feat(int8): add eql_v3.int8 encrypted-domain type family

…test hardening

Collates and resolves code-review feedback on the self-contained eql_v3 PR.

Inlinable SEM helpers (coderdan threads):

- Convert eql_v3.jsonb_array_to_bytea_array and eql_v3.jsonb_array_to_ore_block_u64_8_256 from LANGUAGE plpgsql to inlinable LANGUAGE sql IMMUTABLE (no SET), using a CASE-scalar-subquery form so JSON-null/empty-array inputs still return NULL rather than raising (a naive FROM-SRF + WHERE rewrite would regress null to error).
- These take a bare jsonb (not a domain), so pin_search_path.sql's structural skip does not cover them; opt them in via its documented 'eql-inline-critical' COMMENT marker so they install unpinned and the planner can inline them. Add matching splinter allowlist rows.

The eql_v2 copies stay plpgsql by design. Direct SEM tests added (sem.rs).

Stale references:

- eql-functions.md / sql-support.md: v3 extractors now document eql_v3 SEM return types, not eql_v2.
- pin_search_path.sql header, scalar_types.rs and mutations.rs comments.

Test/build hardening:

- writer.rs validates the AUTO-GENERATED ownership header before writing.
- eql-scalars int_values! fails at compile time on narrowed-fixture overflow.
- parity.rs asserts the generated file set, not just per-file contents.
- mise.toml: shared test:sqlx:prep so test:sqlx:watch gets the same prep.
- build.sh v3-only build rejects REQUIRE edges outside src/v3.
- fixtures: single-source PAYLOAD_COLUMN const; enforce 63-byte identifier limit.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
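The 63-byte identifier limit mentioned in the last bullet reflects PostgreSQL's default behaviour of truncating identifiers to NAMEDATALEN - 1 bytes. A minimal sketch of such a guard follows; the names `MAX_IDENTIFIER_BYTES` and `check_identifier` are hypothetical and not taken from the fixture generator.

```rust
/// PostgreSQL truncates identifiers to NAMEDATALEN - 1 bytes (63 by default).
const MAX_IDENTIFIER_BYTES: usize = 63;

/// Hypothetical guard: rejects names PostgreSQL would silently truncate.
/// The limit is in bytes, so `str::len` (a byte count) is the right measure.
fn check_identifier(name: &str) -> Result<&str, String> {
    if name.len() > MAX_IDENTIFIER_BYTES {
        Err(format!(
            "identifier `{name}` is {} bytes; the limit is {MAX_IDENTIFIER_BYTES}",
            name.len()
        ))
    } else {
        Ok(name)
    }
}
```

Failing early here avoids two long generated names colliding after truncation.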

Self-contained eql_v3 schema + standalone v3 installer

refactor(tests): generalise scalar matrix harness off integer-inherent consts

Add the `eql_v3.date` encrypted-domain scalar, the first non-integer ordered type, on top of the integer-agnostic harness refactor (parent PR). The SQL codegen needs no change (domains are jsonb-backed and token-driven); the work is one catalog row plus the temporal wiring the refactor enabled.

Catalog (`eql-scalars`): `ScalarKind::Date` (`chrono::NaiveDate`), `Fixture::Date(&str)` (zero-dep ISO strings), `DATE_FIXTURES` (16 dates incl. the three matrix pivots), and `pub const DATE` appended to `CATALOG`, with mirrored panic / pivot-presence / token-order tests.

Harness: an explicit `[temporal]` marker in the `scalar_types!` dispatch list drives the divergences from the integer path. The `impl ScalarType` for a temporal scalar is hand-written (chrono values can't be a `const` slice; pivots are explicit sentinels), and `scalar_fixture!` stamps a pivot-presence assert instead of the integer signed-extreme asserts.

Adds `impl ScalarType for NaiveDate` (LazyLock-parsed values, `min_pivot`/`max_pivot` sentinels, quoted `to_sql_literal`), `EqlPlaintext for NaiveDate` (Cast::DATE), the sqlx `chrono` feature + direct `chrono` dep, the CHANGELOG entry, and a temporal-kinds note in the adding-a-scalar reference.
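The "LazyLock-parsed values" and "quoted `to_sql_literal`" ideas can be sketched with the standard library alone. To stay dependency-free, this example swaps `chrono::NaiveDate` for a stand-in `Date` struct; the three fixture strings, `parse_iso`, and the free functions are assumptions for illustration, not the PR's code.

```rust
use std::sync::LazyLock;

/// Stand-in for `chrono::NaiveDate`, used here only to avoid a dependency.
/// Field order matters: the derived `Ord` compares year, then month, then day.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Date {
    year: i32,
    month: u32,
    day: u32,
}

/// Zero-dependency ISO strings, as in the catalog (values here are illustrative).
const DATE_FIXTURES: &[&str] = &["1970-01-01", "2000-02-29", "9999-12-31"];

/// Parses `YYYY-MM-DD` without validating calendar rules (sketch only).
fn parse_iso(s: &str) -> Date {
    let parts: Vec<&str> = s.split('-').collect();
    assert_eq!(parts.len(), 3, "expected YYYY-MM-DD, got {s}");
    Date {
        year: parts[0].parse().expect("year"),
        month: parts[1].parse().expect("month"),
        day: parts[2].parse().expect("day"),
    }
}

/// Parsed once on first access; the values cannot live in a `const` slice.
static DATE_VALUES: LazyLock<Vec<Date>> =
    LazyLock::new(|| DATE_FIXTURES.iter().map(|s| parse_iso(s)).collect());

/// Hands out a borrow of the lazily built `Vec`, in fixture order.
fn fixture_values() -> &'static [Date] {
    DATE_VALUES.as_slice()
}

/// Dates must be quoted in SQL, unlike integer literals.
fn to_sql_literal(d: &Date) -> String {
    format!("'{:04}-{:02}-{:02}'", d.year, d.month, d.day)
}
```

Returning a `&'static` slice from a `static` keeps the positional, stable-order contract that callers of `fixture_values()` rely on.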

feat(scalars): add eql_v3.date encrypted-domain type + temporal wiring

… reference CATALOG now includes the non-integer ordered scalar (date) alongside the integers, so the 'only the integer scalars today' wording was stale. Addresses CodeRabbit feedback on PR #256.

ScalarKind loses the five partial accessors that panicked on non-integer kinds; bounds now live on the total BoundedIntKind, reached via ScalarKind::as_bounded_int(). Date::min_symbol() is now a compile error. A CATALOG invariant test replaces the deleted #[should_panic] tests.
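A minimal sketch of the pattern, assuming a nested-enum layout: accessors that only make sense for bounded integers live on a sub-enum where every variant has an answer, and the outer kind exposes a fallible projection. The variant layout and the strings returned by `min_symbol` are hypothetical; only the names `ScalarKind`, `BoundedIntKind`, `as_bounded_int`, and `min_symbol` come from the commit message.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BoundedIntKind {
    Int2,
    Int4,
    Int8,
}

impl BoundedIntKind {
    /// Total: every variant has an answer, so there is no panic path.
    fn min_symbol(self) -> &'static str {
        match self {
            Self::Int2 => "i16::MIN",
            Self::Int4 => "i32::MIN",
            Self::Int8 => "i64::MIN",
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScalarKind {
    Int(BoundedIntKind),
    Date,
}

impl ScalarKind {
    /// The only route to the bounded-int accessors. Non-integer kinds
    /// yield `None` instead of panicking later.
    fn as_bounded_int(self) -> Option<BoundedIntKind> {
        match self {
            Self::Int(kind) => Some(kind),
            Self::Date => None,
        }
    }
}
```

With this shape, calling `min_symbol()` directly on a date kind does not type-check, which is the compile-time guarantee the commit describes.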

Update the adding-a-scalar reference's `kind` bullet to reflect that the bounded-numeric accessors moved to the total BoundedIntKind sub-enum, and add the implementation plan under docs/superpowers/plans/.

refactor(eql-scalars): total BoundedIntKind sub-enum (replaces panicking ScalarKind accessors)

…e wiring

…p [temporal] marker

…ived)

…a is_eq_only_token The ordered_numeric_matrix! suite exercises </>/min/max; an equality-only scalar (no _ord domain in eql-scalars::CATALOG) does not support those. Route the matrix emitter through matrix_suite_for_entry, which reads eq-only-ness from the catalog (is_eq_only_token) and emits a compile_error! for an eq-only token instead of silently generating ordering tests it cannot pass. Makes the catalog-derived is_eq_only accessor load-bearing and leaves a clean seam for an equality-only matrix path. Both arms unit-tested.

Derived IEEE PartialEq (NaN != NaN, +0.0 == -0.0) was inconsistent with the manual Eq/Ord impls built on total_cmp (NaN == NaN, +0.0 != -0.0), breaking Eq's reflexivity contract and Ord/PartialEq equality agreement.
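The inconsistency and its fix can be reproduced in a few lines. This is a sketch under the assumption of a simple newtype wrapper; `TotalF64` is a hypothetical name, not the type used in the PR. Routing `PartialEq` through the same `total_cmp` that backs `Ord` makes equality reflexive and keeps the comparison traits in agreement.

```rust
use std::cmp::Ordering;

/// Hypothetical wrapper whose equality and ordering both use `total_cmp`.
#[derive(Debug, Clone, Copy)]
struct TotalF64(f64);

impl PartialEq for TotalF64 {
    // Manual impl: a derived PartialEq would use IEEE `==` and disagree with Ord.
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for TotalF64 {}

impl PartialOrd for TotalF64 {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for TotalF64 {
    fn cmp(&self, other: &Self) -> Ordering {
        // IEEE 754 totalOrder: NaN compares equal to an identical NaN,
        // and -0.0 sorts strictly before +0.0.
        self.0.total_cmp(&other.0)
    }
}
```

Under this wrapper a NaN equals itself and the two zeros are distinct, the opposite of the raw `f64` comparisons on both counts.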

…eal ciphertext) Adds v3_jsonb_to_ste_vec_query_gin_is_cost_chosen: replicates one real v3_ste_vec document to 5000 rows + 1 distinct pivot, builds the to_ste_vec_query GIN index, and with enable_seqscan left ON asserts the planner CHOOSES the index for a single-row-selective $.hello oc containment needle (matched == 1). Complements the sibling *_gin_engages usability arm (which forces seqscan off). Verified PG17: passes.

…t:sqlx:prep) bench.sh hand-rolled build+cp+migrate but never generated the gitignored per-type fixtures. #[sqlx::test(fixtures(...))] include_str!'s those .sql files at COMPILE time, so once fixtures became generated/gitignored the bench binary stopped compiling: 'couldn't read tests/sqlx/fixtures/eql_v2_numeric.sql'. bench-eql has failed every nightly on main for 8+ days as a result — pre-existing, unrelated to the scale tests, but it blocks them from ever running. Reuse test:sqlx:prep (build + cp + migrate + fixture:generate:all) so bench stays in lockstep with test:sqlx.

With bench.sh now running test:sqlx:prep, fixture:generate:all encrypts via cipherstash-client and needs ZeroKMS auth (CS_CLIENT_ACCESS_KEY + CS_WORKSPACE_CRN) plus a client key (CS_CLIENT_ID + CS_CLIENT_KEY); without them it fails with 'Auth strategy error: Not authenticated'. Add the four secrets to the bench job env, mirroring test-eql.yml. (bench-eql triggers are push:main/schedule/dispatch, all main-repo, so no fork-PR creds exposure.)

feat(v3): float4/float8 encrypted-domain types

Job-scoped secrets are exposed to every step, including third-party actions (actions/checkout, jdx/mise-action, Swatinem/rust-cache) referenced by mutable tags — a compromised tag could exfiltrate ZeroKMS/client creds before the bench script runs. Move the four CS_* vars onto the 'Run bench tests' step that actually needs them (fixture:generate:all). Least privilege; addresses PR review.

Adds the eql_v3 encrypted-domain property suite over the committed, curated real-ciphertext fixtures:

- shared all-pairs operator oracle (property.rs): = / <> on _eq and the ordered comparisons + ord_term sort order, checked against a plaintext oracle over every ordered pair of fixture rows;
- function-double oracles: the generated eql_v3.eq/neq/lt/lte/gt/gte functions across all three overloads (domain-domain, domain-jsonb, jsonb-domain) plus term-extractor identity (eq_term==hm, ord_term==ob);
- bloom match smoke for the text _match domain; NULL/blocker/CHECK edge cases;
- the e2e suite (gated behind proptest-e2e) over fresh ZeroKMS encryption.

Fixtures are generated from the curated catalog values via FixtureSpec::run().
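The all-pairs oracle idea can be sketched generically: for every ordered pair of fixture rows, the comparison computed on the encrypted side must match the plaintext ordering. The function `first_disagreement` and its signature are assumptions for illustration. The real suite runs SQL operators against a database rather than a Rust closure.

```rust
use std::cmp::Ordering;

/// Hypothetical oracle: returns the first ordered pair `(i, j)` on which
/// `encrypted_cmp` disagrees with the plaintext ordering, or `None` if every
/// ordered pair (including `i == j`) agrees.
fn first_disagreement<P: Ord, C>(
    rows: &[(P, C)],
    encrypted_cmp: impl Fn(&C, &C) -> Ordering,
) -> Option<(usize, usize)> {
    for (i, (plain_a, cipher_a)) in rows.iter().enumerate() {
        for (j, (plain_b, cipher_b)) in rows.iter().enumerate() {
            if plain_a.cmp(plain_b) != encrypted_cmp(cipher_a, cipher_b) {
                return Some((i, j));
            }
        }
    }
    None
}
```

Returning the offending pair rather than a boolean makes a failure easy to trace back to specific fixture rows.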

…P-3141) Adds a dedicated test:sqlx:e2e mise task and CI job for the proptest-e2e suite (needs ZeroKMS creds; the credential-free shards run the fixture suite), and runs source doc validation unconditionally.

Documents the three property-test suites (catalog / fixture / e2e) over the committed curated fixtures, the function-double oracles, and term-extractor identity. CHANGELOG entry under [Unreleased].

test(v3): scaled, cost-chosen index-engagement tests for encrypted-domain surface

…le (CIP-3141) The float4/float8 shared-index-term e2e test pulled both `hm` and `ob` via `as_str()`, but the ORE `ob` term is a JSON array of block strings, not a scalar string (only `hm` is a string). `as_str()` returned None on the array, raising "payload missing string `ob`". Split the helper: `hm` stays a string, `ob` extracts the array and compares directly, matching the canonical extractor in property.rs. Latent until the gated e2e suite started running in CI.
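The shape mismatch behind this bug can be shown with a tiny stand-in for a JSON value type. Everything below (the `Json` enum, `Payload`, `hm_term`, `ob_term`) is hypothetical and only mirrors the described split: `hm` is read as a string, `ob` as an array of block strings.

```rust
/// Stand-in for a JSON value; the real test would use a JSON library's type.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    String(String),
    Array(Vec<Json>),
}

impl Json {
    /// `Some` only for a scalar string; an array yields `None`.
    fn as_str(&self) -> Option<&str> {
        match self {
            Json::String(s) => Some(s.as_str()),
            _ => None,
        }
    }

    fn as_array(&self) -> Option<&[Json]> {
        match self {
            Json::Array(items) => Some(items.as_slice()),
            _ => None,
        }
    }
}

/// Hypothetical payload holding just the two index terms discussed above.
struct Payload {
    hm: Json,
    ob: Json,
}

/// `hm` is a scalar string.
fn hm_term(p: &Payload) -> Result<&str, String> {
    p.hm.as_str()
        .ok_or_else(|| "payload missing string `hm`".to_string())
}

/// `ob` is an array of block strings, so it needs the array accessor.
fn ob_term(p: &Payload) -> Result<&[Json], String> {
    p.ob.as_array()
        .ok_or_else(|| "payload missing array `ob`".to_string())
}
```

Calling `as_str()` on the `ob` value returns `None`, which is the condition that surfaced as the "payload missing string `ob`" error.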

…or (CIP-3141) `float4_and_float8_share_index_terms_for_the_same_value` asserted byte-equality of the raw `ob` ORE arrays of two independently-encrypted payloads. That can never hold: a BlockORE term is `Left (deterministic) ++ Right (16-byte random per-ciphertext nonce + nonce-masked truth tables)`, so two encodings of the SAME value — same width, same cast — are byte-UNEQUAL by construction. Ordering is decided by the ORE compare function, not raw bytes. (The cast is irrelevant: `real`/`double` collapse to one f64 `ColumnType::Float` in cipherstash-client; the deterministic Left halves are byte-identical, which is what proves the two widths share an encoding.) The bug stayed latent because the e2e suite is feature/creds-gated and, when it did run, an earlier `ob`-as-string extraction errored out before the assertion; fixing that extraction (fe52d42) unmasked the wrong assertion. Compare the extracted `ord_term`s through the SQL `eql_v3.ore_block_256` `=` operator (the only correct ORE check) and keep the deterministic `hm` equality term as a direct byte comparison. Also correct the CHANGELOG claim of a "byte-identical ORE term" to "equal under the ORE comparator". Verified: the test now passes against fresh ZeroKMS encryption.

…t_expand [codex] assert scalar scale predicates are selective

…, and STRICT NULL (G3 4a)

test(sqlx): eql_v3 SQL-function property tests (CIP-3141)

…ice)

…e:all

…res (hm + ORE)

docs-static (source-only SQL doxygen coverage + required-tags) ran on every PR, unlike the other heavy jobs. Its inputs are a strict subset of the changes-job relevant filter — src/**, the crates/** codegen build it depends on, and the tasks/docs/** validator scripts — so a PR that leaves relevant false cannot change its outcome. Gate it on the same flag as the other jobs: consistent, and drops a redundant codegen build on markdown-only PRs without losing coverage. A narrower src/**-only filter was rejected — it would risk a silent false-green (ci-required counts skipped as pass) by skipping on a real input change. Sync .github/workflows/README.md (coverage map, known gaps, recently- closed, operator-setup verification) and the stale validate-job comment.

Engineered fixture pair (NEEDLE's ngrams ⊆ HAY's ngram set, yet NEEDLE is not a contiguous substring of HAY) plus a creds-free catalog contiguity guard that pins the plaintext invariant. Strengthen the divergence test with a raw-bf subset assertion: assert needle-bf ⊆ haystack-bf directly on the stored `bf` arrays via native jsonb containment, independent of the `eql_v3` domain `@>` operator and the `match_term` extractor. Previously the subset was only inferred from the domain `@>` it was meant to validate (circular); the direct check localizes a future tokenizer change to a precise 'bf arrays no longer a subset' failure instead of an opaque @>-false.
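The divergence being locked in is easy to demonstrate on plaintext. The sketch below assumes plain character trigrams with no bloom filter or tokenizer options, which may differ from the real match index. The strings `abcdXbcde` and `abcde` are made-up examples, not the PR's fixture pair.

```rust
use std::collections::BTreeSet;

/// Character trigrams of `s` (assumption: the real tokenizer may differ).
fn trigrams(s: &str) -> BTreeSet<String> {
    let chars: Vec<char> = s.chars().collect();
    chars
        .windows(3)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Set-containment match: true when every needle trigram occurs in the haystack.
fn ngram_match(hay: &str, needle: &str) -> bool {
    trigrams(needle).is_subset(&trigrams(hay))
}

/// What `LIKE '%needle%'` would decide on the plaintext.
fn substring_match(hay: &str, needle: &str) -> bool {
    hay.contains(needle)
}
```

For the haystack `abcdXbcde` and needle `abcde`, the needle's trigrams (`abc`, `bcd`, `cde`) all appear in the haystack, so the set-based match succeeds even though the needle is not a contiguous substring.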

The cross-ciphertext oracle include_str!s eql_v2_<token>_doubles.sql at compile time, but stub-fixtures.sh only stubbed eql_v2_<token>.sql and the literal .gitignore *.sql entries — neither matches the _doubles glob. So the no-creds matrix-coverage / inventory jobs could not compile the encrypted_domain binary to --list it (rustc: couldn't read eql_v2_int4_doubles.sql, ...). Stub eql_v2_<token>_doubles.sql alongside eql_v2_<token>.sql for every catalog token, consistent with the helper's stub-the-complete-set policy (harmless extras for storage-only tokens with no real doubles fixture).

…ment

test(v3): text-search direct-function coverage + bloom-vs-LIKE divergence lock-in

test(v3): cross-ciphertext equality via per-type doubles fixtures (CIP-3141)

tobyhede and others added 29 commits June 3, 2026 18:59

style(test): rustfmt encrypted_domain/family/sem.rs

00761d0

cargo fmt --check was failing CI (test:lint and test:crates jobs) on this file; apply rustfmt with no logic changes.

chore: quote echo var in tasks/build.sh

212769c

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Merge pull request #253 from cipherstash/v3-domain-type-int8

b2ef727

feat(int8): add eql_v3.int8 encrypted-domain type family

Merge pull request #255 from cipherstash/v3-self-contained-schema

c665c4f

Self-contained eql_v3 schema + standalone v3 installer

Merge pull request #258 from cipherstash/v3-scalar-harness-generalize

e273a25

refactor(tests): generalise scalar matrix harness off integer-inherent consts

Merge pull request #256 from cipherstash/v3-domain-type-date

420c794

feat(scalars): add eql_v3.date encrypted-domain type + temporal wiring

docs: correct stale 'out of scope' scalar coverage in adding-a-scalar…

8a92ce2

… reference CATALOG now includes the non-integer ordered scalar (date) alongside the integers, so the 'only the integer scalars today' wording was stale. Addresses CodeRabbit feedback on PR #256.

feat(eql-scalars): add total BoundedIntKind sub-enum

9dca530

docs: describe BoundedIntKind instead of panicking accessors

32be5ce

Update the adding-a-scalar reference's `kind` bullet to reflect that the bounded-numeric accessors moved to the total BoundedIntKind sub-enum, and add the implementation plan under docs/superpowers/plans/.

Merge pull request #259 from cipherstash/v3-bounded-int-kind

f10794d

refactor(eql-scalars): total BoundedIntKind sub-enum (replaces panicking ScalarKind accessors)

test(scalars): add temporal_values! macro for chrono-backed ScalarTyp…

24d12ae

…e wiring

test(scalars): generate date ScalarType via temporal_values!

6d71700

feat(eql-scalars): add is_temporal()/is_eq_only() capability accessors

8eeb1ca

build(eql-tests-macros): depend on eql-scalars catalog

a7fe635

refactor(eql-tests-macros): derive temporal/eq_only from catalog, dro…

c15bef1

…p [temporal] marker

test(scalars): drop [temporal] marker from dispatch list (catalog-der…

d3df023

…ived)

style(eql-tests-macros): apply rustfmt to scalar matrix emitters

e297363

tobyhede added 28 commits June 19, 2026 22:04

Merge pull request #299 from cipherstash/eql_v3_float_domains

2798810

feat(v3): float4/float8 encrypted-domain types

ci: gate e2e property suite + unconditional source doc validation (CI…

b1bb302

…P-3141) Adds a dedicated test:sqlx:e2e mise task and CI job for the proptest-e2e suite (needs ZeroKMS creds; the credential-free shards run the fixture suite), and runs source doc validation unconditionally.

docs: property-test suite docs + changelog (CIP-3141)

7625a1b

Documents the three property-test suites (catalog / fixture / e2e) over the committed curated fixtures, the function-double oracles, and term-extractor identity. CHANGELOG entry under [Unreleased].

Merge pull request #301 from cipherstash/eql_v3_scale_index_engagement

917b925

test(v3): scaled, cost-chosen index-engagement tests for encrypted-domain surface

test(v3): assert scalar scale predicates are selective

9023e8c

Merge pull request #305 from cipherstash/eql_v3_scale_index_engagemen…

1cfff1a

…t_expand [codex] assert scalar scale predicates are selective

test(v3): cover eql_v3.contains/contained_by by name, mixed overloads…

0ad1061

…, and STRICT NULL (G3 4a)

Merge pull request #293 from cipherstash/v3-property-tests-fn-doubles

3433f6a

test(sqlx): eql_v3 SQL-function property tests (CIP-3141)

test(fixtures): add per-type doubles fixtures (plaintext encrypted tw…

77266af

…ice)

test(fixtures): generate per-type doubles fixtures in fixture:generat…

7a92e2f

…e:all

test(property): add doubles-fixture loader to the fixture oracle

af28c0b

test(property): cross-ciphertext equality over per-type doubles fixtu…

33b42fe

…res (hm + ORE)

docs: describe doubles fixtures + cross-ciphertext test (hm + ORE)

96ee902

style: rustfmt eql_doubles DOUBLES_TOKENS

8bcc802

style(v3): rustfmt text-match tests; drop stale plan reference in com…

8a6e821

…ment

Merge pull request #306 from cipherstash/eql-v3-text-search-docs-tests

fce499e

test(v3): text-search direct-function coverage + bloom-vs-LIKE divergence lock-in

Merge pull request #303 from cipherstash/v3-property-cross-ciphertext

c98a8cd

test(v3): cross-ciphertext equality via per-type doubles fixtures (CIP-3141)

tobyhede closed this Jun 20, 2026

tobyhede force-pushed the eql_v3 branch from e9dd533 to c98a8cd June 20, 2026 04:12


Eql v3 #246
tobyhede wants to merge 1374 commits into main from eql_v3


2 participants
