diff --git a/docs/development/2026-06-24-eql-v3-pr-loc-analysis.md b/docs/development/2026-06-24-eql-v3-pr-loc-analysis.md new file mode 100644 index 00000000..c876b23f --- /dev/null +++ b/docs/development/2026-06-24-eql-v3-pr-loc-analysis.md @@ -0,0 +1,209 @@ +# EQL v3 PR — LOC Analysis: Generated vs. Base Implementation + +**Date:** 2026-06-24 +**Branch:** `eql_v3` vs `main` +**Merge base:** `80a7a2bc21a04ed7af5916d252605d576dcbc21a` +**Method:** `git diff --numstat HEAD`, cross-checked by four parallel +analysis agents (one per independent domain) reading file contents and the +generation/verification mechanisms. + +--- + +## TL;DR + +The PR reports as **~100k LOC** (498 files, **102,511 insertions / 32,897 +deletions**), but that number is dominated by machine-produced artifacts: + +- **~59,142 lines (≈58% of insertions) are committed generated/snapshot + artifacts** — `cargo expand` macro snapshots and golden codegen reference SQL. + No human authored them; they are deterministically regenerable from a small + Rust catalog and CI-gated to byte-for-byte parity. +- A **further ~26,810 lines of generated scalar SQL** is materialized to disk by + the codegen but **gitignored**, so it is absent from the diff entirely. +- The **genuine hand-authored base implementation is ~11–13k LOC** (~8k Rust + catalog/codegen/macros/types + ~3.3k hand-written SQL + bespoke test suites). +- The leverage point for *all* of the generated output (committed golden refs, + gitignored SQL, and the expanded test matrix) is the **~5k-LOC `eql-scalars` + catalog + `eql-codegen` renderers**. + +**Review implication:** the generated buckets do not need line-by-line human +review — the parity/inventory tests verify them. Reviewer attention belongs on +the ~5k catalog + codegen and the bespoke JSONB/SEM/ORE test suites. 
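One way to act on the review implication is to hide the generated buckets when reading the diff. The sketch below builds git `:(exclude)` pathspecs for the three paths this analysis classifies as generated. The helper name `exclude_pathspecs` and the variable `generated_paths` are illustrative assumptions, not part of the repository.

```bash
# Paths this analysis classifies as generated (snapshots, golden refs, lockfile).
generated_paths="tests/sqlx/snapshots tests/codegen/reference Cargo.lock"

# Emit one git ":(exclude)" pathspec per generated path (hypothetical helper).
exclude_pathspecs() {
  for p in $generated_paths; do
    printf ':(exclude)%s\n' "$p"
  done
}

exclude_pathspecs

# Usage inside the repository (not run here):
#   base=$(git merge-base HEAD origin/main)
#   git diff --stat "$base" HEAD -- . $(exclude_pathspecs)
```

The leading `.` gives git a positive pathspec to subtract the excluded paths from; the paths contain no spaces, so the unquoted command substitution splits safely.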
+ +--- + +## Top-level breakdown (insertions) + +| Path | Added | Deleted | Class | +|---|---:|---:|---| +| `tests/sqlx/snapshots/` | 32,196 | 0 | **Generated** (macro-expand snapshot + matrix baselines) | +| `tests/codegen/reference/` | 26,946 | 0 | **Generated** (golden codegen reference SQL) | +| `tests/sqlx/src/` + `tests/sqlx/tests/` | ~17,213 | 0 | Hand-written test harness + suites | +| `crates/` | 12,383 | 0 | Hand-written Rust (+ JSON/TS fixtures) | +| `Cargo.lock` | 5,590 | 0 | Generated lockfile | +| `src/` | 3,336 | 7,505 | Hand-written SQL (v3) / legacy removal | +| `docs/` | 1,644 | 5,365 | Docs | +| `tasks/` | 902 | 442 | Build/CI tooling | +| `.github/` | 808 | 74 | CI workflows | +| `mise.toml` | 606 | 5 | Task config | +| Other (`DEVELOPMENT.md`, `SUPABASE.md`, `CLAUDE.md`, `README`, …) | ~700 | ~500 | Docs/meta | + +**Generated/snapshot subtotal in the committed diff: ≈ 59,142 (58% of insertions).** + +--- + +## 1. Generated artifacts (committed, not authored) + +### 1a. `tests/sqlx/snapshots/` — 32,196 LOC + +| File | LOC | What it is | +|---|---:|---| +| `int4_expanded.rs` | **31,260** | `cargo expand` macro-expansion snapshot of the `int4` matrix suite (220 `#[rustc_test_marker]` blocks). Pure rustc test-harness boilerplate. | +| `matrix_tests_text.txt` | 306 | token-normalized matrix baseline (text shape) | +| `matrix_tests.txt` | 220 | token-normalized matrix baseline (canonical) | +| `README.md` | 206 | docs (only authored file in the dir) | +| `v3_jsonb_tests.txt` | 76 | pinned test-name set (jsonb) | +| `matrix_jsonb_entry_tests.txt` | 55 | pinned test-name set (jsonb SteVec entry) | +| `matrix_tests_eq_only.txt` | 54 | derived matrix baseline (eq-only) | +| `matrix_tests_storage_only.txt` | 19 | matrix baseline (storage-only) | + +- **Provenance:** `int4_expanded.rs` line 1 — `` `eql_v3_int4` matrix suite — generated by `scalar_types!` ``. The macro source is `tests/sqlx/src/matrix.rs`. 
+- **Regeneration:** `mise run test:matrix:expand` (`cargo +nightly-2026-05-01 expand --test encrypted_domain scalars::int4`), with pinned nightly + `cargo-expand 1.0.122` so the snapshot only moves when the macro moves. +- **Verification:** `.github/workflows/macro-expand-eql.yml` regenerates and runs `git diff --exit-code` (non-blocking drift backstop). The `.txt` baselines are gated by `mise run test:matrix:inventory` in the `matrix-coverage` CI job. +- **Note:** the `.rs` lives under `snapshots/` (not `tests/`) so Cargo never compiles it as a test target. + +**Verdict:** generated snapshot. High confidence. + +### 1b. `tests/codegen/reference/` — 26,946 LOC, 108 files + +| Type | Files | | Type | Files | +|---|---:|---|---|---:| +| `text` | 16 | | `int8` | 11 | +| `date` | 11 | | `numeric` | 11 | +| `float4` | 11 | | `timestamptz` | 11 | +| `float8` | 11 | | `bool` | 3 | +| `int2` | 11 | | `README.md` | 1 | +| `int4` | 11 | | | | + +- **Provenance:** every file carries `-- REFERENCE: hand-maintained parity baseline for crates/eql-codegen` followed by `-- AUTOMATICALLY GENERATED FILE.` +- **Generation:** `cargo run -p eql-codegen` renders the SQL from `eql_scalars::CATALOG` + minijinja templates; the body is copied verbatim into the reference tree with a one-line provenance header. +- **Verification (three layers):** + - `tasks/codegen-parity.sh` — strips the provenance line and `diff`s byte-for-byte against generated output; also asserts the reference dir set equals the catalog token set. + - `crates/eql-codegen/tests/parity.rs` — `reference_dirs_match_catalog_tokens`, `rust_generator_matches_reference_files`, `generate_all_is_deterministic_across_runs`. + - In-crate reference tests in `crates/eql-codegen/src/generate.rs`. + +**Verdict:** generated golden files, deterministically reproducible. High confidence (95%+). + +### 1c. 
Gitignored generated SQL — 26,810 LOC (NOT in the diff) + +The codegen materializes the scalar SQL surface into `src/v3/scalars//` +(`*_types.sql` / `*_functions.sql` / `*_operators.sql` / `*_aggregates.sql`), +all excluded by `.gitignore` (lines 234–240). `git ls-files src/v3/scalars` +tracks exactly one hand-written file (`functions.sql`). This is ~27k lines of +real generated code that never appears in the LOC count. + +--- + +## 2. Hand-written base implementation + +### 2a. `crates/` — 12,383 LOC + +| Crate | Added | Breakdown | Role | +|---|---:|---|---| +| **eql-scalars** | 2,446 | 2,425 `.rs`, 21 `.toml` | **The catalog — source of truth.** `CATALOG` of `ScalarSpec` rows; `Term` capabilities (`Hm`=eq, `Ore`=eq+ord) fixed in impls. Includes ~1,237 LOC unit tests + proptest invariants. Std-only, zero-dep. | +| **eql-codegen** | 2,532 | 2,401 `.rs`, 108 `.j2`, 23 `.toml` | **The renderers — codegen.** Reads `CATALOG`, renders SQL into `src/v3/scalars//`. Key files: `generate.rs` (683), `operator_surface.rs` (619), `context.rs` (380), `writer.rs` (272). Binary exposes `list-types` / `dump-catalog`. | +| **eql-tests-macros** | 774 | 757 `.rs`, 17 `.toml` | **Test-wiring proc-macros.** Expands one `scalar_types!` list into per-type SQLx-matrix wiring across the three test compilation contexts. | +| **eql-types** | 6,631 | 3,189 `.json`, 2,229 `.rs`, 1,097 `.ts`, 95 `.md`, 19 `.toml`, 2 `.gitignore` | **Mixed — mostly data.** v3 type models + conformance/catalog-parity tests in Rust; the bulk is committed JSON schema fixtures (`schema/v3/*.json`) and TS, not authored logic. | + +**Genuine hand-authored Rust:** ~7,812 LOC (2,425 + 2,401 + 757 + 2,229), a +large share of which is tests. ~3,189 LOC is JSON schema data; ~1,097 is TS. + +### 2b. 
`src/v3/` — 3,377 added (legacy: −7,505) + +| Subdir | Added | Content | +|---|---:|---| +| `src/v3/jsonb/` | 1,780 | jsonb SteVec surface (types, functions, operators, aggregates, blockers, test) | +| `src/v3/sem/` | 1,018 | Hand-written SEM index-term types: `hmac_256`, `ore_block_256`, `ore_cllw`, `bloom_filter` | +| `src/v3/lint/` | 355 | `lints.sql` structural lint rules | +| `src/v3/` (root) | 164 | forked `crypto.sql` / `common.sql`, `schema.sql`, `version.template` | +| `src/v3/scalars/` | 60 | `functions.sql` — the sole committed scalar SQL (shared blocker) | + +The **−7,505 deletions** are the old `eql_v2` surface removed in 3.0.0. The new +v3 implementation is entirely additive (`src/v3/`: +3,324 / −0 of genuinely-new +files; the subdir table sums to 3,377 because it counts the full body of +`crypto.sql`, which git records as a rename of `src/crypto.sql` contributing only ++12 net to the top-level `src/` total of 3,336). + +### 2c. `tests/sqlx/src/` + `tests/sqlx/tests/` — ~17,213 LOC + +**`tests/sqlx/src/` (9,137)** — reusable harness, near-zero per-type cost: + +| File | LOC | Character | +|---|---:|---| +| `matrix.rs` | 3,572 | **Macro engine.** ~40 chained `macro_rules!` that fan out the cartesian product (category × domain × operator × pivot) + EXPLAIN-plan helpers. Emits the bulk of the suite's *expanded* test count from near-zero source. | +| `scalar_domains.rs` | 1,754 | Declarative per-type trait wiring (`ScalarType`/`OrderedScalar`/…), materialized via local macros (`int_values!`, `temporal_values!`). | +| `fixtures/` subtree | 2,948 | Real-ciphertext fixture generation: `driver.rs` (548), `eql_plaintext.rs` (509), `spec.rs` (413), `cipherstash.rs` (412, ZeroKMS path), `scalar_fixture.rs` (283), validation. | +| `property.rs` | 519 | **All-pairs oracle engine** — `assert_eq_oracle`/`assert_ord_oracle` over every ordered pair, function-double + extractor oracles, proptest bridging. 
| + +**`tests/sqlx/tests/` (8,076)** — mostly bespoke assertions: + +| Subtree | LOC | Character | +|---|---:|---| +| `encrypted_domain/` | 4,420 | `family/` structural SQL-catalog suites (sem, mutations, inlinability, support) + `property/` oracle drivers (thin row-sourcing over the shared engine) | +| `tests/` (root) | 3,656 | `v3_jsonb_tests.rs` (1,590 — 33 hand-authored SteVec/JSONB tests that can't fit the scalar matrix), `v3_jsonb_operator_surface_tests.rs` (474), `ore_block_comparator_tests.rs` (474), `ore_cllw_v3_opclass_tests.rs` (466), `text/text_match.rs` (398) | + +**Key structural finding:** per-type wiring is **~10 lines total** — +`scalar_types.rs` lists `int4 => i32, int2 => i16, …` and `scalars/mod.rs` is a +single `scalar_types!(matrix_suites);` invocation. The large *expanded* test +count massively overstates hand-authored effort; the irreducible bespoke logic +is the JSONB/SteVec suite (~2,500 lines) plus the SEM/ORE/family structural checks. + +--- + +## 3. Synthesis + +| Class | LOC | Share of insertions | +|---|---:|---:| +| Committed generated/snapshot artifacts | ~59,142 | ~58% | +| Generated lockfile (`Cargo.lock`) | 5,590 | ~5% | +| Committed JSON/TS schema data (`eql-types`) | ~4,286 | ~4% | +| **Hand-authored base implementation** | **~11–13k** | **~11–13%** | +| Docs / tooling / CI | ~4,500 | ~4% | +| Other test wiring/harness (counted above in 13k) | — | — | + +**Outside the diff:** ~26,810 lines of gitignored generated scalar SQL. + +### Bottom line + +The PR's apparent size is dominated by deterministically-regenerable, CI-gated +artifacts — not new logic. The **true hand-written base implementation is on the +order of 11–13k LOC**, and *all* generated output (the 59k committed + 27k +gitignored + the expanded test matrix) is single-sourced from the **~5k-LOC +`eql-scalars` catalog + `eql-codegen` renderers**. Adding a new scalar type is +one catalog row plus ~one line of test wiring. 
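The headline shares can be re-derived from the figures already quoted in the tables above. This is a minimal arithmetic sketch: every number is copied from this document rather than measured, and the variable names and the `pct` helper are illustrative.

```bash
# Figures copied from this document's breakdown tables; nothing here is measured.
insertions=102511
snapshots=32196        # tests/sqlx/snapshots/
golden_refs=26946      # tests/codegen/reference/
gitignored_sql=26810   # generated scalar SQL on disk, absent from the diff
catalog=2446           # crates/eql-scalars
codegen=2532           # crates/eql-codegen

# Integer percentage of total insertions, rounded to the nearest whole number.
pct() { echo $(( ($1 * 100 + insertions / 2) / insertions )); }

generated=$((snapshots + golden_refs))
generated_on_disk=$((generated + gitignored_sql))
source_of_truth=$((catalog + codegen))

echo "committed generated: ${generated} ($(pct "$generated")% of insertions)"
echo "generated incl. gitignored SQL: ${generated_on_disk}"
echo "catalog + codegen: ${source_of_truth}"
```

The committed generated subtotal comes out at 59,142 (58% of insertions), and the catalog plus codegen crates sum to 4,978, which is the "~5k-LOC" figure used in the bottom line.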
+ +--- + +## Reproduce + +```bash +base=$(git merge-base HEAD origin/main) # 80a7a2bc... + +# Total +git diff --stat "$base" HEAD | tail -1 + +# By top-level path +git diff --numstat "$base" HEAD \ + | awk '{split($3,a,"/"); add[a[1]]+=$1; del[a[1]]+=$2} + END{for(t in add) printf "%10d + %8d - %s\n", add[t], del[t], t}' \ + | sort -rn + +# Generated buckets +git diff --numstat "$base" HEAD -- tests/sqlx/snapshots tests/codegen/reference \ + | awk '{a+=$1} END{print a}' + +# Gitignored generated SQL on disk (after a build) +find src/v3/scalars -name '*.sql' \ + | grep -E '_(types|functions|operators|aggregates)\.sql$' \ + | xargs wc -l | tail -1 +``` diff --git a/docs/development/eql-v2-vs-v3-comparison.md b/docs/development/eql-v2-vs-v3-comparison.md new file mode 100644 index 00000000..a07784bd --- /dev/null +++ b/docs/development/eql-v2-vs-v3-comparison.md @@ -0,0 +1,254 @@ +# `eql_v2` ↔ `eql_v3` Comparison & Gap Audit + +> **Status:** Comparative audit, captured 2026-06-22; corrections re-verified 2026-06-24. Companion to the v3-only deep dive in `eql-v3-implementation-audit.md`. +> **Method:** Read-only, four parallel sub-audits (v2 capability inventory · test coverage · architecture · direct verification). Every claim cited `file:line`. +> **Scope:** `eql_v2` = all of `src/` **except** `src/v3/` (the documented, unchanged public API). `eql_v3` = `src/v3/` (new encrypted-domain surface). +> +> **`eql_v2` has since been removed from the tree (3.0.0).** Its `file:line` citations are no longer resolvable against the working tree — verify them against git history: the v2 **SQL surface** lives at `462b020c^` (parent of the removal commit "feat!: remove eql_v2 SQL surface"), and the v2 **test files** at `40d35e8c^` (they were deleted in a separate commit). e.g. `git show 462b020c^:src/operators/=.sql`. `eql_v3` citations resolve against the current tree. + +--- + +## 0. 
Why `eql_v3` is a step-change — test coverage + +**`eql_v3` is tested like a cryptographic library; `eql_v2` is tested like a feature.** v2 verifies a handful of hand-picked examples against ciphertext that was encrypted once and checked into the repo. v3 proves *properties* hold across the whole value space, against real ciphertext regenerated on every run, with the test surface itself pinned so coverage can't silently erode. + +| Test capability | `eql_v2` | `eql_v3` | Impact | +|---|---|---|---| +| **Verification model** | example-based — assert specific expected counts on hand-picked rows | **property-based all-pairs oracle** — every ordered pair `(a,b)` checked against a plaintext oracle for `=` `<>` `<` `<=` `>` `>=` | Bugs hide between hand-picked examples; an oracle over the full pairing doesn't let them. | +| **Ciphertext source** | committed **static** blobs, pinned to one keyset (`tests/ore.sql`, `ste_vec.sql`) | **regenerated every prep** from the catalog via `cipherstash-client`; e2e re-encrypts live via ZeroKMS | v2 can pass against ciphertext that no longer matches current crypto; v3 can't. | +| **No-DB invariants** | none | pure-Rust **catalog proptests** (blocker non-STRICT+plpgsql, payload-keys==terms, ordering monotonic) | Safety invariants checked in milliseconds, on every PR incl. forks, before a DB even spins up. | +| **Determinism** | none | **byte-for-byte codegen goldens** — identical catalog ⇒ identical SQL | The generated surface is reproducible and reviewable, not trust-me output. | +| **Coverage erosion guard** | none — a deleted/`#[cfg]`-gated test vanishes silently | **snapshot-pinned test-name set**, cross-checked against the catalog's `list-types` | You cannot lose a test (or forget to wire a new type's matrix) without CI going red. 
| +| **Test authoring cost** | one hand-written test per case | one `scalar_matrix!{}` per type auto-emits ~N arms; one catalog row adds a whole type's suite | Coverage scales with the catalog, not with engineer keystrokes. | + +**By the numbers** + +| | `eql_v2` | `eql_v3` | +|---|---|---| +| Verification style | ~407 hand-written `#[sqlx::test]` cases (38 files, no macro expansion) | **2,547 runnable tests** — 2,085 generated matrix arms (10 types) + all-pairs oracle + property/edge suites | +| Distinct test tiers | 1 (SQLx examples) | **6** (catalog proptest · fixture oracle · cross-ciphertext · edge-cases · e2e · matrix) | +| Ciphertext freshness | frozen at commit | regenerated per prep + **fresh per e2e run** | +| CI gates with no v2 analog | — | **6** (`codegen:parity` · `self_contained_v3` · `clean_install_v3` · `matrix:inventory` · `matrix:catalog-coverage` · `e2e`) | +| Runs on fork PRs (no creds/DB) | partial | catalog proptests + codegen goldens + inventory — **full safety gate** | + +> **Bottom line:** v3 closes the three ways an encryption test suite silently rots — stale ciphertext, an example that never covered the real edge, and a test that quietly stopped running. v2 is exposed to all three; v3 is gated against each. + +--- + +## 1. TL;DR + +`eql_v3` is **not** a rename of `eql_v2` — it is an additive, self-contained schema that re-architects the **scalar** surface around per-capability PostgreSQL domains generated from a Rust catalog. It is **strictly safer** (fail-closed equality, type-encoded capability, no ciphertext-order escape hatch) and **far better tested** (property oracle + generated matrix + real-ciphertext fixtures + codegen goldens + snapshot pinning). It is **not yet a functional superset**: it lacks database-side config/encryptindex and on-column opclass indexing. 
Two apparent "gaps" are deliberate security postures, not omissions: text match/search uses bloom containment instead of `LIKE`/`ILIKE`, and `bool` is store/decrypt-only (any boolean index term is a 2-value plaintext leak). + +| | `eql_v2` | `eql_v3` | +|---|---|---| +| Type model | one runtime-typed composite `eql_v2_encrypted` | per-capability jsonb-backed **domains** per scalar | +| SQL origin | 100% hand-written | **generated** from `eql-scalars::CATALOG` | +| Crypto/index types | shared in `eql_v2` schema | **owns its own** SEM types, zero `eql_v2` dep | +| Equality when term missing | **fail-OPEN** (returns `NULL`) | **fail-CLOSED** (RAISE / type rejects) | +| Test style | example-based, **committed static** ciphertext | property oracle + matrix + **regenerated** ciphertext | +| Determinism gate | none | byte-for-byte codegen goldens | + +--- + +## 2. Capability matrix + +Legend: ✅ full · ◑ partial / different model · ❌ absent + +| Capability | `eql_v2` | `eql_v3` | Notes | +|---|---|---|---| +| Equality `=` `<>` (HMAC `hm`) | ✅ `src/operators/=.sql:66` | ✅ per-type `_eq` domain | v3 also routes text eq through `hm`, never lossy ORE | +| Ordering `<` `<=` `>` `>=` (Block-ORE `ob`) | ✅ `src/operators/<.sql:78` | ✅ `_ord`/`_ord_ore` domains | v3 comparator is `IMMUTABLE`, block-count derived from length | +| `min` / `max` aggregates | ✅ generic `encrypted/aggregates.sql:18` | ◑ per-`_ord` type + SteVec entries | no single generic-encrypted aggregate in v3 | +| JSONB containment `@>` `<@` (SteVec) | ✅ `src/operators/@>.sql:31` | ✅ `src/v3/jsonb/operators.sql:139,214` | **shipped & in v3 build** (`deps-ordered-v3.txt`) | +| JSONB path `->` `->>` | ✅ `src/operators/->.sql:58` | ◑ `src/v3/jsonb/operators.sql:47,99` | typed `(json,text)`/`(json,int)` overloads only; **bare untyped literal RHS falls through to native — see §2a** | +| SteVec entry ordering (CLLW-ORE `oc`) | ✅ `src/operators/ste_vec_entry.sql:95` | ✅ `src/v3/jsonb/operators.sql:314` | `ore_cllw` SEM 
type used here, not orphaned | +| Native-jsonb op blockers (`?` `@?` `#>` `-` `||` …) | n/a | ✅ `src/v3/jsonb/blockers.sql:40-284` | fail-closed for mutate/predicate ops (**not** `->`/`->>`, §2a) | +| **Text match/search** | ✅ `LIKE` `~~` / `ILIKE` `~~*` (`src/operators/~~.sql:90,118`) | ◑ **different model**: `@>`/`<@` bloom containment | deliberate divergence: probabilistic ngram containment is not SQL wildcard/anchoring pattern matching | +| **`grouped_value(jsonb)`** aggregate (GROUP BY recipe) | ✅ `src/encrypted/functions.sql:97`, doc `json-support.md:203` | ❌ absent | v3 ships only typed `min`/`max`; documented grouping recipe has no v3 path | +| `eql_v2_encrypted` composite + `add_encrypted_constraint` | ✅ one untyped composite + helper that `ALTER TABLE … CHECK`s a plain column (`src/encrypted/casts.sql:14`, `constraints.sql`, `functions.sql:122`) | ◑ **different model — no 1:1 path** | v3 replaces the single composite with per-capability domains whose inline `CHECK` validates *more* strictly at cast/insert (committed golden `tests/codegen/reference/int4/int4_types.sql:30-38`; rendered from template `crates/eql-codegen/templates/types.sql.j2:16-25`); the `to_encrypted` cast + `add_encrypted_constraint` helper have no direct equivalent. 
Deliberate redesign, not drop-in | +| User-facing v3 JSON docs | ✅ `docs/reference/json-support.md` (all `eql_v2.*`) | ❌ thin | shipped v3 SteVec/JSONB has **no** user doc; caveats live only in SQL `@warning` comments | +| **Config management** (`eql_v2_configuration`, add/modify/remove search config, state machine) | ✅ `src/config/` (6 files) | ❌ absent | **decided: not ported DB-side** — kept permanently client-side (Protect-style); [#312](https://github.com/cipherstash/encrypt-query-language/issues/312) closed *not planned* | +| **Encryptindex migration** (create/rename cols, diff/activate) | ✅ `src/encryptindex/functions.sql` | ❌ absent | **decided: not ported DB-side** — client-side alongside config; [#312](https://github.com/cipherstash/encrypt-query-language/issues/312) closed *not planned* | +| `bool` query operators | ✅ equality via generic encrypted | ◑ **store/decrypt-only by design** | single `eql_v3.bool` domain, **all** operators (incl. `=`) are blockers; 2-value cardinality makes any index term a plaintext leak (`lib.rs:457`) — deliberate, not a parity gap | +| **blake3** index | ◑ legacy `b3`→`hm` | ❌ absent | already legacy in v2 | +| On-**column** btree/hash opclass (transparent `ORDER BY`/`GROUP BY`/`DISTINCT`/hash-join) | ✅ `src/operators/operator_class.sql:69`, `src/operators/hash_operator_class.sql:25` | ❌ forbidden on domains | v3 indexes via functional index on extractor (footgun: opclass-on-domain breaks blockers) | +| Total-order-over-ciphertext fallback | ◑ btree FUNCTION 1 raw-text fallback `src/operators/operator_class.sql:101` | ❌ none (intentional) | v2 fallback is a documented edge-case leakage shape; v3 has no escape hatch | + +### 2a. 
Caveat: `->`/`->>` are *not* fail-closed against untyped literals + +v3's native-jsonb blockers are genuinely fail-closed for **mutate/predicate** operators (`?` `?|` `?&` `@?` `@@` `#>` `#>>` `-` `#-` `||`): each binds the exact native RHS regtype with `LEFTARG = eql_v3.json` and `RAISE`s (`src/v3/jsonb/blockers.sql:40-284`). **But the two supported extraction operators `->`/`->>` are not.** v3 defines only `(eql_v3.json, text)` and `(eql_v3.json, integer)` overloads (`src/v3/jsonb/operators.sql:33-121`) with no `jsonb`-LHS override, so a **bare untyped literal** RHS routes to the native operator: + +```sql +col -> 'sel' -- ⚠ native jsonb->text: root-key lookup on the envelope, silently returns NULL +col -> 'sel'::text -- ✅ v3 eql_v3."->" (operators.sql:33) +col -> $1 -- ✅ v3 operator (typed param — the Proxy path) +``` + +`eql_v3.json` is a domain over `jsonb` (binary-coercible to its base), so the native base-type operator wins the exact-match tiebreak over the domain-typed v3 operator when the RHS is `unknown`. The failure is a **silent wrong answer / NULL (false-negative), not a plaintext leak**, and is documented in-source (`operators.sql:20-28` `@warning`; avoided in tests via explicit `::text` casts — `src/v3/jsonb/jsonb_test.sql:52` (`->`) and `:56` (`->>`)). The mitigation holds only because the CipherStash Proxy always sends typed `$n` parameters; any **direct-SQL** caller writing `col -> 'sel'` gets native semantics with no error. Suggested closure: add a `jsonb`-LHS `->`/`->>` blocker pair, or a test exercising the bare-literal path. + +--- + +## 3. Architecture deltas + +```mermaid +flowchart LR + subgraph v2 ["eql_v2 — runtime-typed"] + E2["eql_v2_encrypted<br/>ONE composite (jsonb)"] -->|runtime payload inspection| OPS2["hand-written operators<br/>src/operators/*.sql"] + OPS2 --> TERMS2["hm / ob / oc / bf / sv<br/>(eql_v2 schema)"] + end + subgraph v3 ["eql_v3 — type-encoded capability"] + CAT["eql-scalars::CATALOG"] -->|eql-codegen| DOM["per-capability DOMAINs<br/><T> / _eq / _ord / _ord_ore / _match / _search"] + DOM -->|extractor + wrapper or BLOCKER| TERMS3["hmac_256 / ore_block_256 / ore_cllw / bloom_filter<br/>(eql_v3 schema — self-contained)"] + end +``` + +| Dimension | `eql_v2` | `eql_v3` | Verdict | +|---|---|---|---| +| **Capability location** | runtime payload key check | **type-system** (distinct domain per capability) | v3 — provisioning is a compile/plan-time fact | +| **Equality fail mode** | NULL when `hm` absent → row silently excluded (`src/operators/=.sql:68-74` STRICT body, `src/hmac_256/functions.sql:28`) | domain `CHECK (VALUE ? 'hm')` rejects un-provisioned value at cast/insert (committed golden `tests/codegen/reference/int4/int4_types.sql:36`; emitted by template `crates/eql-codegen/templates/types.sql.j2:18-19` from `Term::Hm => "hm"` at `crates/eql-scalars/src/term.rs:12`) | **v3 — most security-meaningful win** | +| **Ordering fail mode** | RAISE when `ob` absent (`ore_block_u64_8_256/functions.sql:49`) | RAISE when `ob` absent | parity | +| **Adding a type** | crypto-layer change; *no* per-type SQL | one `ScalarSpec` row → full surface regenerated | trade-off: v2 simpler, v3 safer/explicit | +| **Determinism** | none | byte-exact goldens `eql-codegen/tests/parity.rs:96` | v3 | +| **Self-containment** | none | zero `eql_v2.*`, build+CI enforced (`build.sh:59`, `self_contained_v3.sh:17`) | v3 — installs without `eql_v2` | +| **Inline/pin upkeep** | per-function **allowlist** (`tasks/pin_search_path.sql:99-295`) | **structural rule** (`tasks/pin_search_path_v3.sql:72-87`) | v3 — rule beats list | +| **Blocker discipline** | n/a (no blocker concept) | `LANGUAGE plpgsql` + non-`STRICT`, tested invariant | v3 | +| **Installed surface size** | one type + few operator files | (types × capabilities) domains + ~20-op blocker surface each | v2 simpler to read directly | + +**v3 improvements (ranked):** ① fail-closed equality ② capability encoded in the type ③ catalog-generated + byte-exact determinism ④ structural self-containment ⑤ structural pin rule ⑥ no ciphertext-order escape hatch. 
+ +**v3 regressions / gaps vs v2:** ① per-domain combinatorial explosion (larger installed surface & dep graph) ② adding a type is heavier conceptually (catalog row + regen + fixtures + snapshots + goldens) ③ functional parity not reached — no config/encryptindex, no on-column opclass ④ load-bearing `<= 16` malformed-term guard in the shared ORE comparator (`src/v3/sem/ore_block_256/functions.sql:163`) is a subtle correctness surface v2 lacked (v2's comparator hardcoded an 8-block width — `src/ore_block_u64_8_256/functions.sql` — so it never derived block count from length and never needed the guard). Not listed here, because they are deliberate security postures rather than regressions: `LIKE`/`ILIKE` (v3 text match/search exposes bloom containment instead) and `bool` query operators (store/decrypt-only — any boolean index term is a 2-value plaintext leak). + +--- + +## 4. Test coverage — the headline improvement + +### 4.0 Test inventory — hard numbers + +Counts re-verified 2026-06-24 against the current tree (`tests/sqlx/`, `crates/eql-scalars`, `crates/eql-codegen`, `tests/codegen/reference`) — i.e. **after** `eql_v2` and its tests were removed. (The original capture's `62` files / `617` `#[sqlx::test]` / `165` `#[test]` were the *combined* v2+v3 tree before v2-test removal in `40d35e8c`; the v3-only figures below replace them.) + +Two counting conventions appear below. **Harness-listed** is the authoritative count: what `cargo test -- --list` enumerates after the test binaries compile — i.e. *post* macro expansion, so it includes every `scalar_matrix!{}`/`jsonb_matrix!{}` arm. It requires a build but **no database** (listing does not connect). **Source-grep** counts are literal `grep`/`find` over the committed `.rs` sources (reproducible on a fork, no build at all); they count the hand-written attrs *before* the matrix macros multiply them, so they are far smaller. The gap between the two is exactly the macro multiplication. 
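The two conventions, and the multiplication that separates them, can be sketched in shell. The helper names `count_attr` and `count_listed` are hypothetical. The `: test` line suffix is an assumption about how the Rust test harness formats `--list` output, and the multiplication uses the names-per-type and type counts quoted in this section.

```bash
# Source-grep convention: count attribute lines in committed .rs sources (no build).
count_attr() {  # usage: count_attr <dir> <extended-regex>
  grep -rE --include='*.rs' -e "$2" "$1" | wc -l | tr -d ' '
}

# Harness-listed convention: count "name: test" lines read from stdin.
# Intended use (needs a build, no database): cargo test -- --list | count_listed
count_listed() {
  grep -c ': test$'
}

# Macro multiplication for the scalar matrix: names per type x types of that shape.
ordered=$((220 * 8))   # ordered scalar shape
text=$((306 * 1))      # text shape
storage=$((19 * 1))    # storage-only (bool)
expanded=$((ordered + text + storage))
echo "expanded scalar-matrix tests: ${expanded}"
```

For example, `count_attr tests/sqlx '#\[sqlx::test'` gives the pre-expansion attribute count, while piping `cargo test -- --list` into `count_listed` gives the post-expansion figure.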
+ +| Metric | Count | Source (counting method) | +|---|---:|---| +| **Total runnable tests** (the real figure) | **2,547** | `cargo test -- --list` (compiled + expanded, no DB) | +| …of which in the `encrypted_domain` binary (v3 matrix + property suites) | **2,256** | per-binary summary from the same `--list` | +| Rust test files (`tests/sqlx/tests/**/*.rs`) | **30** (53 across all `tests/sqlx`) | `find tests/sqlx/tests -name '*.rs' \| wc -l` | +| `#[sqlx::test]` attrs — source grep (pre-expansion) | **234** | `grep -rE '#\[sqlx::test' tests/sqlx --include='*.rs'` | +| `#[test]` attrs (non-DB) — source grep (pre-expansion) | **153** | `grep -rE '#\[test\]' tests/sqlx --include='*.rs'` | +| …of which under `encrypted_domain/` (the v3 suites), source grep | **101** `#[sqlx::test]` + **5** `#[test]` | `grep -rE … tests/sqlx/tests/encrypted_domain/` | +| **Generated scalar-matrix test names** (pinned in snapshots) | **2,085** | see expansion below (these are the macro-expanded arms, not source grep) | +| Generated JSONB-entry matrix names | **55** | `matrix_jsonb_entry_tests.txt` | +| v3 JSONB top-level pinned names | **76** | `v3_jsonb_tests.txt` (was 74 at capture; [#318](https://github.com/cipherstash/encrypt-query-language/pull/318) added 2) | +| Catalog property tests (no DB): 1 `proptest!` group + 6 `#[test]` | **7** | `proptest_invariants.rs` | +| Codegen parity tests | **4** | `eql-codegen/tests/` | +| Codegen golden reference files (10 token dirs) | **107** | `tests/codegen/reference/` | + +**Generated matrix expansion** — snapshots are ``-templated; the macro emits one test per name × the types of that shape: + +| Shape | Names/type | Types | Total | Snapshot | +|---|---:|---:|---:|---| +| Ordered scalar (`int2/4/8`, `float4/8`, `numeric`, `date`, `timestamptz`) | 220 | 8 | **1,760** | `matrix_tests.txt` | +| Text (eq + ord + match/search) | 306 | 1 | **306** | `matrix_tests_text.txt` | +| Storage-only (`bool`) | 19 | 1 | **19** | 
`matrix_tests_storage_only.txt` | +| **Total generated scalar-matrix tests** | | **10** | **2,085** | | + +> `matrix_tests_eq_only.txt` (54) is a derived/hypothetical shape no live type currently uses; it pins the eq-only emission path but isn't multiplied into the total. v2 has **0** generated tests — its ~407 `#[sqlx::test]` cases (counted at `40d35e8c^`) are all hand-written. + +### 4.1 v2 vs v3 at a glance + +| | `eql_v2` | `eql_v3` | +|---|---:|---:| +| Generated tests | 0 | **2,085** scalar + 55 entry | +| Distinct test tiers | 1 | **6** | +| No-DB safety tests (run on fork PRs) | 0 | 7 catalog + 4 codegen parity | +| Codegen golden files | 0 | 107 | +| Ciphertext | frozen at commit | regenerated per prep + fresh per e2e run | + +```mermaid +flowchart TD + subgraph V2 ["eql_v2 — example-based"] + SF["committed STATIC ciphertext<br/>ore.sql 1000r · ore_text.sql 100r · ste_vec.sql 10r"] --> EX["hand-written #[sqlx::test]<br/>expected counts (~407 cases)"] + end + subgraph V3 ["eql_v3 — property + generated"] + C["CATALOG"] --> FG["generate_all_fixtures<br/>(real ciphertext, REGENERATED)"] + C --> MX["scalar_matrix! → ~N arms/type"] + FG --> ORA["all-pairs oracle vs plaintext"] + C --> INV["catalog proptests (no DB)"] + C --> GOLD["codegen goldens (byte-exact)"] + MX --> SNAP["matrix snapshot pinning"] + end +``` + +| Dimension | `eql_v2` | `eql_v3` | Better? | +|---|---|---|---| +| Methodology | example-based, hand-picked counts (`comparison_tests.rs:48`) | property all-pairs oracle (`src/property.rs:84`) + generated matrix (`matrix.rs:174`) | ✅ generalizes over value space | +| Fixtures | **committed static** blobs, keyset-pinned (`tests/ore.sql:1`, `ORE_FIXTURES.md:67`) | **gitignored, regenerated** each prep via `cipherstash-client` (`generate_all_fixtures.rs`) | ✅ no stale keysets | +| Equal-plaintext / distinct-ciphertext | implicit | explicit `_doubles` fixtures → `cross_ciphertext.rs` | ✅ | +| No-DB invariants | none | pure-Rust proptests: blocker non-STRICT+plpgsql, payload-keys==terms (`proptest_invariants.rs`) | ✅ net-new | +| Codegen determinism | n/a | byte-for-byte goldens (`tests/codegen/reference//`) | ✅ net-new | +| Test-name coverage pinning | none — deleted test vanishes silently | snapshot inventory cross-checked vs `list-types` (`snapshots/`) | ✅ net-new | +| Live crypto-path regression | frozen blobs only | e2e re-encrypts every run via ZeroKMS (`e2e_oracle.rs`, `proptest-e2e`) | ✅ net-new | + +### v3 test suites + +| Suite | Location | Covers | DB | Creds | +|---|---|---|---|---| +| catalog (proptest) | `crates/eql-scalars/src/proptest_invariants.rs` | term/op/extractor consistency, payload-keys==terms, int-range ordering | ❌ | ❌ | +| fixture oracle | `…/property/fixture_oracle.rs` | all-pairs eq/ord + function-double + extractor identity over committed ciphertext | ✅ | ❌ | +| cross_ciphertext | `…/property/cross_ciphertext.rs` | equal plaintext / distinct ciphertext compare equal (hm + ORE) | ✅ | ❌ | +| match_smoke | `…/property/match_smoke.rs` | text bloom `@>`/`<@` containment | ✅ | ❌ | +| 
edge_cases | `…/property/edge_cases.rs` | NULL propagation, blockers raise, CHECK rejects, every blocker non-STRICT+plpgsql | ✅ | ❌ | +| e2e oracle | `…/property/e2e_oracle.rs` | same oracle over **fresh ZeroKMS** encryption | ✅ | **✅** | +| scalar matrix | `src/matrix.rs` | per-(category,domain,op,pivot) arms: sanity, pivots, NULL, blockers, index-engages, aggregates, order_by | ✅ | ❌ | +| family | `…/family/{inlinability,mutations,support,sem,jsonb_operator_surface}.rs` | inlining, negative-control mutations, SEM types, jsonb op surface | ✅ | ❌ | +| codegen parity | `eql-codegen/tests/parity.rs` | byte-exact vs goldens; determinism; reference dirs == catalog tokens | ❌ | ❌ | +| matrix inventory | `tests/sqlx/snapshots/*.txt` | pins test-name set; discovered types == `list-types` | ❌ | ❌ | + +### CI gates (v3-only, no v2 analog) +`codegen:parity` · `self_contained_v3` + `clean_install_v3` · `matrix:inventory` + `matrix:catalog-coverage` · `rust-crates` (catalog proptest) · `e2e` (fresh ZeroKMS). v2 rides only the shared sharded SQLx run (PG 14–17) with **no** coverage-pinning, codegen, self-containment, or property gate. + +### Test gaps where v3 < v2 (feature-driven) +1. **SteVec/JSONB queries** — capability shipped, but coverage rides a **committed, hand-written** fixture (`v3_ste_vec.sql`, `v3_doc_int4.sql`) pending a SteVec-document generator; v2 has a fuller query test set (`jsonb_tests.rs`, `jsonb_path_operators_tests.rs`, containment-uses-index tests). +2. **Text match/search semantics** — v2 `like_operator_tests.rs` covers SQL pattern operators; v3 `match_smoke.rs` covers deliberate bloom-containment `@>`/`<@` semantics, with no `~~`/`~~*` surface planned unless semantics change. +3. **ORE text-at-scale** — v2 `ore_text.sql` (100 lexically sorted words) + `ore_text_order_tests.rs`; v3 text ordering exercises curated catalog pivots only. +4. **Config / encryptindex** — v2 `config_tests.rs` (16), `encryptindex_tests.rs` (7); no v3 analog (intentional scope). 
+5. **Operator-class indexes** — v2 `operator_class_tests.rs`, `ore_cllw_opclass_tests.rs`; v3 uses functional indexes by design, no opclass mirror. + +> Note: `ore_block_comparator_tests.rs` is v2-named but already loads **v3** fixtures (`eql_v3_numeric`/`eql_v3_timestamptz`) — partial migration of v2 ORE-comparator coverage onto the v3 pipeline. + +--- + +## 5. Net assessment + +| Question | Answer | +|---|---| +| Is v3 safer than v2? | **Yes** — fail-closed equality + type-encoded capability + no ciphertext-order fallback. | +| Is v3 better tested than v2? | **Yes, decisively** — property oracle, generated matrix, regenerated real ciphertext, codegen goldens, snapshot pinning, self-containment + e2e gates. None exist for v2. | +| Is v3 a functional superset of v2? | **Not yet.** Real gaps: config/encryptindex, on-column opclass indexing, `grouped_value` GROUP BY recipe, user-facing JSON docs. SteVec/JSONB **is** shipped (contrary to the stale note in the v3-only audit), but its tests are still hand-written and do not yet have the scalar suite's property/e2e oracle strength. Two divergences are deliberate security postures, not gaps: text match/search (bloom containment, not `LIKE`/`ILIKE`) and `bool` (store/decrypt-only). | +| Biggest risks to track | (1) `->`/`->>` silently fall through to native `jsonb->text` on untyped literals (§2a) — direct-SQL callers get NULL, not an error. (2) Shared ORE comparator's `<= 16` malformed-term guard (`src/v3/sem/ore_block_256/functions.sql:163`) — load-bearing, warrants targeted coverage. | + +### 5a. 
Disposition of every v2→v3 difference + +So each difference carries an explicit decision, not an implied "TODO": + +| v2 → v3 difference | Disposition | Status | +|---|---|---| +| Config management (`src/config/*`, 6 files) + encryptindex migration (`src/encryptindex/functions.sql`) | **Decided: not ported DB-side** — kept permanently client-side (Protect-style) | [#312](https://github.com/cipherstash/encrypt-query-language/issues/312) closed *not planned* | +| `eql_v2_encrypted` composite + `add_encrypted_constraint` | **Different model, no 1:1 path** — replaced by per-capability domains with stricter inline `CHECK`; deliberate redesign | Decided (by design) | +| `LIKE` `~~` / `ILIKE` `~~*` (bloom `bf` / match) | **Different model, by design** — v3 *does* ship a bloom match surface (`eql_v3.text_match` / `text_search`, `@>`/`<@` containment, `match_term`→`bloom_filter`); only the `~~`/`~~*` operator spelling is v2-only. Probabilistic ngram containment ≠ SQL wildcard/anchoring | Decided (by design) | +| `bool` query operators | **By design** — store/decrypt-only; any boolean index term is a 2-value plaintext leak | Decided (by design) | +| `->`/`->>` untyped-literal fallthrough (§2a) | **Inherent to the `jsonb`-domain type-kind & accepted** — mitigated by typed Proxy params | Decided (documented) | +| On-column btree/hash opclass | **By design** — opclass-on-domain breaks blockers; index via functional index on extractor | Decided (by design) | +| `grouped_value` GROUP BY recipe · user-facing v3 JSON docs | **True gaps to close** — tractable follow-ups, no decision blocking them | Open (no issue yet) | +| blake3 | **N/A** — already legacy in v2 | Decided (dropped) | +| Uninstall drops schema only | **Correct by design** (no config table) | Decided | + +### Top gaps to close for v2 parity (priority order) +1. 
Document/migration guidance for text match/search: v2 `LIKE`/`ILIKE` wildcard patterns do not map to v3 probabilistic ngram bloom containment (`@>`/`<@`); adding `~~`/`~~*` is not planned unless those semantics change. +2. SteVec-document **fixture generator** → retire committed `v3_ste_vec.sql` exception; widen JSONB query tests. +3. Document that encrypted `bool` is store/decrypt-only in v3 (all query operators are blockers by design — any boolean index term is a 2-value plaintext leak). Not a fix; a docs note. +4. Document config/encryptindex as permanently client-side (Protect-style); the DB-side port is already decided against (§5a, #312 closed *not planned*). +5. Close the `->`/`->>` untyped-literal hole (§2a): add a `jsonb`-LHS blocker pair, or a regression test for the bare-literal path. +6. Port the `grouped_value` GROUP BY recipe, and write user-facing v3 JSON docs (current `json-support.md` is 100% `eql_v2.*`). + +> **Correction to `eql-v3-implementation-audit.md` §3.1 / §1 scope note:** v3 SteVec/JSONB is **implemented and in the v3 build** (`src/v3/jsonb/{types,functions,operators,aggregates,blockers}.sql`, 5 entries in `src/deps-ordered-v3.txt`), not "a separate design not yet implemented." Only its *fixture generation* is outstanding. diff --git a/docs/development/eql-v3-implementation-audit.md b/docs/development/eql-v3-implementation-audit.md new file mode 100644 index 00000000..3206d2a6 --- /dev/null +++ b/docs/development/eql-v3-implementation-audit.md @@ -0,0 +1,485 @@ +# `eql_v3` Implementation Audit + +> **Status:** Reference snapshot of the current `eql_v3` encrypted-domain implementation, captured 2026-06-22, re-verified 2026-06-23 against the 3.0.0 build/schema overhaul (single self-contained installer; `eql_v2` removed), and re-verified 2026-06-24 to incorporate the empty-ORE-term domain CHECK (#262/#316) and the `->`/`->>` bare-literal domain-flattening footgun (#318). 
+> **Method:** Read-only audit across four domains — Rust crates, generated/hand-written SQL, test infrastructure, and the build system. Every claim is cited as `file_path:line`. + +## 1. Executive summary + +`eql_v3` is **the** PostgreSQL schema EQL ships — a self-contained surface for searchable encryption over scalar types. As of 3.0.0 the earlier `eql_v2` schema (composite `eql_v2_encrypted` column type, database-side config management, operator-class-on-column indexing) has been **removed**: it is no longer built or shipped, surviving only in fork-provenance comments under `src/v3/` and in historical records (`CHANGELOG.md`, the v2.x upgrade guides). `eql_v3` is jsonb-backed **PostgreSQL domains** — one domain per operator/index capability — and owns its own copies of the SEM (searchable-encrypted-metadata) index-term types, so it installs into a database with **no other EQL schema present** and has **zero dependency on `eql_v2`**. + +The entire SQL surface is **generated from a single Rust catalog**. The data flow is strictly one-directional: + +```mermaid +flowchart LR + CAT["eql-scalars::CATALOG
(source of truth)"] --> CG["eql-codegen
(renderers)"] + CG --> SQL["src/v3/scalars/<T>/*.sql
(generated, gitignored)"] + SQL --> BUILD["build.sh
tsort + concat"] + BUILD --> REL["release/cipherstash-encrypt*.sql"] + CAT --> FIX["fixture generation
(real ciphertext)"] + CAT --> TEST["SQLx matrix + property tests"] + FIX --> TEST +``` + +Three properties define the design and are enforced mechanically (not by convention): + +1. **Self-containment** — no `eql_v2.` appears in executable `src/v3/` SQL; enforced at build time (`verify_v3_self_contained`) and in CI (`test:self_contained_v3`). +2. **Determinism** — identical `CATALOG` produces byte-identical SQL; enforced by codegen parity goldens. +3. **Fail-closed capability** — a value cannot be queried without provisioning the matching index term; storage-only domains block *every* operator including `=`. + +### Scope note + +The `int4` family is the reference implementation, but the live surface is wider: **8 ordered scalar families** (`int2`, `int4`, `int8`, `float4`, `float8`, `numeric`, `date`, `timestamptz`), plus `bool` (storage-only, all operators blocked), plus `text` (adds match/search via bloom filter), plus a separate `jsonb`/SteVec design. The SEM layer carries **four** index-term types (`hmac_256`, `ore_block_256`, `bloom_filter`, `ore_cllw`), not two. + +--- + +## 2. Crate structure + +The workspace has five members (`Cargo.toml:21-30`). The core production path is catalog -> SQL renderer -> generated SQL; the adjacent type and test crates keep the wire format and SQLx coverage aligned with that same catalog. 
+ +| Crate | Role | Main dependencies | +|---|---|---| +| `crates/eql-scalars` | The **catalog** — source of truth for scalar types, domains, terms, fixtures | Zero runtime deps (`crates/eql-scalars/Cargo.toml:7-10`); `proptest` dev-only | +| `crates/eql-codegen` | The **renderer** — turns each `CATALOG` row into SQL | `eql-scalars`, `minijinja`, `serde`, `serde_json`, `thiserror` (`crates/eql-codegen/Cargo.toml:7-12`) | +| `crates/eql-types` | Canonical Rust/TS/JSON Schema payload structs for each `eql_v3` SQL domain | `serde`, `ts-rs`, `schemars`; dev parity against `eql-scalars` (`crates/eql-types/Cargo.toml:7-17`) | +| `crates/eql-tests-macros` | Proc-macro fan-out for the SQLx scalar matrix from one scalar list | `syn`, `quote`, `proc-macro2`, `eql-scalars` (`crates/eql-tests-macros/Cargo.toml:11-14`) | +| `tests/sqlx` | Integration-test harness and fixture/oracle runner | `sqlx`, `tokio`, `cipherstash-client`, `eql-scalars`, `eql-tests-macros`, `proptest` (`tests/sqlx/Cargo.toml:7-39`) | + +### 2.1 `eql-scalars` — the catalog + +Definitions live in `lib.rs`; inherent `impl`s split into `kind`/`term`/`fixture`/`spec` modules. 
+ +| Item | Kind | Location | Role | +|---|---|---|---| +| `ScalarSpec` | struct | `lib.rs:210-216` | One scalar type: `token`, `kind`, `domains`, `fixtures` | +| `DomainSpec` | struct | `lib.rs:201-205` | One generated domain: `suffix` + fixed `terms` (`""` = storage-only) | +| `ScalarKind` | enum | `lib.rs:52-93` | Native scalar a domain maps onto (`I16`…`Jsonb`) | +| `BoundedIntKind` | enum | `lib.rs:36-41` | Total accessor for the 3 integer kinds; makes non-int bounded access a compile-time impossibility | +| `Term` | enum | `lib.rs:112-117` | `Hm` (equality), `Ore` (eq + ordering), `Bloom` (containment/match) | +| `Role` | enum | `lib.rs:125-160` | Whole-domain file role with `rank()` precedence `Ord > Eq > Match > Storage` | +| `Fixture` | enum | `lib.rs:170-197` | Value-kind-tagged plaintext (`Min`/`Max`/`Zero`/`Int`/typed variants) | +| `CATALOG` | const | `lib.rs:568-579` | Ordered source-of-truth slice (10 types) | +| `ENVELOPE_KEYS` | const | `lib.rs:104` | `["v","i","c"]` — always-present payload CHECK keys | + +**Capability model** is fixed in `Term` `impl` methods (`term.rs:10-73`), not in catalog data: + +| Term | `json_key` | extractor | constructor | operators | role | ordering? 
| +|---|---|---|---|---|---|---| +| `Hm` | `"hm"` | `eq_term` | `hmac_256` | `=`, `<>` | `Eq` | no | +| `Ore` | `"ob"` | `ord_term` | `ore_block_256` | `=`, `<>`, `<`, `<=`, `>`, `>=` | `Ord` | yes | +| `Bloom` | `"bf"` | `match_term` | `bloom_filter` | `@>`, `<@` | `Match` | no | + +**Domain suffix → term mapping:** + +| Suffix | Terms (int) | Terms (text) | Capability | +|---|---|---|---| +| `""` (storage) | `[]` | `[]` | storage-only — all ops blocked | +| `_eq` | `[Hm]` | `[Hm]` | equality | +| `_ord` / `_ord_ore` | `[Ore]` | `[Hm, Ore]` | order + eq | +| `_match` | — | `[Bloom]` | LIKE / containment | +| `_search` | — | `[Hm, Ore, Bloom]` | all three | + +> **Design note:** integer `_ord` domains are `[Ore]`-only (ORE equality is lossless for ints), but `text` ordered domains lead with `Hm` so `=`/`<>` route through exact `hm`, never lossy ORE (`lib.rs:415-450`). `bool` is storage-only (single term-less domain) to avoid low-cardinality leakage. +> +> **Non-empty-array terms:** beyond `json_key`, a term may declare that its payload key must hold a *non-empty array* via `Term::nonempty_array_key` (`term.rs:82-87`). Only `Ore` opts in: the empty-ORE payload `ob: []` (produced solely by encrypting `''` into an ordered column) must be rejected at the domain CHECK rather than mis-ordered (issue #262/#316). `Term::nonempty_array_keys` (`term.rs:116-118`) is the domain-level rollup the renderer consumes — symmetric to `term_json_keys`, so the CHECK never hardcodes a single key. + +**Fixture materialization** (`fixture.rs:13-40`, `lib.rs:591-660`): `Fixture::numeric_value(kind)` is a `const fn` resolving int fixtures to `i128`. The `int_values!` / `text_values!` macros const-evaluate these into `&'static` slices (`INT4_VALUES`, `TEXT_VALUES`, …) with compile-time bounds re-checks. Non-const kinds (`date`/`numeric`/`float`/`bool`/`timestamptz`) expose their `ScalarSpec` as `pub` and the SQLx harness parses `.fixtures` at runtime. 
This replaces the old committed generated `_values.rs` — no Rust-source round-trip. + +```mermaid +classDiagram + class ScalarSpec { + +token + +kind: ScalarKind + +domains: &[DomainSpec] + +fixtures: &[Fixture] + } + class DomainSpec { + +suffix + +terms: &[Term] + } + class Term { + <> Hm | Ore | Bloom + +json_key() extractor() ctor() + +operators() role() provides_ordering() + } + class ScalarKind { + <> I16 I32 I64 Numeric Text Jsonb Date Timestamptz Bool F32 F64 + } + class Fixture { + <> Min Max Zero Int Text ... + +numeric_value(kind) + } + ScalarSpec --> ScalarKind + ScalarSpec --> "*" DomainSpec : domains + ScalarSpec --> "*" Fixture : fixtures + DomainSpec --> "*" Term : terms (fixed) + Term ..> Role : role() + Fixture ..> ScalarKind : numeric_value(kind) +``` + +### 2.2 `eql-codegen` — the renderer + +Templates are `include_str!`-embedded (no runtime template IO). Binary `eql-codegen` + lib `eql_codegen`. + +| Module | Key items | Role | +|---|---|---| +| `main.rs` | arg dispatch (`main.rs:6-46`) | no args → `generate_all`; `list-types`; `dump-catalog` | +| `generate.rs` | `render_{types,functions,operators,aggregates}_file`, `generate_type`, `generate_all` | per-file renderers + orchestrator | +| `context.rs` | minijinja `environment()`, serde contexts, `FnEntry`/`DomainBlock`/`OpEntry` | template contexts + relocated logic | +| `operator_surface.rs` | `Operator`, `OPERATORS` (20), `is_native_jsonb_blocker` | full 20-operator surface + Postgres signatures | +| `writer.rs` | ownership-guarded write (`writer.rs:63-106`) | refuses to clobber files lacking the generated-marker header | +| `tests/parity.rs` | byte-for-byte gate vs `tests/codegen/reference//` | determinism + golden enforcement | + +**Binary entrypoints:** + +- `cargo run -p eql-codegen` (no args) → `generate_all` regenerates every type's gitignored SQL. This is what `mise run build` invokes (`main.rs:30-40`). 
+- `cargo run -p eql-codegen -- list-types` → one token per line; consumed by matrix-inventory (`main.rs:11-16`). +- `cargo run -p eql-codegen -- dump-catalog` → JSON of `(type → domain → ops)`; consumed by catalog-coverage (`main.rs:21-28`). + +**Per-type emission** (`generate.rs:206-268`): for each `ScalarSpec`, writes `_types.sql` (one `CREATE DOMAIN … AS jsonb` per domain), then per domain `_functions.sql` + `_operators.sql`, and `_aggregates.sql` only when ord-capable. `render_functions_file` emits one extractor per distinct extractor-term, then iterates all 20 operators × signatures emitting either an inlinable `LANGUAGE sql` **wrapper** (if supported) or a `LANGUAGE plpgsql` **blocker** (if not). The native-jsonb blocker set is **derived by exclusion** (`operator_surface.rs:208-216`) so adding an operator auto-classifies it. + +```mermaid +flowchart TD + CAT["CATALOG: &[ScalarSpec]"] -->|iterate| GA["generate_all"] + GA --> GT["generate_type(spec, out_dir)"] + GT --> RT["render_types_file → T_types.sql"] + GT --> RF["render_functions_file → T_functions.sql
(extractors + wrappers + blockers)"] + GT --> RO["render_operators_file → T_operators.sql"] + GT --> RA["render_aggregates_file → T_aggregates.sql
(ord-capable only)"] + OPS["OPERATORS (20) + signatures"] --> RF & RO + TERM["Term capabilities"] --> RF + RT & RF & RO & RA --> WR["ownership-guarded writer"] +``` + +### 2.3 `eql-types` — canonical v3 payloads + +`eql-types` models the JSON payload shape that flows into the `jsonb` domains. It is deliberately domain-explicit: there is one Rust struct per SQL domain, and `v3::all()` enumerates those structs in `eql-scalars::CATALOG` order (`crates/eql-types/src/v3/mod.rs:135-180`). Catalog parity tests fail if that list drifts from the generated SQL domain inventory. + +Each payload always carries the envelope keys `v`, `i`, and `c`; capability-specific structs add only the terms required by that SQL domain (`crates/eql-types/src/v3/mod.rs:19-35`). `SchemaVersion` is a private `u16` newtype whose only constructible/deserializable value is `2`, with JSON Schema emitted as `const: 2` (`crates/eql-types/src/lib.rs:29-100`). `Identifier` is the shared `{ "t": "...", "c": "..." }` table/column value (`crates/eql-types/src/lib.rs:103-114`). There is no `Option`-based "maybe term" struct and no discriminated enum: many domains are wire-identical across tokens, and `_ord` versus `_ord_ore` is intentionally indistinguishable on the wire (`crates/eql-types/src/v3/mod.rs:37-45`). 
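
The "one struct per domain, one accepted version" shape described above can be sketched with the standard library alone. This is only an illustration: the names `SchemaVersion`, `Identifier`, `Int4`, and `Int4Eq` mirror the types named in this section, but the `String` error type, the plain `String` terms, and the `CURRENT`/`get` helpers are assumptions, and the real crate's `serde`/`schemars` derives are not reproduced.

```rust
use std::convert::TryFrom;

/// Illustrative stand-in: the inner `u16` is private, so callers cannot
/// build an arbitrary version and must go through `CURRENT` or `try_from`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SchemaVersion(u16);

impl SchemaVersion {
    /// The single accepted wire version (hypothetical helper name).
    pub const CURRENT: SchemaVersion = SchemaVersion(2);

    pub fn get(self) -> u16 {
        self.0
    }
}

impl TryFrom<u16> for SchemaVersion {
    // Assumption: the real error type is not shown in the audit.
    type Error = String;

    fn try_from(raw: u16) -> Result<Self, Self::Error> {
        if raw == 2 {
            Ok(SchemaVersion(raw))
        } else {
            Err(format!("unsupported schema version {}, expected 2", raw))
        }
    }
}

/// Shared `{ "t": ..., "c": ... }` table/column identifier.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Identifier {
    pub t: String,
    pub c: String,
}

/// Storage-only payload: envelope keys only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Int4 {
    pub v: SchemaVersion,
    pub i: Identifier,
    pub c: String,
}

/// Equality payload: envelope plus a mandatory `hm` term.
/// There is deliberately no `Option` here, so a missing term is a type error.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Int4Eq {
    pub v: SchemaVersion,
    pub i: Identifier,
    pub c: String,
    pub hm: String,
}
```

Because each capability has its own struct, a payload that lacks `hm` simply cannot be represented as an `Int4Eq`, which is the Rust-side counterpart of the domain `CHECK` requiring the key.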
+ +| Type family | Struct shape | SQL/domain relationship | +|---|---|---| +| Storage | `{ v, i, c }` | storage-only domains such as `eql_v3.int4` | +| Equality | `{ v, i, c, hm }` | `_eq` domains; `hm: Hmac256` | +| Order | `{ v, i, c, ob }` | `_ord` / `_ord_ore` domains; `ob: OreBlock256` | +| Match | `{ v, i, c, bf }` | text `_match`; `bf: BloomFilter` | +| Search | `{ v, i, c, hm, ob, bf }` | text `_search` | + +```mermaid +classDiagram + class DomainType { + <> + +sql_domain_static() + +sql_domain() + +domain() + +schema_id() + +schema() + } + class Int4 { + +v: SchemaVersion + +i: Identifier + +c: Ciphertext + } + class Int4Eq { + +v + +i + +c + +hm: Hmac256 + } + class Int4Ord { + +v + +i + +c + +ob: OreBlock256 + } + class Terms { + Ciphertext + Hmac256 + OreBlock256 + BloomFilter + } + DomainType <|.. Int4 + DomainType <|.. Int4Eq + DomainType <|.. Int4Ord + Int4 --> Terms + Int4Eq --> Terms + Int4Ord --> Terms +``` + +Reusable term newtypes live in `v3::terms`: `Ciphertext(String)` for `c`, `Hmac256(String)` for `hm`, `OreBlock256(Vec)` for `ob`, and `BloomFilter(Vec)` for `bf` (`crates/eql-types/src/v3/terms.rs:18-49`). `BloomFilter` has a manual JSON Schema implementation so its elements validate as PostgreSQL `smallint` range values, not merely annotated `int16` values (`crates/eql-types/src/v3/terms.rs:51-98`). + +### 2.4 Test support crates + +`eql-tests-macros` keeps the SQLx matrix from forking into several hand-maintained lists. Its input is a `token => rust_type` list, while capability shape is read back from `eql-scalars::CATALOG` at macro-expansion time (`crates/eql-tests-macros/src/lib.rs:37-66`). The macros classify temporal, integer, text, numeric, float, storage-only, equality-only, and search-capable tokens from catalog methods (`crates/eql-tests-macros/src/lib.rs:68-144`), then emit only the wiring needed in the current compilation context. 
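
The single-list fan-out can be illustrated with a declarative macro. This is only a sketch under stated assumptions: the real crate is a proc-macro that consults `eql-scalars::CATALOG` during expansion, which `macro_rules!` cannot do, and the names `toy_scalar_types!`, `TOKEN`, `Plain`, `domain`, and `ALL_TOKENS` are hypothetical.

```rust
/// Hypothetical declarative stand-in for the proc-macro fan-out: one
/// `token => rust_type` list expands into a module per token plus an
/// inventory constant, so no second hand-maintained list can drift.
macro_rules! toy_scalar_types {
    ($($token:ident => $ty:ty),+ $(,)?) => {
        $(
            pub mod $token {
                /// Catalog token, e.g. "int4".
                pub const TOKEN: &str = stringify!($token);

                /// Plaintext Rust type paired with the token.
                pub type Plain = $ty;

                /// Schema-qualified domain name for a suffix such as "_eq".
                pub fn domain(suffix: &str) -> String {
                    format!("eql_v3.{}{}", TOKEN, suffix)
                }
            }
        )+

        /// Every token from the list, in declaration order.
        pub const ALL_TOKENS: &[&str] = &[$(stringify!($token)),+];
    };
}

// The only hand-written list; everything else is derived from it.
toy_scalar_types! {
    int4 => i32,
    int8 => i64,
    text => String,
}
```

Adding a type is then a one-line change to the list, and an inventory check (such as the matrix-inventory snapshot) can compare `ALL_TOKENS` against an independent source like `list-types`.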
+ +`tests/sqlx` is a normal workspace crate named `eql_tests`, not just a directory of integration tests. It exposes the assertion builders, fixture loaders, scalar-domain model, matrix machinery, property oracles, and selectors consumed by the integration-test binaries (`tests/sqlx/src/lib.rs:16-54`). It aliases itself as `eql_tests` so macro expansions resolve the same way from the library crate and from integration-test crates (`tests/sqlx/src/lib.rs:5-12`). + +The main runtime descriptors are: + +| Type | Location | Role | +|---|---|---| +| `ScalarType` | `tests/sqlx/src/scalar_domains.rs:18-146` | Generic contract for one plaintext scalar: Postgres type token, fixture values, SQL domain derivation, extractor expressions, SQL literal rendering, oracle result sets, and proptest strategy. | +| `OrderedScalar` / `SignedScalar` / `MatchScalar` | `tests/sqlx/src/scalar_domains.rs:160-223` | Capability traits that gate ordered, sign-boundary, and bloom-match matrix arms at compile time. | +| `Variant` | `tests/sqlx/src/scalar_domains.rs:1083-1195` | Runtime domain variant (`Storage`, `Eq`, `Ord`, `OrdOre`, `Search`) whose suffixes, terms, required payload keys, supported operators, and extractors are resolved from `CATALOG`. | +| `ScalarDomainSpec` | `tests/sqlx/src/scalar_domains.rs:1202-1269` | Runtime `(ScalarType, Variant)` descriptor used by matrix tests: schema-qualified domain, column expression, placeholder payload, extractors, and catalog token. | +| `FixtureSpec` | `tests/sqlx/src/fixtures/spec.rs:28-233` | Typed fixture-generation contract: validated fixture name, index list, payload column type, plaintext values, and storage-only mode. | + +--- + +## 3. SQL surface (`src/v3/`) + +### 3.1 Domain type families + +All domains are **jsonb-backed** with a `CHECK` requiring an object carrying `v`/`i`/`c` (and `v = '2'`) plus the index-term key(s) gating the capability. 
ORE-bearing domains additionally require `ob` to be a **non-empty array** — `jsonb_typeof(VALUE -> 'ob') = 'array' AND jsonb_array_length(VALUE -> 'ob') > 0` — so the empty-ORE payload (`ob: []`, only ever produced by encrypting `''` into an ordered column) is **rejected at the domain boundary** with a `23514` check violation rather than mis-ordered downstream (issue #262/#316). The clause is data-driven, not hardcoded per type: it is emitted for any term opting in via `Term::nonempty_array_key` (`crates/eql-scalars/src/term.rs:82-87`, currently `Ore` only) → `DomainBlock.nonempty_array_keys` (`crates/eql-codegen/src/context.rs:66,93`), and so appears on every `_ord` / `_ord_ore` and on text `_search` (e.g. `int4_types.sql` `int4_ord`/`int4_ord_ore` CHECKs; reference golden `tests/codegen/reference/int4/int4_types.sql:53-54,71-72`). **No domain-over-domain exists** — every `CREATE DOMAIN … AS` resolves to a base type (`jsonb`/`text`/`smallint[]`), never another `eql_v3` domain (`int4_types.sql:14,29,45,63`). This is structurally required: operators resolve against base `jsonb`, so a derived domain would not inherit the blocker surface. + +| Family | Domains | CHECK requires | Capability | +|---|---|---|---| +| 8 ordered scalars (`int2/4/8`, `float4/8`, `numeric`, `date`, `timestamptz`) | `` / `_eq` / `_ord` / `_ord_ore` | none / `hm` / `ob` (non-empty array) / `ob` (non-empty array) | storage / eq / order+eq / order+eq | +| `bool` | `bool` only | none | storage-only — **all operators blocked** | +| `text` | `text` / `_eq` / `_ord` / `_ord_ore` / `_match` / `_search` | `hm` / `ob` (non-empty) / `bf` / all (`ob` non-empty) | eq / order / LIKE / all three | +| `jsonb` (separate design) | `json` / `ste_vec_entry` / `ste_vec_query` | SteVec document model | not the ordered-scalar materializer | + +```mermaid +graph TD + subgraph "eql_v3. ordered family (logical capability, NOT SQL inheritance)" + S["eql_v3.<T>
storage-only"] + EQ["eql_v3.<T>_eq
CHECK + 'hm'"] + ORD["eql_v3.<T>_ord
CHECK + non-empty 'ob'"] + ORE["eql_v3.<T>_ord_ore
CHECK + non-empty 'ob'"] + end + S -. "all ops blocked (RAISE)" .-> S + EQ -->|eq_term| HM["eql_v3.hmac_256
(= <>)"] + ORD -->|ord_term| OB["eql_v3.ore_block_256
(= <> < ≤ > ≥ + min/max)"] + ORE -->|ord_term| OB +``` + +> The four domains are independent `AS jsonb` siblings, **not** derived from each other. + +### 3.2 Functions + +| Function | LANGUAGE | Volatility | STRICT | Purpose | +|---|---|---|---|---| +| `eq_term(_eq) → hmac_256` | sql | IMMUTABLE | yes | extract `hm` (`int4_eq_functions.sql:13`) | +| `ord_term(_ord[_ore]) → ore_block_256` | sql | IMMUTABLE | yes | extract `ob` (`int4_ord_functions.sql:14`) | +| `match_term(text_*) → bloom_filter` | sql | IMMUTABLE | yes | extract `bf` | +| `eq`/`neq`/`lt`/`lte`/`gt`/`gte` wrappers | sql | IMMUTABLE | yes | `(a) (b)` | +| `min_sfunc`/`max_sfunc` | **plpgsql** | IMMUTABLE | yes | aggregate transition (`int4_ord_aggregates.sql:14,41`) | +| `min`/`max` AGGREGATE | — | — | — | `sfunc` + `combinefunc`, `parallel=safe` | + +Extractors/wrappers are inlinable: `LANGUAGE sql`, single `SELECT`, `IMMUTABLE`, **no `SET search_path`** (pinning would defeat inlining). Aggregate state funcs are correctly plpgsql (multi-statement) and may pin. Each wrapper has **3 overloads** — `(,)`, `(,jsonb)`, `(jsonb,)` — so bare-form `WHERE col = $1::jsonb` resolves. + +### 3.3 SEM index-term types (hand-written, `src/v3/sem/`) + +| Type | Kind | Cite | +|---|---|---| +| `eql_v3.hmac_256` | DOMAIN `AS text` | `sem/hmac_256/types.sql:12` | +| `eql_v3.ore_block_256_term` | composite `(bytes bytea)` | `sem/ore_block_256/types.sql:13` | +| `eql_v3.ore_block_256` | composite `(terms ore_block_256_term[])` | `sem/ore_block_256/types.sql:24` | +| `eql_v3.bloom_filter` | DOMAIN `AS smallint[]` | `sem/bloom_filter/types.sql:14` | +| `eql_v3.ore_cllw` | composite `(bytes bytea)` | `sem/ore_cllw/types.sql:21` | + +`ore_block_256(jsonb)` is plpgsql and `STRICT` (must RAISE when `ob` absent, `sem/ore_block_256/functions.sql:64-77`), but its inlinable helpers carry a `COMMENT … 'eql-inline-critical'` marker so the post-install pin pass leaves them unpinned. 
`compare_ore_block_256_term` is `IMMUTABLE` and **deliberately NOT STRICT** (NULL branches are load-bearing); block count N is derived from ciphertext length (`49*N+16`), not hardcoded, with a `<= 16` malformed-term guard (`functions.sql:163`). + +**Empty-array ordering is load-bearing (issue #262).** The new ORE-domain CHECK (§3.1) rejects `ob: []` on cast/insert, but the comparator path is hardened independently as defence-in-depth: `jsonb_array_to_ore_block_256` **`COALESCE`s an empty `array_agg` to a zero-term `ore_block_256_term[]`** rather than NULL (`functions.sql:35-56`), because a NULL term makes the comparator return NULL — silently dropping an empty-text row from `ORDER BY` and letting `eql_v3.max` wrongly return it. A non-NULL zero-term composite instead engages the array comparator's `cardinality = 0` guard (`compare_ore_block_256_terms`, `functions.sql:230-238`), which canonically sorts **empty before every non-empty term**. So the two layers agree: empty ORE is rejected at the domain boundary, and any empty term that still reaches the comparator (e.g. via the raw-jsonb SEM helpers, which bypass the domain CHECK) sorts first deterministically. The composite gets a **DEFAULT btree opclass** so `CREATE INDEX ON t (eql_v3.ord_term(col))` engages without an explicit opclass. The `=` operator declares `COMMUTATOR = =` + `HASHES, MERGES`. + +`bloom_filter` ships no operators file — text `_match` containment (`@>`/`<@`) rides on the **native `smallint[]` array operators** inherited through the domain (`sem/bloom_filter/types.sql`). `ore_cllw` (CLLW comparator + its own btree `operator_class.sql`) is the ordered comparison path for the `jsonb`/SteVec surface (`<`/`<=`/`>`/`>=` over `oc` terms), parallel to `ore_block_256` for scalars. 
Both `ore_block_256` and `ore_cllw` operator classes index through a functional index on the extractor (default btree opclass), not an operator class on a column, so they install on Supabase / managed Postgres without superuser privileges — which is why the dedicated Supabase subset build was dropped in 3.0.0 (see §5.1). (Stale `*operator_class.sql` comments still referencing a "Supabase variant" are leftovers from the multi-variant build and no longer reflect a real build path.) + +### 3.4 Blockers + +The blocker surface consists of three shared return-type-variant helpers in `src/v3/scalars/functions.sql` (`_bool`, `_jsonb`, `_text`) plus per-domain blockers covering every native jsonb operator reachable through domain fallback. **Two rules guard against footguns and are upheld everywhere:** + +- **`LANGUAGE plpgsql`, never `sql`** — a sql blocker is inlinable and the planner can elide the `RAISE` when the result is provably unused. plpgsql is opaque, so the body always runs. +- **Never `STRICT`** — a STRICT blocker returns NULL on NULL input, silently bypassing the exception. + +Storage-only domains (`` with no suffix, and all of `bool`) block **every** operator including `=`/`<` — fail-closed: no query without the matching index term. + +> **Mechanism note:** the generated per-domain blockers are **self-contained** — each inlines its own `RAISE EXCEPTION 'operator % is not supported for %'` body and does **not** call the three shared `encrypted_domain_unsupported_{bool,jsonb,text}` helpers (those are used by the hand-written `src/v3/jsonb/blockers.sql` surface). The shared helpers do carry a `SET search_path`, which is correct: a blocker is plpgsql and never inlined, so pinning is harmless. The consequence (plpgsql, non-STRICT, uniform message) is identical either way. 
+ +### 3.5 `-- REQUIRE:` dependency graph + +```mermaid +graph TD + schema["src/v3/schema.sql (root)"] + crypto["crypto.sql"] --> schema + common["common.sql"] --> schema + hmF["sem/hmac_256/functions.sql"] --> schema + obF["sem/ore_block_256/functions.sql"] --> crypto & common + shFn["scalars/functions.sql (shared blockers)"] --> schema + tdef["<T>_types.sql"] --> schema + eqFn["<T>_eq_functions.sql"] --> tdef & shFn & hmF + ordFn["<T>_ord_functions.sql"] --> tdef & shFn & obF + ordAgg["<T>_ord_aggregates.sql"] --> ordFn +``` + +`schema.sql` is the universal root. `tsort` over these edges produces global build order. + +### 3.6 Runtime data-flow boundaries + +Database-side v3 code does **not** decrypt plaintext. It accepts encrypted JSONB payloads, validates domain shape, extracts deterministic SEM terms, and compares those terms. Plaintext encryption/decryption remains outside the database boundary; test fixtures use `cipherstash-client` to produce the payload JSON inserted into fixture tables (`tests/sqlx/src/fixtures/cipherstash.rs:156`, `tests/sqlx/src/fixtures/driver.rs:122`). + +```mermaid +flowchart LR + P["jsonb payload
v/i/c + terms"] --> C["eql_v3.* domain CHECK
object + v=2 + required keys
(ORE: 'ob' non-empty array)"] + C --> X["extractor
eq_term / ord_term / match_term"] + X --> SEM["SEM type
hmac_256 / ore_block_256 / bloom_filter"] + SEM --> OP["operator wrapper
or aggregate transition"] + OP --> Q["SQL predicate
ORDER BY / min / max"] + C --> B["unsupported operator blocker
plpgsql RAISE"] +``` + +Scalar term extraction splits by term: `hm` becomes `eql_v3.hmac_256` (`src/v3/sem/hmac_256/functions.sql:21`), `ob` becomes `eql_v3.ore_block_256` and flows into recursive ORE comparison (`src/v3/sem/ore_block_256/functions.sql:64`, `src/v3/sem/ore_block_256/functions.sql:115`), and `bf` becomes a `smallint[]` bloom-filter domain (`src/v3/sem/bloom_filter/functions.sql:51`). ORE block and CLLW comparators both expose default btree opclasses for functional indexes (`src/v3/sem/ore_block_256/operator_class.sql:20`, `src/v3/sem/ore_cllw/operator_class.sql:21`). + +The JSONB/SteVec surface is separate from scalar codegen. Root documents are `eql_v3.json`; path operators produce `eql_v3.ste_vec_entry`; containment uses a normalized `ste_vec_query`; entry equality/order compare deterministic `hm`/`oc` terms (`src/v3/jsonb/types.sql:107`, `src/v3/jsonb/types.sql:154`, `src/v3/jsonb/operators.sql:33`, `src/v3/jsonb/operators.sql:155`, `src/v3/jsonb/operators.sql:265`). + +> **Domain-flattening footgun on `->`/`->>` (issue #318).** The `eql_v3.json` domain flattens to its `jsonb` base type when an operator's RHS is an **unknown-typed literal**, so a bare `col -> 'sel'` binds the **native `jsonb -> text`** (a root-key lookup on the envelope) instead of the v3 selector-lookup operator — a *silent wrong answer* for direct-SQL callers (empirically: bare `-> 'sv'` returns jsonb `[]`; typed `-> 'sv'::text` binds the v3 operator). The Proxy is unaffected because it always sends typed `$n`. This is **intrinsic to the domain type-kind and cannot be closed by an added operator or blocker** — it can only be *pinned by test*: `v3_jsonb_bare_operand_flattens_to_native` (blocker face: bare `?`/`||` succeed as native, typed RHS raises) and `v3_jsonb_arrow_bare_operand_flattens_to_native` (supported-operator face: asserts via `pg_typeof` which operator binds *and* the user-visible value divergence), both in `tests/sqlx/tests/v3_jsonb_tests.rs:892,954`. 
See finding 7 in §6. + +```mermaid +flowchart TD + J["eql_v3.json document"] --> A["-> / ->> selector"] + A --> E["eql_v3.ste_vec_entry"] + E --> EQ["eq_term(entry)
hm or oc bytes"] + E --> ORD["ore_cllw(entry)
oc"] + J --> N["to_ste_vec_query(json)"] + N --> G["normalized jsonb @>
s / hm / oc fields"] +``` + +--- + +## 4. Test infrastructure (`tests/sqlx/`) + +### 4.1 Property-test suites + +Three suites verify SQL operator results agree with a plaintext oracle. **EQL is searchable encryption — all fixtures MUST be real ciphertext from cipherstash-client, never synthetic blobs.** + +| Suite | Location | What it tests | DB? | Gate | +|---|---|---|---|---| +| catalog | `crates/eql-scalars/src/proptest_invariants.rs` | Pure-Rust catalog invariants: ORE ⊇ HM operators, deduped operator union, extractor-resolves-iff-supported, bounded-int range ordering, payload keys == terms, eq-only has no ordering ops | No | none (runs on fork PRs) | +| fixture (oracle) | `tests/.../property/fixture_oracle.rs` | All-pairs eq/ord oracle + 3-overload function-double oracles + extractor identity over committed ciphertext | Yes (`#[sqlx::test]`) | none | +| fixture (cross-ciphertext) | `.../property/cross_ciphertext.rs` | Equal-plaintext/distinct-ciphertext (`_doubles` tables) compare equal via `hm` & `ob` | Yes | none | +| fixture (match smoke) | `.../property/match_smoke.rs` | Example-based bloom `@>`/`<@` containment over text `_match` fixtures | Yes | none | +| edge cases | `.../property/edge_cases.rs` | NULL propagation, blocker raises, CHECK rejection | Yes | none | +| e2e | `.../property/e2e_oracle.rs` | Same oracle over **fresh ZeroKMS encryption** each run | Yes + creds | `proptest-e2e` | + +> **Note:** the **catalog** suite is pure Rust over `CATALOG` with no DB introspection — it does **not** assert the SQL-level "blocker is non-STRICT / `LANGUAGE plpgsql`" properties. Those are exercised by the **SQLx matrix** blocker arms (`matrix.rs`), which sweep NULL argument positions and confirm the `RAISE` fires. + +**All-pairs oracle engine** (`tests/sqlx/src/property.rs:84` eq, `:120` ord): over every ordered pair `(a,b)`, asserts SQL `=`/`<>` (and `<`/`<=`/`>`/`>=` + `ord_term` sort order) match the plaintext comparison. 
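
The all-pairs check can be illustrated with a minimal Rust sketch. This is not the code in `tests/sqlx/src/property.rs`: `OpResults`, `plaintext_oracle`, `all_pairs_mismatches`, and the `observed` callback are hypothetical names, and `observed` stands in for running the SQL operators against the encrypted columns of two fixture rows. The `ord_term` sort-order assertion is left out for brevity.

```rust
/// Results of the six comparison operators for one ordered pair `(a, b)`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct OpResults {
    pub eq: bool,
    pub ne: bool,
    pub lt: bool,
    pub le: bool,
    pub gt: bool,
    pub ge: bool,
}

/// Plaintext oracle: what each operator must return for plaintexts `a` and `b`.
pub fn plaintext_oracle<T: Ord>(a: &T, b: &T) -> OpResults {
    OpResults {
        eq: a == b,
        ne: a != b,
        lt: a < b,
        le: a <= b,
        gt: a > b,
        ge: a >= b,
    }
}

/// Walks every ordered pair of rows (including `i == j`) and returns the
/// index pairs where the system under test disagrees with the oracle.
///
/// `observed(i, j)` is a hypothetical stand-in for evaluating the SQL
/// operators on the encrypted values of rows `i` and `j`.
pub fn all_pairs_mismatches<T, F>(plaintexts: &[T], observed: F) -> Vec<(usize, usize)>
where
    T: Ord,
    F: Fn(usize, usize) -> OpResults,
{
    let mut mismatches = Vec::new();
    for (i, a) in plaintexts.iter().enumerate() {
        for (j, b) in plaintexts.iter().enumerate() {
            if observed(i, j) != plaintext_oracle(a, b) {
                mismatches.push((i, j));
            }
        }
    }
    mismatches
}
```

Under this shape, swapping the row source (committed fixtures versus freshly encrypted rows) changes only what `observed` reads, not the assertion loop.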
Fixture and e2e suites differ **only in where rows come from**. + +### 4.2 Scalar matrix & snapshot pinning + +Single type-list source: `scalar_types!` macro (`tests/sqlx/src/scalar_types.rs:52-65`). `scalar_matrix!` (`matrix.rs:174-200`) emits **one `#[sqlx::test]` per (category, domain, operator, pivot)** by capability shape (`[eq,ord]` ordered / `[eq]` eq-only / `[eq,ord,search]` text / `[storage]` bool). + +SQLx assertions can't see a test that *stops running*. Committed token-normalized baselines under `tests/sqlx/snapshots/` close this gap, pinning each shape's ``-normalized test-name set — the scalar ordered/eq-only/text/storage-only shapes, plus the sibling jsonb SteVec-entry matrix. `tests/sqlx/snapshots/README.md` is the source of truth for which baselines exist and how each is regenerated. + +```mermaid +flowchart TD + LIST["cargo test --no-default-features --test encrypted_domain -- --list"] --> DISC["discover types from scalars::<X>:: prefixes"] + DISC --> MATCH{"per-type set == baseline for its shape?"} + MATCH -->|mismatch: dropped/renamed/cfg-gated| FAIL["snapshot diff non-empty → CI red"] + MATCH -->|ok| XCHECK{"discovered types == eql-codegen list-types?"} + XCHECK -->|catalog type missing matrix wiring| FAIL + XCHECK -->|ok| GITDIFF["git add -N + git diff --exit-code"] + GITDIFF -->|stale/new snapshot| FAIL + GITDIFF -->|clean| PASS["job passes"] +``` + +### 4.3 Fixture-generation pipeline + +```mermaid +flowchart LR + CAT["eql_scalars::CATALOG"] --> GEN["generate_all_fixtures.rs
(feature: fixture-gen)"] + PT["plaintext values"] --> GEN + CREDS["CS_* creds
ZeroKMS auth + client key"] --> ENC + GEN -->|spec().run() per token| ENC["cipherstash-client
encrypt → real hm/ob/bf"] + ENC --> FIX["gitignored fixtures/
eql_v3_*.sql, *_doubles.sql"] + FIX -->|include_str! at compile time| BIN["encrypted_domain test binary"] +``` + +`mise run test:sqlx:prep` → `fixture:generate:all` iterates `CATALOG` in one process. **Credential dependency:** encryption needs BOTH ZeroKMS auth (`CS_CLIENT_ACCESS_KEY` + `CS_WORKSPACE_CRN`) AND a client key (`CS_CLIENT_ID` + `CS_CLIENT_KEY`) — separate roles. There are **no committed fixture exceptions**: the jsonb SteVec document fixture `v3_ste_vec.sql` is now generated through the same `FixtureSpec` machinery (`tests/sqlx/src/fixtures/v3_ste_vec.rs`) and gitignored/regenerated like every scalar fixture. + +--- + +## 5. Build system + +`mise run build` → `tasks/build.sh`: **orphan sweep → codegen → REQUIRE scan (`src/v3` only) → self-containment check → tsort → concat → pin search_path → single `release/*.sql`**. As of 3.0.0 there is exactly **one** build path producing exactly **one** installer + uninstaller; the former Main / Supabase / Protect / v3-only variants are gone (commit `47263bde`, "collapse to a single self-contained v3 build, drop v2 variants"). + +```mermaid +flowchart TD + A["mise run build"] --> B["orphan sweep: delete generated scalars/*/*.sql
(-mindepth 2 spares functions.sql)
(build.sh:23-26)"] + B --> C["cargo run -p eql-codegen
regenerate ALL types from CATALOG
(build.sh:33)"] + C --> V["bake version.sql from version.template
(build.sh:92)"] + V --> E["find src/v3 -path '*.sql' → scan '-- REQUIRE:' → deps-v3.txt
(build.sh:100-113)"] + E --> SC["verify_v3_self_contained: every REQUIRE target under src/v3
(build.sh:59-74, called :115)"] + SC --> F["tsort | tac → deps-ordered-v3.txt (build.sh:117)"] + F --> G["verify_deps_exist (build.sh:38-51, called :118)"] + G --> H["xargs cat | grep -v REQUIRE → release/cipherstash-encrypt.sql (build.sh:120)"] + H --> I["append pin_search_path_v3.sql (build.sh:121)"] + I --> U["append uninstall-v3.sql → cipherstash-encrypt-uninstall.sql (build.sh:123)"] +``` + +The **orphan sweep runs first** (`build.sh:23-26`) so a catalog-removed type cannot leave stale generated SQL behind; codegen then regenerates from `CATALOG` (`build.sh:33`), still before any REQUIRE scan, so freshly generated files are what tsort orders. The REQUIRE scan globs `src/v3` only (`build.sh:100`); `verify_v3_self_contained` (`build.sh:59-74`) then fails the build if any v3 REQUIRE edge points outside `src/v3/`, and `verify_deps_exist` (`build.sh:38-51`) fails loudly if a tsorted dep references a missing file. `pin_search_path_v3.sql` is appended unconditionally (`build.sh:121`) — it is the v3-specific pin pass, so the old "NOT for v3" caveat no longer applies now that v3 is the only surface. Use `mise run clean && mise run build` — a bare build can leave a stale `release/*.sql`. + +### 5.1 Build output (single artifact) + +There are **no build variants** as of 3.0.0. The build globs `src/v3` only and emits one installer + one uninstaller under the canonical names: + +| Output | Built from | Notes | +|---|---|---| +| `release/cipherstash-encrypt.sql` | `src/v3/**/*.sql` (tsorted) + `tasks/pin_search_path_v3.sql` | The sole installer — the self-contained `eql_v3` surface (`build.sh:100,120-121`) | +| `release/cipherstash-encrypt-uninstall.sql` | `tasks/uninstall-v3.sql` | Matching uninstaller (`build.sh:123`) | + +The previous Main / Supabase / Protect / v3-only variants and their `-supabase` / `-protect` / `-v3` output names were removed. 
Because the v3 surface owns no `eql_v2` dependency and indexes through functional indexes over extractors (no superuser-only operator-class-on-column), it is already Supabase / managed-Postgres compatible without a dedicated subset build — which is what made the Supabase variant redundant. + +> **Stale artifacts caveat:** `release/` may still contain orphaned `cipherstash-encrypt-{supabase,protect,v3}.sql` from the old multi-variant build; the current `build.sh` no longer produces them. `mise run clean && mise run build` clears them. + +**v3 self-containment is enforced at build time:** `verify_v3_self_contained` (`build.sh:59-74`, called at `:115`) fails the build if any v3 `-- REQUIRE:` target is outside `src/v3/`. + +### 5.2 CI / test gates + +| Gate | Enforces | Where | +|---|---|---| +| `test:self_contained_v3` | No `eql_v2[._]` symbol in `src/v3` (incl. generated); v3 dep-closure stays under `src/v3/`; release artifact carries no `eql_v2` | `tasks/test/self_contained_v3.sh:9-50` | +| `test:clean_install_v3` | Installs v3 artifact into a DB with **no `eql_v2`** and smoke-tests | `tasks/test/clean_install_v3.sh` | +| `codegen:parity` | Codegen output == committed goldens `tests/codegen/reference//`; golden dir set == `list-types` | `tasks/codegen-parity.sh:28-40` | +| `test:matrix:inventory` | Matrix test names per type match snapshots; discovered types == `list-types` | `mise.toml:230-379` | +| `test:matrix:catalog-coverage` | Every `(type, domain)` has ≥1 matrix test | `mise.toml:499-582` | +| `test:splinter` | Supabase `function_search_path_mutable` lint; allowlists inline-critical extractors/wrappers | `tasks/test/splinter.sh` | +| `pin_search_path_v3.sql` (structural rule) | Pins `search_path` on all `eql_v3.*` functions EXCEPT inline-critical encrypted-domain functions, recognized **intrinsically** (any `LANGUAGE sql` IMMUTABLE fn taking a jsonb-backed `eql_v3` DOMAIN arg) — no per-type edit | `tasks/pin_search_path_v3.sql:74-87` | + +Two complementary 
skip mechanisms back up the structural rule for functions the domain-arg predicate can't see: an explicit SEM-function allowlist (`pin_search_path_v3.sql:37-59` — `ore_block_256_*`, `ore_cllw_*`, raw-jsonb `hmac_256`/`bloom_filter`/`ore_cllw` helpers that take composite/raw-`jsonb` args) and a `COMMENT … 'eql-inline-critical'` marker fallback (`pin_search_path_v3.sql:88-95`) for hand-written extension functions. The pin pass only touches `prokind IN ('f','w')`, so aggregates are handled separately (via the splinter allowlist), not pinned. + +### 5.3 Gitignored vs committed + +- **Gitignored (regenerated):** `src/v3/scalars/*/*_{types,functions,operators,aggregates}.sql`; `src/deps*.txt`; SQLx fixtures; all `release/*.sql`. +- **Committed (source of truth):** the Rust catalog (`crates/eql-scalars/src`); renderers (`crates/eql-codegen/src`); the shared blocker `src/v3/scalars/functions.sql`; hand-written `_extensions.sql`; SEM types (`src/v3/sem/`); codegen goldens (`tests/codegen/reference/`); matrix snapshots. + +> Because generated SQL is gitignored, the `self-contained-v3`, `matrix-coverage`, and `splinter` CI jobs each run `mise run build` (or stub fixtures) first to materialize them. + +--- + +## 6. Findings for design review + +1. **Self-containment holds** — zero `eql_v2.*` symbol references in executable v3 SQL; backed by build-time + CI structural checks, not convention. +2. **Both blocker footguns are enforced everywhere** — every blocker is `LANGUAGE plpgsql` and non-`STRICT`. Storage-only domains (and all of `bool`) block even `=`, so a value can't be queried without provisioning the matching index term (fail-closed). +3. **Surface is wider than the `int4` reference implies** — 8 ordered scalars + `bool` (eq-blocked) + `text` (bloom `bf` LIKE path) + a separate `jsonb`/SteVec design. SEM layer has 4 index types, not 2. 
Caveat on the `jsonb` line: it is the only scalar with no *generated* (catalog-driven) SQL surface, but a substantial *hand-written* `eql_v3` jsonb/SteVec surface (`src/v3/jsonb/` — `json`/`ste_vec_entry`/`ste_vec_query` domains, validators, `to_ste_vec_query` cast, GIN support, blockers) **is** installed and smoke-tested by `clean_install_v3`. "No surface yet" (per CLAUDE.md) refers specifically to the *materializer*, not to installed SQL. +4. **Capability model is data-minimal** — catalog rows declare only token/kind/suffix/terms/fixtures; all behavior lives in `Term`/`ScalarKind` `impl` methods with unit tests. A new term's behavior cannot be expressed as free-form catalog data. +5. **ORE comparator is generalized vs. its v2 fork origin** — `compare_ore_block_256_term*` are `IMMUTABLE` (the now-removed v2 originals defaulted VOLATILE) and block-count N is derived from ciphertext length rather than fixed at 8, so a single comparator serves int4 (N=8), timestamp (N=12), numeric (N=14). This is a historical fork-provenance note, not a live v2-vs-v3 comparison — `eql_v2` no longer ships. The `<= 16` malformed-term guard (`sem/ore_block_256/functions.sql:163`) is load-bearing — worth a targeted coverage check. +6. **Documented test blind spot** — the `scale`-feature arm is excluded from matrix inventory via `--no-default-features`, covered separately. The fixture suite's equality-true-across-distinct-ciphertexts branch relies entirely on the `_doubles` tables + e2e run; if the doubles generator regresses, that coverage is lost silently. +7. **`eql_v3.json` `->`/`->>` bare-literal silent wrong answer (issue #318)** — because the `eql_v3.json` domain flattens to base `jsonb` against an unknown-typed RHS literal, a direct-SQL caller writing bare `col -> 'sel'` binds the **native** envelope root-key lookup, not the v3 selector operator, and silently gets the wrong value. 
The CipherStash Proxy is immune (it sends typed `$n`), so this only bites callers querying the encrypted column by hand. It is intrinsic to the domain type-kind — **not closeable by a blocker or extra operator** — and is held only by regression tests (`v3_jsonb_tests.rs:892,954`, §3.6). Two consequences for design review: (a) any documentation/example for direct SteVec querying must cast the selector literal (`'sel'::text`); (b) the guarantee lives entirely in those two tests, so a resolution change in either direction must keep them green. + +--- + +## Appendix: key file index + +| Area | Path | +|---|---| +| Catalog (source of truth) | `crates/eql-scalars/src/lib.rs`, `kind.rs`, `term.rs`, `fixture.rs`, `spec.rs` | +| Renderers | `crates/eql-codegen/src/generate.rs`, `context.rs`, `operator_surface.rs`, `writer.rs` | +| Codegen goldens | `tests/codegen/reference//` | +| Schema root | `src/v3/schema.sql` | +| SEM index types | `src/v3/sem/{hmac_256,ore_block_256,bloom_filter,ore_cllw}/` | +| Shared blockers | `src/v3/scalars/functions.sql` | +| Generated per-type (gitignored) | `src/v3/scalars//_*.sql` | +| Property tests | `tests/sqlx/tests/encrypted_domain/property/` (+ `README.md`) | +| Matrix + snapshots | `tests/sqlx/src/matrix.rs`, `tests/sqlx/snapshots/` (+ `README.md`) | +| Build | `tasks/build.sh`, `mise.toml`, `tasks/pin_search_path_v3.sql` | +| Self-containment gate | `tasks/test/self_contained_v3.sh` | +| Adding a type | `docs/reference/adding-a-scalar-encrypted-domain-type.md` |
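
The dependency-ordering core of the build described in §5 can be sketched in Rust as follows. This is an illustration, not a port of `tasks/build.sh`: the function names are hypothetical, files are held in an in-memory map rather than read from disk, and the `-- REQUIRE: <path>` line format (one repo-relative path per line) is an assumption. The self-containment and missing-dependency checks are folded into one pass here, whereas the script runs them as separate steps; the orphan sweep, codegen, version baking, and `search_path` pinning are omitted.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Extracts the targets of `-- REQUIRE:` lines from one SQL file.
/// Assumed format (not confirmed by the source): one path per line.
pub fn require_targets(sql: &str) -> Vec<String> {
    sql.lines()
        .filter_map(|line| line.trim().strip_prefix("-- REQUIRE:"))
        .map(|rest| rest.trim().to_string())
        .filter(|target| !target.is_empty())
        .collect()
}

/// Returns the files in dependency-first order, or an error if a REQUIRE
/// target is outside `src/v3/`, is missing, or is part of a cycle.
pub fn build_order(files: &BTreeMap<String, String>) -> Result<Vec<String>, String> {
    let mut deps: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for (path, sql) in files {
        let targets = require_targets(sql);
        for target in &targets {
            if !target.starts_with("src/v3/") {
                return Err(format!("{} requires {}, which is outside src/v3/", path, target));
            }
            if !files.contains_key(target) {
                return Err(format!("{} requires missing file {}", path, target));
            }
        }
        deps.insert(path.clone(), targets);
    }
    let mut order = Vec::new();
    let mut done = BTreeSet::new();
    let mut in_progress = BTreeSet::new();
    for path in files.keys() {
        visit(path, &deps, &mut done, &mut in_progress, &mut order)?;
    }
    Ok(order)
}

/// Depth-first post-order walk: a file is emitted only after its dependencies.
fn visit(
    path: &str,
    deps: &BTreeMap<String, Vec<String>>,
    done: &mut BTreeSet<String>,
    in_progress: &mut BTreeSet<String>,
    order: &mut Vec<String>,
) -> Result<(), String> {
    if done.contains(path) {
        return Ok(());
    }
    if !in_progress.insert(path.to_string()) {
        return Err(format!("dependency cycle through {}", path));
    }
    for dep in &deps[path] {
        visit(dep, deps, done, in_progress, order)?;
    }
    in_progress.remove(path);
    done.insert(path.to_string());
    order.push(path.to_string());
    Ok(())
}

/// Concatenates files in build order, dropping any line that mentions
/// REQUIRE (mirroring the `grep -v REQUIRE` step).
pub fn concat_release(files: &BTreeMap<String, String>, order: &[String]) -> String {
    let mut out = String::new();
    for path in order {
        for line in files[path].lines().filter(|l| !l.contains("REQUIRE")) {
            out.push_str(line);
            out.push('\n');
        }
    }
    out
}
```

Failing on the first out-of-tree or missing target, rather than skipping it, matches the fail-loud behaviour the document attributes to `verify_v3_self_contained` and `verify_deps_exist`.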