Skip to content

Eql v3#246

Closed
tobyhede wants to merge 1374 commits into
mainfrom
eql_v3
Closed

Eql v3#246
tobyhede wants to merge 1374 commits into
mainfrom
eql_v3

Conversation

@tobyhede

Copy link
Copy Markdown
Contributor

THIS IS THE EQL_V3 STACK.
VERY WIP.

@coderabbitai

coderabbitai Bot commented May 29, 2026

Copy link
Copy Markdown

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d7507196-3891-4255-86f9-5f497df220c6

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch eql_v3

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

tobyhede and others added 29 commits June 3, 2026 18:59
…, fix blocker doc path

Hoist the src/v3 path strings into named constants in generate.rs (byte-identical
output — parity holds). Remove the now-dead Term::returns() (codegen builds the
extractor return type from CORE_SCHEMA + ctor) and its legacy eql_v2 test
assertions. Repoint the shared blocker's doc comment to src/v3/scalars/<T>/.
The eql_v3 SEM index-term functions are a hand-port of the eql_v2 originals.
The scalar matrix already exercises the array comparator's happy path end-to-end
against real ciphertext fixtures, but several branches are structurally
unreachable there and were tested only on the eql_v2 copies. Add a sibling
family test module covering them:

- differential v2<->v3 parity on real `ob` fixtures (both sides routed through
  extractor -> composite -> compare_..._terms, so the schema prefix is the only
  variable) — the strongest guard against a faithful-port slip;
- the 'Ciphertexts are different lengths' RAISE;
- NULL-term ordering branches the STRICT wrappers bypass;
- array NULL + empty/cardinality recursion base cases;
- has_* presence checks, the missing-`ob` RAISE, and the NULL-jsonb short-circuit.

Verified non-vacuous: a deliberately broken comparator fails T1/T2/T3 while the
independent T4/T5 stay green.
…entity names

Addresses CodeRabbit review: the artifact/test self-containment assertions only
matched schema-qualified eql_v2. references; extend them to also reject
eql_v2_<entity> names (eql_v2_encrypted, eql_v2_configuration) while still
allowing prose mentions of eql_v2 in doc comments. Verified the v3 artifact and
src/v3 contain zero eql_v2_ occurrences.
…e SCHEMA

With eql_v3 fully self-contained, the encrypted-domain families and the SEM
index-term types they call live in one schema, so there is no second schema to
point the core types at. Replace the CORE_SCHEMA/DOMAIN_SCHEMA pair with one
SCHEMA = "eql_v3" constant; the templates read it via the {{ schema }} global.
Output is byte-identical (golden parity holds).
…ardening

- Cargo.toml: update stale fixture command to fixture:generate:all
- eql-functions.md: index examples use distinct _eq/_ord columns (a column
  carries a single domain, so eq_term/ord_term apply to different columns)
- scalars/mod.rs: header describes the scalar_types!(matrix_suites) layout
- codegen-parity.sh: ls *.sql -> portable find (no set -e abort on empty dir)
- inlinability.rs: narrow arity-1 hmac_256 match to the jsonb overload
…ators

The six comparison operator backing functions (_eq, _neq, _lt, _lte,
_gt, _gte) were missing required @param and @return Doxygen tags,
failing docs:validate:required-tags with 6 errors and 6 warnings.
Tags follow the eql_v2 sibling src/ore_block_u64_8_256/operators.sql
convention.
cargo fmt --check was failing CI (test:lint and test:crates jobs) on
this file; apply rustfmt with no logic changes.
…t consts

Make the SQLx scalar-matrix harness type-agnostic ahead of the first
non-integer scalar, without adding one. Three integer assumptions are lifted:

- `ScalarType::FIXTURE_VALUES` (a `const`) becomes `fn fixture_values()`, so a
  scalar whose values can't be const-constructed can return a borrow of a
  lazily-built `Vec` instead. Integer impls still hand back their
  `eql_scalars::<T>_VALUES` const.
- New `min_pivot()` / `max_pivot()` trait methods replace the matrix's direct
  `<T>::MIN` / `<T>::MAX` pivot references, so a scalar without an inherent
  `::MIN`/`::MAX` const can supply an explicit sentinel.
- The ORDER BY arms build their `WHERE` clause from `to_sql_literal(zero)`
  instead of a hardcoded `> 0`, so a non-integer plaintext column typechecks.

Behaviour-preserving for the existing `int4` / `int2` types: the integer
`min_pivot`/`max_pivot` resolve to `Self::MIN`/`Self::MAX`, `to_sql_literal(0)`
renders `0`, and the generated test names are unchanged. The int4 cargo-expand
snapshot is regenerated to track the method-based bodies.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
- scalar_domains.rs: document fixture_values() stable-order contract
  (callers compare element-wise and index positionally without sorting)
- eql-tests-macros: assert emitted min_pivot/max_pivot bodies instead of
  loose MIN/MAX substrings that also match the doc comment

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
feat(int8): add eql_v3.int8 encrypted-domain type family
…test hardening

Collates and resolves code-review feedback on the self-contained eql_v3 PR.

Inlinable SEM helpers (coderdan threads):
- Convert eql_v3.jsonb_array_to_bytea_array and
  eql_v3.jsonb_array_to_ore_block_u64_8_256 from LANGUAGE plpgsql to
  inlinable LANGUAGE sql IMMUTABLE (no SET), using a CASE-scalar-subquery
  form so JSON-null/empty-array inputs still return NULL rather than
  raising (a naive FROM-SRF + WHERE rewrite would regress null to error).
- These take a bare jsonb (not a domain), so pin_search_path.sql's
  structural skip does not cover them; opt them in via its documented
  'eql-inline-critical' COMMENT marker so they install unpinned and the
  planner can inline them. Add matching splinter allowlist rows. The
  eql_v2 copies stay plpgsql by design. Direct SEM tests added (sem.rs).

Stale references:
- eql-functions.md / sql-support.md: v3 extractors now document eql_v3
  SEM return types, not eql_v2.
- pin_search_path.sql header, scalar_types.rs and mutations.rs comments.

Test/build hardening:
- writer.rs validates the AUTO-GENERATED ownership header before writing.
- eql-scalars int_values! fails at compile time on narrowed-fixture overflow.
- parity.rs asserts the generated file set, not just per-file contents.
- mise.toml: shared test:sqlx:prep so test:sqlx:watch gets the same prep.
- build.sh v3-only build rejects REQUIRE edges outside src/v3.
- fixtures: single-source PAYLOAD_COLUMN const; enforce 63-byte identifier limit.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Self-contained eql_v3 schema + standalone v3 installer
refactor(tests): generalise scalar matrix harness off integer-inherent consts
Add the `eql_v3.date` encrypted-domain scalar — the first non-integer ordered
type — on top of the integer-agnostic harness refactor (parent PR). The SQL
codegen needs no change (domains are jsonb-backed and token-driven); the work is
one catalog row plus the temporal wiring the refactor enabled.

Catalog (`eql-scalars`): `ScalarKind::Date` (`chrono::NaiveDate`),
`Fixture::Date(&str)` (zero-dep ISO strings), `DATE_FIXTURES` (16 dates incl.
the three matrix pivots), and `pub const DATE` appended to `CATALOG`, with
mirrored panic / pivot-presence / token-order tests.

Harness: an explicit `[temporal]` marker in the `scalar_types!` dispatch list
drives the divergences from the integer path — the `impl ScalarType` for a
temporal scalar is hand-written (chrono values can't be a `const` slice; pivots
are explicit sentinels), and `scalar_fixture!` stamps a pivot-presence assert
instead of the integer signed-extreme asserts. Adds `impl ScalarType for
NaiveDate` (LazyLock-parsed values, `min_pivot`/`max_pivot` sentinels,
quoted `to_sql_literal`), `EqlPlaintext for NaiveDate` (Cast::DATE), the sqlx
`chrono` feature + direct `chrono` dep, the CHANGELOG entry, and a temporal-kinds
note in the adding-a-scalar reference.
feat(scalars): add eql_v3.date encrypted-domain type + temporal wiring
… reference

CATALOG now includes the non-integer ordered scalar (date) alongside the
integers, so the 'only the integer scalars today' wording was stale.
Addresses CodeRabbit feedback on PR #256.
ScalarKind loses the five partial accessors that panicked on non-integer
kinds; bounds now live on the total BoundedIntKind, reached via
ScalarKind::as_bounded_int(). Date::min_symbol() is now a compile error.
A CATALOG invariant test replaces the deleted #[should_panic] tests.
Update the adding-a-scalar reference's `kind` bullet to reflect that the
bounded-numeric accessors moved to the total BoundedIntKind sub-enum, and
add the implementation plan under docs/superpowers/plans/.
refactor(eql-scalars): total BoundedIntKind sub-enum (replaces panicking ScalarKind accessors)
…a is_eq_only_token

The ordered_numeric_matrix! suite exercises </>/min/max; an equality-only
scalar (no _ord domain in eql-scalars::CATALOG) does not support those. Route
the matrix emitter through matrix_suite_for_entry, which reads eq-only-ness from
the catalog (is_eq_only_token) and emits a compile_error! for an eq-only token
instead of silently generating ordering tests it cannot pass. Makes the
catalog-derived is_eq_only accessor load-bearing and leaves a clean seam for an
equality-only matrix path. Both arms unit-tested.
tobyhede added 28 commits June 19, 2026 22:04
Derived IEEE PartialEq (NaN != NaN, +0.0 == -0.0) was inconsistent with the
manual Eq/Ord impls built on total_cmp (NaN == NaN, +0.0 != -0.0), breaking
Eq's reflexivity contract and Ord/PartialEq equality agreement.
…eal ciphertext)

Adds v3_jsonb_to_ste_vec_query_gin_is_cost_chosen: replicates one real v3_ste_vec
document to 5000 rows + 1 distinct pivot, builds the to_ste_vec_query GIN index,
and with enable_seqscan left ON asserts the planner CHOOSES the index for a
single-row-selective $.hello oc containment needle (matched == 1). Complements
the sibling *_gin_engages usability arm (which forces seqscan off).

Verified PG17: passes.
…t:sqlx:prep)

bench.sh hand-rolled build+cp+migrate but never generated the gitignored per-type
fixtures. #[sqlx::test(fixtures(...))] include_str!'s those .sql files at COMPILE
time, so once fixtures became generated/gitignored the bench binary stopped
compiling: 'couldn't read tests/sqlx/fixtures/eql_v2_numeric.sql'. bench-eql has
failed every nightly on main for 8+ days as a result — pre-existing, unrelated to
the scale tests, but it blocks them from ever running. Reuse test:sqlx:prep (build
+ cp + migrate + fixture:generate:all) so bench stays in lockstep with test:sqlx.
With bench.sh now running test:sqlx:prep, fixture:generate:all encrypts via
cipherstash-client and needs ZeroKMS auth (CS_CLIENT_ACCESS_KEY + CS_WORKSPACE_CRN)
plus a client key (CS_CLIENT_ID + CS_CLIENT_KEY); without them it fails with
'Auth strategy error: Not authenticated'. Add the four secrets to the bench job
env, mirroring test-eql.yml. (bench-eql triggers are push:main/schedule/dispatch,
all main-repo, so no fork-PR creds exposure.)
feat(v3): float4/float8 encrypted-domain types
Job-scoped secrets are exposed to every step, including third-party actions
(actions/checkout, jdx/mise-action, Swatinem/rust-cache) referenced by mutable
tags — a compromised tag could exfiltrate ZeroKMS/client creds before the bench
script runs. Move the four CS_* vars onto the 'Run bench tests' step that actually
needs them (fixture:generate:all). Least privilege; addresses PR review.
Adds the eql_v3 encrypted-domain property suite over the committed, curated
real-ciphertext fixtures:

- shared all-pairs operator oracle (property.rs): = / <> on _eq and the
  ordered comparisons + ord_term sort order, checked against a plaintext oracle
  over every ordered pair of fixture rows;
- function-double oracles: the generated eql_v3.eq/neq/lt/lte/gt/gte functions
  across all three overloads (domain-domain, domain-jsonb, jsonb-domain) plus
  term-extractor identity (eq_term==hm, ord_term==ob);
- bloom match smoke for the text _match domain; NULL/blocker/CHECK edge cases;
- the e2e suite (gated behind proptest-e2e) over fresh ZeroKMS encryption.

Fixtures are generated from the curated catalog values via FixtureSpec::run().
…P-3141)

Adds a dedicated test:sqlx:e2e mise task and CI job for the proptest-e2e suite
(needs ZeroKMS creds; the credential-free shards run the fixture suite), and
runs source doc validation unconditionally.
Documents the three property-test suites (catalog / fixture / e2e) over the
committed curated fixtures, the function-double oracles, and term-extractor
identity. CHANGELOG entry under [Unreleased].
test(v3): scaled, cost-chosen index-engagement tests for encrypted-domain surface
…le (CIP-3141)

The float4/float8 shared-index-term e2e test pulled both `hm` and `ob`
via `as_str()`, but the ORE `ob` term is a JSON array of block strings,
not a scalar string (only `hm` is a string). `as_str()` returned None on
the array, raising "payload missing string `ob`". Split the helper:
`hm` stays a string, `ob` extracts the array and compares directly,
matching the canonical extractor in property.rs. Latent until the gated
e2e suite started running in CI.
…or (CIP-3141)

`float4_and_float8_share_index_terms_for_the_same_value` asserted byte-equality
of the raw `ob` ORE arrays of two independently-encrypted payloads. That can
never hold: a BlockORE term is `Left (deterministic) ++ Right (16-byte random
per-ciphertext nonce + nonce-masked truth tables)`, so two encodings of the SAME
value — same width, same cast — are byte-UNEQUAL by construction. Ordering is
decided by the ORE compare function, not raw bytes. (The cast is irrelevant:
`real`/`double` collapse to one f64 `ColumnType::Float` in cipherstash-client;
the deterministic Left halves are byte-identical, which is what proves the two
widths share an encoding.)

The bug stayed latent because the e2e suite is feature/creds-gated and, when it
did run, an earlier `ob`-as-string extraction errored out before the assertion;
fixing that extraction (fe52d42) unmasked the wrong assertion.

Compare the extracted `ord_term`s through the SQL `eql_v3.ore_block_256` `=`
operator (the only correct ORE check) and keep the deterministic `hm` equality
term as a direct byte comparison. Also correct the CHANGELOG claim of a
"byte-identical ORE term" to "equal under the ORE comparator".

Verified: the test now passes against fresh ZeroKMS encryption.
…t_expand

[codex] assert scalar scale predicates are selective
test(sqlx): eql_v3 SQL-function property tests (CIP-3141)
docs-static (source-only SQL doxygen coverage + required-tags) ran on
every PR, unlike the other heavy jobs. Its inputs are a strict subset of
the changes-job relevant filter — src/**, the crates/** codegen build it
depends on, and the tasks/docs/** validator scripts — so a PR that leaves
relevant false cannot change its outcome. Gate it on the same flag as the
other jobs: consistent, and drops a redundant codegen build on
markdown-only PRs without losing coverage. A narrower src/**-only filter
was rejected — it would risk a silent false-green (ci-required counts
skipped as pass) by skipping on a real input change.

Sync .github/workflows/README.md (coverage map, known gaps, recently-
closed, operator-setup verification) and the stale validate-job comment.
Engineered fixture pair (NEEDLE's ngrams ⊆ HAY's ngram set, yet NEEDLE is
not a contiguous substring of HAY) plus a creds-free catalog contiguity
guard that pins the plaintext invariant.

Strengthen the divergence test with a raw-bf subset assertion: assert
needle-bf ⊆ haystack-bf directly on the stored `bf` arrays via native
jsonb containment, independent of the `eql_v3` domain `@>` operator and the
`match_term` extractor. Previously the subset was only inferred from the
domain `@>` it was meant to validate (circular); the direct check localizes
a future tokenizer change to a precise 'bf arrays no longer a subset'
failure instead of an opaque @>-false.
The cross-ciphertext oracle include_str!s eql_v2_<token>_doubles.sql at
compile time, but stub-fixtures.sh only stubbed eql_v2_<token>.sql and the
literal .gitignore *.sql entries — neither matches the _doubles glob. So the
no-creds matrix-coverage / inventory jobs could not compile the encrypted_domain
binary to --list it (rustc: couldn't read eql_v2_int4_doubles.sql, ...).

Stub eql_v2_<token>_doubles.sql alongside eql_v2_<token>.sql for every catalog
token, consistent with the helper's stub-the-complete-set policy (harmless
extras for storage-only tokens with no real doubles fixture).
test(v3): text-search direct-function coverage + bloom-vs-LIKE divergence lock-in
test(v3): cross-ciphertext equality via per-type doubles fixtures (CIP-3141)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants