feat: eql-types — canonical EQL v3 payload types (Rust)#236
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Prototype of a single-source-of-truth crate for EQL payload types: one Rust definition per shape, generating TypeScript (ts-rs) and JSON Schema (schemars). Two tiers — frozen eql_v2_encrypted v2.3 contract, and the capability-encoded eql_v2_int4 variant family. Draft for review; not wired into the build or CI.
Two wire-fidelity bugs in the FROZEN v2.3 types — both diverged from the canonical eql-payload-v2.3.schema.json they claim to mirror: - `bf` is stored by EQL as `smallint[]` (signed i16). A `match` filter sized above 32768 (configurable up to 65536) emits upper-half bit positions as negative signed values, which `Option<Vec<u16>>` cannot deserialize. Changed to `Option<Vec<i16>>`. - `EqlEncrypted` required `k` via `#[serde(tag = "k")]`, but the wire contract and `eql_v2.check_encrypted` make `k` optional on the scalar form (required: only v, c, i) and discriminate on c-vs-sv. Kept the tag for serialization and codegen (TS discriminated union + JSON Schema oneOf) and hand-wrote a tolerant Deserialize that mirrors check_encrypted: key off `k` when present, else fall back to c/sv. Also corrects the `bf` range in the canonical reference schema (minimum 0 -> signed smallint -32768..32767) to match `smallint[]`. Regenerated bindings/ and schema/ from the updated types; added conformance tests for k-less scalars, negative bf, and the sv path (the previously-untested flatten + untagged SteVecTerm route).
…talog Replace the eql_v2_int4 prototype tier with a v3 tier covering every eql_v3 encrypted domain: one capability-encoded struct per SQL domain (23 across int4/int2/int8/date/timestamptz/text), stamped by the eql_v3_domain! macro from reusable term newtypes (Ciphertext, Hmac256, OreBlockU64_8_256, BloomFilter). Index terms are required fields — Option does not appear in the tier — mirroring the generated domain CHECKs (envelope v/i/c + term keys, envelope version still v: 2). tests/catalog_parity.rs dev-depends on eql-scalars and asserts the v3 registry exactly covers CATALOG (every domain, catalog order) and that each type's schemars required keys equal envelope + catalog term keys, so the crate cannot drift from the generated SQL surface. ts-rs bindings land in bindings/v3/, JSON Schemas (with injected $id) in schema/v3/, both checked in; mise types:generate / types:check plus a CI step in the rust-crates job keep them fresh. eql-types joins the workspace and the lean test:crates set. The Int4Tagged self-describing proposal is removed from code (the x tag is not on the v3 wire) and preserved as README prose.
80d9ca5 to
af715c0
Compare
The macro hid exactly what a protocol crate exists to show — the struct definitions. Its only real guarantee (envelope uniformity) is already covered by the catalog parity tests, rustfmt could not format the invocations, and 'pub struct Int4Eq' was un-greppable. Each domain is now a plain hand-written struct with the same fields, derives, doc comments, and SQL_DOMAIN const the macro emitted; bindings/ and schema/ regenerate byte-identical (verified: empty git diff after types:generate).
…ed changes Scope the base crate to the Rust contract only: - Remove the frozen eql_v2_encrypted v2.3 tier (v2_3.rs, its tests, bindings, and schema) — EQL 2.3 won't consume these types; the crate targets the eql_v3 surface. - Remove the ts-rs and schemars dependencies, derives, generated bindings/ and schema/ output, and the types:generate / types:check mise tasks + CI step. TypeScript bindings and JSON Schemas return as two stacked changes so each lands as a small, reviewable diff. - Rework the required-keys parity test to be behavioural instead of schemars-based: a payload carrying exactly the catalog's keys (per eql-scalars ENVELOPE_KEYS + Term::term_json_keys) must round-trip identically, and removing any one key must fail deserialization — the same drift guarantee, proven through serde alone. The crate now depends on serde + serde_json only.
… registry domains, share ENVELOPE_KEYS Four review findings closed at the serde layer: - v is now SchemaVersion, a validated newtype whose deserializer rejects any value other than 2 (including the string "2" that the CHECK's ->> coercion would admit) — the Rust analogue of VALUE->>'v' = '2', failing at the type boundary instead of at INSERT. - Every domain struct (and Identifier) is #[serde(deny_unknown_fields)]: a payload carrying keys outside the domain's set fails to deserialize rather than being silently stripped on the next serialize, so a pass-through consumer cannot lose terms it didn't know about. - The registry derives each domain name from the type's own SQL_DOMAIN (new V3Domain trait) instead of re-typing 23 string literals — two same-shaped types (_ord vs _ord_ore) can no longer be registered under each other's domain. - ENVELOPE_KEYS is hoisted into eql-scalars (the catalog) and consumed by eql-codegen's CHECK generation, eql-types' parity tests, and the sqlx harness's payload_required_keys — collapsing three hand-synced copies into one definition. codegen golden parity stays byte-identical. The catalog parity sweep gains unknown-key and wrong-version rejection legs across all 23 domains.
|
@tobyhede heads-up — one commit in this PR (aa46959) reaches outside
What changed on your side:
Verified: the codegen golden-parity gate passes byte-for-byte (generated SQL is unchanged), Shout if you'd rather this hoist land separately from the types crate — it's a self-contained commit and easy to peel off. |
| /// Envelope version — always `2` (`EQL_SCHEMA_VERSION`); any other | ||
| /// value fails deserialization. | ||
| pub v: SchemaVersion, |
|
@tobyhede — a second, more substantive question out of the type-modelling discussion happening around this PR. This one is about the v3 surface itself, not the crate. 1.
2. Are combination domains planned, or out?
IMHO there should be 3 text types:
|
The fn-pointer registry struct becomes a trait with one blanket impl on PhantomData<T>, so Vec<Box<dyn DomainType>> entries are zero-sized type-level handles and V3Domain::SQL_DOMAIN stays the single per-type anchor the impl reads from. The separate registry module is gone; the inventory (all()) lives in v3/mod.rs.
…gate to inventory order The behavioural required-keys test was the only roundtrip consumer; that gate is schema-based in the stacked schemars change (schemars required reflects the serde contract), with per-type strictness spot checks in v3_conformance. serde_json moves to dev-dependencies — the lib no longer touches Value.
… const trait sql_domain() is implemented directly on each type's PhantomData handle in its token file — the domain string still has exactly one definition per type, and the catalog parity test still catches a typo'd or mis-ordered domain. The const-trait + blanket-impl indirection (and type_name, which only test messages used) is gone.
…lves Consumers holding a payload value can now ask it for its SQL domain directly (payload.sql_domain()). The Box<dyn DomainType> inventory keeps its zero-sized PhantomData handles via one blanket impl that delegates through a transient T::default() — allocation-free, since every field defaults to an empty string/vec. Default exists on the payload types only to power that; a default payload is structurally complete but semantically empty.
…Sized-bounded static Review finding: Int4Eq::default() was a constructible, serializable payload (v:2, empty i/c/hm) that passes the generated domain CHECK - and empty hm terms all match each other under encrypted equality. The PhantomData blanket impl now delegates through sql_domain_static() (excluded from the vtable by `where Self: Sized`, so the trait stays object-safe) and no payload instance is ever constructed. Default comes off the 23 payload types, the term newtypes, and Identifier; SchemaVersion keeps its CURRENT-valued Default.
db28654 to
035952e
Compare
| /// `ScalarSpec::domain_name`. | ||
| fn domain(&self) -> &'static str { | ||
| self.sql_domain() | ||
| .strip_prefix("eql_v3.") |
auxesis
left a comment
There was a problem hiding this comment.
Test-coverage review — PR #236 (eql-types crate)
The new eql-types crate adds 23 capability-encoded payload structs across 6 tokens. Behavioural serde coverage in v3_conformance.rs is solid for the reference token (int4: roundtrip for every domain, missing-term rejection, wrong-version rejection, unknown-key rejection) plus text_match. The drift gate in catalog_parity.rs pins the domain-name inventory and order against eql-scalars::CATALOG.
The gap: every other token (int2, int8, date, timestamptz, and text's eq/ord/storage domains) has zero behavioural serde tests. catalog_parity.rs only checks sql_domain_static() strings — it never deserializes a payload, so the wire field names (v/i/c/hm/ob/bf) on those 18 structs are exercised by nothing. These are hand-written, copy-pasted structs; a transposed or misspelled field name in int8.rs (hmm for hm) would pass catalog_parity and the build, and only surface as a wire-incompatibility in a downstream consumer. The two inline comments below target the highest-value slices of this.
Note: the module doc in catalog_parity.rs claims the "exhaustive catalog-driven sweep (every domain, every required key)" lives there — but that test does not check keys; per-key strictness is deferred to the stacked schemars change. The comment oversells current coverage.
Additional coverage gaps not posted inline
- Missing-envelope-key negatives untested.
rejects_*tests cover term keys (hm/ob/bf) andv, but no test asserts a payload missingcoriis rejected. Same serde mechanism as the term-key tests, butv/i/care the shared envelope contract and the one thing every domain CHECK asserts — a missing-c/ missing-icase mirroringint4_eq_rejects_missing_hmacwould close it. - Instance
sql_domain(&self)on real values never called. Every test uses thesql_domain_static()associated fn;catalog_paritycalls.domain()only on zero-sizedPhantomDatahandles. The instance method on a deserialized payload value is a one-line wrapper, so this is low priority. - Changed files
consts.rs/scalar_domains.rsneed no new tests —ENVELOPE_KEYSis re-sourced fromeql-scalarswith byte-identical values (["v","i","c"]); existing codegen/matrix tests cover the behaviour.
(Out of scope per review rules: no crypto/security issues noted.)
| /// `eql_v3.timestamptz_eq` — HMAC equality (`=`, `<>`). | ||
| #[derive(Clone, Debug, PartialEq, Serialize, Deserialize)] | ||
| #[serde(deny_unknown_fields)] | ||
| pub struct TimestamptzEq { |
There was a problem hiding this comment.
Gap: The timestamptz family is the one structurally-distinct token (equality-only — no _ord/_ord_ore, per the deliberate 8-block-ORE limitation in the module doc) yet has no behavioural serde test at all. A roundtrip and a missing-hm rejection would protect both the wire shape and the intentional absence of an ordered domain (the int4 template was copy-pasted to produce this; an accidental extra ob field or a dropped hm would not be caught by catalog_parity, which only checks domain names).
use eql_types::v3::timestamptz::{Timestamptz, TimestamptzEq};
#[test]
fn timestamptz_eq_round_trips() {
let wire = json!({
"v": 2,
"i": { "t": "events", "c": "occurred_at" },
"c": "mp_base85_ciphertext",
"hm": "deadbeef"
});
let parsed: TimestamptzEq = serde_json::from_value(wire.clone()).unwrap();
assert_eq!(serde_json::to_value(&parsed).unwrap(), wire);
assert_eq!(TimestamptzEq::sql_domain_static(), "eql_v3.timestamptz_eq");
}
#[test]
fn timestamptz_eq_rejects_missing_hmac() {
let no_hm = json!({
"v": 2,
"i": { "t": "events", "c": "occurred_at" },
"c": "mp_base85_ciphertext"
});
let r: Result<TimestamptzEq, _> = serde_json::from_value(no_hm);
assert!(r.is_err());
}Expected: both pass on current code; timestamptz_eq_round_trips fails if a field is renamed/added, and _rejects_missing_hmac fails if hm is ever made optional.
There was a problem hiding this comment.
Added timestamptz_round_trips_and_enforces_equality_term in tests/v3_conformance.rs — roundtrips both Timestamptz (storage) and TimestamptzEq byte-for-byte, pins the domain names, and asserts a missing-hm payload is rejected (so the equality term cannot silently become optional and a stray ob would fail the strict roundtrip). Thanks — good catch on the one structurally-distinct token. (b3c0cef)
| result.is_err(), | ||
| "TextMatch must reject a payload with no bf" | ||
| ); | ||
| } |
There was a problem hiding this comment.
Gap (lopsided coverage): Only int4 (and text_match) is roundtripped; int2, int8, date, text_eq, text_ord/text_ord_ore, and the storage-only Text carry the same wire field names but are serialized by no test. catalog_parity.rs checks domain names only, so a copy-paste field typo (e.g. hm -> hmm in int8.rs) ships green. A single token-parametrised roundtrip sweep closes the whole class cheaply:
#[test]
fn ordered_tokens_round_trip_each_domain() {
let eq = |t: &str| json!({"v":2,"i":{"t":t,"c":"x"},"c":"ct","hm":"deadbeef"});
let ord = |t: &str| json!({"v":2,"i":{"t":t,"c":"x"},"c":"ct","ob":["b0","b1"]});
use eql_types::v3::{int2::*, int8::*, date::*, text::*};
macro_rules! rt { ($ty:ty, $w:expr) => {{
let w = $w; let p: $ty = serde_json::from_value(w.clone()).unwrap();
assert_eq!(serde_json::to_value(&p).unwrap(), w);
}}; }
rt!(Int2Eq, eq("a")); rt!(Int2Ord, ord("a")); rt!(Int2OrdOre, ord("a"));
rt!(Int8Eq, eq("a")); rt!(Int8Ord, ord("a")); rt!(Int8OrdOre, ord("a"));
rt!(DateEq, eq("a")); rt!(DateOrd, ord("a")); rt!(DateOrdOre, ord("a"));
rt!(TextEq, eq("a")); rt!(TextOrd, ord("a")); rt!(TextOrdOre, ord("a"));
}Expected: passes now; fails the instant any non-int4 token's wire field name drifts from the shared envelope/term contract.
There was a problem hiding this comment.
Added non_int4_tokens_round_trip_every_domain — roundtrips storage/_eq/_ord/_ord_ore for int2/int8/date/text (and pins each catalog domain name), so a copy-paste field typo like hm -> hmm now fails the build instead of shipping green. Went slightly beyond the snippet to also cover the storage-only structs you flagged. Also added rejects_missing_envelope_keys from your non-inline note (missing v/i/c rejection). (b3c0cef)
catalog_parity.rs checks domain names only, so the 18 non-int4 payload structs — hand-written copies of the int4 template — had their wire field names exercised by nothing; a typo like `hm` -> `hmm` in int8.rs would ship green and only surface in a downstream consumer. Add serde-conformance tests in v3_conformance.rs: - non_int4_tokens_round_trip_every_domain: roundtrips storage/_eq/_ord/ _ord_ore for int2/int8/date/text and pins each catalog domain name. - timestamptz_round_trips_and_enforces_equality_term: roundtrips the equality-only token (storage + _eq) and keeps its hm term required. - rejects_missing_envelope_keys: a payload missing any of the v/i/c envelope keys fails to deserialize, mirroring the missing-term negatives. Addresses review comments from @auxesis on #236.
The eql-types crate landed on eql_v3 (PR #236) after this branch forked, so merging the base in surfaced a catalog_parity failure: CATALOG now has the numeric family and ordered timestamptz, but v3::all() didn't. - numeric.rs: four ordered domains (storage/_eq/_ord/_ord_ore), mirroring date.rs; numeric is the first scalar with a >8-block ORE term (14) - timestamptz.rs: add the two ordered domains; the eq-only/8-block-limit rationale is gone now that eql_v3.ore_block_256 derives N from term length - mod.rs: register the six new domains in all(), in CATALOG order - terms.rs: the ob term's SQL constructor is eql_v3.ore_block_256 (renamed this branch) and is width-agnostic (8/12/14 blocks) - v3_conformance.rs: cover numeric + timestamptz ord wire shapes; drop the stale equality-only claim - README: timestamptz no longer eq-only; add numeric
The eql-types crate landed on eql_v3 (PR #236) after this branch forked, so merging the base in surfaced a catalog_parity failure: CATALOG now has the numeric family and ordered timestamptz, but v3::all() didn't. - numeric.rs: four ordered domains (storage/_eq/_ord/_ord_ore), mirroring date.rs; numeric is the first scalar with a >8-block ORE term (14) - timestamptz.rs: add the two ordered domains; the eq-only/8-block-limit rationale is gone now that eql_v3.ore_block_256 derives N from term length - mod.rs: register the six new domains in all(), in CATALOG order - terms.rs: the ob term's SQL constructor is eql_v3.ore_block_256 (renamed this branch) and is width-agnostic (8/12/14 blocks) - v3_conformance.rs: cover numeric + timestamptz ord wire shapes; drop the stale equality-only claim - README: timestamptz no longer eq-only; add numeric
feat: eql-types — TypeScript bindings (stacked on #236)
catalog_parity.rs checks domain names only, so the 18 non-int4 payload structs — hand-written copies of the int4 template — had their wire field names exercised by nothing; a typo like `hm` -> `hmm` in int8.rs would ship green and only surface in a downstream consumer. Add serde-conformance tests in v3_conformance.rs: - non_int4_tokens_round_trip_every_domain: roundtrips storage/_eq/_ord/ _ord_ore for int2/int8/date/text and pins each catalog domain name. - timestamptz_round_trips_and_enforces_equality_term: roundtrips the equality-only token (storage + _eq) and keeps its hm term required. - rejects_missing_envelope_keys: a payload missing any of the v/i/c envelope keys fails to deserialize, mirroring the missing-term negatives. Addresses review comments from @auxesis on #236.
feat: eql-types — canonical EQL v3 payload types (Rust)
feat: eql-types — TypeScript bindings (stacked on #236)
The eql-types crate landed on eql_v3 (PR #236) after this branch forked, so merging the base in surfaced a catalog_parity failure: CATALOG now has the numeric family and ordered timestamptz, but v3::all() didn't. - numeric.rs: four ordered domains (storage/_eq/_ord/_ord_ore), mirroring date.rs; numeric is the first scalar with a >8-block ORE term (14) - timestamptz.rs: add the two ordered domains; the eq-only/8-block-limit rationale is gone now that eql_v3.ore_block_256 derives N from term length - mod.rs: register the six new domains in all(), in CATALOG order - terms.rs: the ob term's SQL constructor is eql_v3.ore_block_256 (renamed this branch) and is width-agnostic (8/12/14 blocks) - v3_conformance.rs: cover numeric + timestamptz ord wire shapes; drop the stale equality-only claim - README: timestamptz no longer eq-only; add numeric
catalog_parity.rs checks domain names only, so the 18 non-int4 payload structs — hand-written copies of the int4 template — had their wire field names exercised by nothing; a typo like `hm` -> `hmm` in int8.rs would ship green and only surface in a downstream consumer. Add serde-conformance tests in v3_conformance.rs: - non_int4_tokens_round_trip_every_domain: roundtrips storage/_eq/_ord/ _ord_ore for int2/int8/date/text and pins each catalog domain name. - timestamptz_round_trips_and_enforces_equality_term: roundtrips the equality-only token (storage + _eq) and keeps its hm term required. - rejects_missing_envelope_keys: a payload missing any of the v/i/c envelope keys fails to deserialize, mirroring the missing-term negatives. Addresses review comments from @auxesis on #236.
The eql-types crate landed on eql_v3 (PR #236) after this branch forked, so merging the base in surfaced a catalog_parity failure: CATALOG now has the numeric family and ordered timestamptz, but v3::all() didn't. - numeric.rs: four ordered domains (storage/_eq/_ord/_ord_ore), mirroring date.rs; numeric is the first scalar with a >8-block ORE term (14) - timestamptz.rs: add the two ordered domains; the eq-only/8-block-limit rationale is gone now that eql_v3.ore_block_256 derives N from term length - mod.rs: register the six new domains in all(), in CATALOG order - terms.rs: the ob term's SQL constructor is eql_v3.ore_block_256 (renamed this branch) and is width-agnostic (8/12/14 blocks) - v3_conformance.rs: cover numeric + timestamptz ord wire shapes; drop the stale equality-only claim - README: timestamptz no longer eq-only; add numeric
What this is
crates/eql-types— the canonical Rust types for EQL v3 payloads: one struct pereql_v3SQL domain, intended as the single source of truth for every tool that produces or consumes EQL payloads (cipherstash-client,protect-ffi, CipherStash Proxy).This PR is the Rust contract only (serde + serde_json, nothing else). The generated artifacts stack on top:
bindings/v3/schema/v3/, stacked on feat: eql-types — TypeScript bindings (stacked on #236) #268Review in that order; each is a small, mechanical diff.
The problem
Type information is lost at every hop of the chain:
protect-ffi hand-writes its TypeScript; it drifts from the Rust it describes; stack widens it further. By protect-dynamodb, a value's static type tells you almost nothing, so code hand-rolls runtime guards — e.g.
term.k === 'ct' && term.hm, which checks a shape EQL never defined. A generated, single-source crate removes the hand-copying.What's in the crate
Capability-encoded types. One type per SQL domain, mirroring
eql-scalars::CATALOG— 23 across six scalars:int4,int2,int8,date<T>,<T>Eq,<T>OrdOre,<T>Ordtimestamptz(eq-only)Timestamptz,TimestamptzEqtextText,TextEq,TextMatch,TextOrdOre,TextOrdEach carries the envelope (
v,i,c— exactly what the generated domain CHECKs require) plus its index terms as required fields.Optiondoes not appear: hold anInt4Eqandhmis present, guaranteed by the Rust type. The runtime guessing that produced the protect-dynamodb bug becomes impossible to need. Each type also carries aSQL_DOMAINconst (e.g."eql_v3.int4_eq").Reusable term newtypes (
src/v3/terms.rs):Ciphertext(c),Hmac256(hm),OreBlockU64_8_256(ob),BloomFilter(bf— signedi16, matchingsmallint[]). They serialize transparently but keep their names in the generated TS/schema outputs.Drift protection (
tests/catalog_parity.rs, dev-dep oneql-scalars):ENVELOPE_KEYS+Term::term_json_keys) must round-trip identically, and removing any one key must fail deserialization — so a term field can't silently becomeOptionor grow a wrong wire key.Versioning note. "v3" names the SQL schema generation (
eql_v3.*). The JSON envelope staysv: 2and the wire keys are unchanged from v2 (hm/ob/bf) — the purpose-named rename indocs/plans/eql-payload-scheme-discipline-rfc.mdis deferred. The domain CHECKs assertv = '2'.The crate joins the workspace and the lean
mise run test:cratesset (fmt, clippy, test — no database).Design decisions (and reversals from the original prototype)
eql_v2_encryptedv2.3 wire contract alongside the new types. Dropped: EQL 2.3 won't consume these types; the crate targets theeql_v3surface.eql_v3_domain!macro was tried and unrolled: in a protocol crate the struct definitions are the documentation, and the macro's only real guarantee (envelope uniformity) is already covered by the parity tests.int4_eqandint8_eqpayload are wire-identical); per-token key-sniffing unions recreate the exact untagged failure mode this crate exists to retire. Consumers read from typed columns and know the domain._ord_oreis its own struct, not an alias of_ord— distinct SQL domain, and the parity test wants 1:1 CATALOG coverage.Int4Tagged(self-describingxtag) removed — not part of the v3 wire contract. The idea is preserved in the README as a future direction if a payload revision adds a discriminator.Follow-ups (not in this stack)
eql_v3.<domain>column and assert the CHECK accepts it (and rejects the missing-term form).numeric) follow the established shape: one token file + registry entries; the parity test enforces the rest.