1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -37,6 +37,7 @@ Each entry that ships in a published release links to the PR that introduced it.
- **Self-contained `eql_v3` schema + standalone `release/cipherstash-encrypt-v3.sql` installer.** The `eql_v3` encrypted-domain surface no longer depends on `eql_v2` at runtime: it now owns its own copies of the searchable-encrypted-metadata (SEM) index-term types — `eql_v3.hmac_256` and `eql_v3.ore_block_u64_8_256` (with its btree operator class) — so the `eql_v3.eq_term` / `eql_v3.ord_term` extractors return `eql_v3` types and no `eql_v2.<symbol>` appears anywhere in the v3 SQL. The whole v3 surface relocated under a single `src/v3/` tree (`src/v3/sem/` for the hand-written SEM types, `src/v3/scalars/` for the generated domain families). A new build variant ships the `eql_v3` schema on its own as `release/cipherstash-encrypt-v3.sql`, installable into a database with no `eql_v2` present; a CI gate greps that artifact and its dependency closure to keep it `eql_v2`-free. Why: a clean foundation for the per-scalar encrypted-domain model to stand alone, ahead of it replacing the `eql_v2_encrypted` composite column type. This is additive — a new schema and a new artifact — and leaves `eql_v2` byte-for-byte unchanged. ([#255](https://github.com/cipherstash/encrypt-query-language/pull/255))
- **`eql_v3.min` / `eql_v3.max` aggregates over `eql_v3.ste_vec_entry`.** SteVec document entries extracted at a selector (`doc -> 'sel'`) can now be aggregated like ordered scalars: `eql_v3.min(doc -> 'sel')` / `eql_v3.max(...)` return the entry with the smallest / largest ordered leaf. Ordering routes through the entry's `oc` (CLLW ORE) term via `eql_v3.ore_cllw` — the same comparator the entry `<` / `<=` / `>` / `>=` operators use, not the scalar Block-ORE `ord_term`. Only `oc`-carrying entries are orderable: an entry without an `oc` term (`eql_v3.ore_cllw` returns NULL) is non-orderable and is ignored by the aggregate — the same way the `eql_v3.ore_cllw` btree NULL-filters such rows — so a mix of `oc`-carrying and `oc`-less entries yields the extremum of the orderable subset rather than a corrupted result. Declared `PARALLEL = SAFE` with a combine function (the state function itself), so partial / parallel aggregation is available on large `GROUP BY` workloads. Why: brings encrypted-JSONB entry ordering to parity with the scalar encrypted-domain families' `MIN` / `MAX`, and lets the shared scalar behaviour matrix cover entry aggregation. Additive — the document and entry comparison surface is otherwise unchanged. ([#267](https://github.com/cipherstash/encrypt-query-language/pull/267))
- **`eql_v3.bool` encrypted-domain type family (storage-only / encryption-only).** A single jsonb-backed domain for encrypted `bool` columns — `eql_v3.bool` — generated from the `bool` row in `eql-scalars::CATALOG`. Unlike every other scalar family, `bool` is **encryption-only**: it carries no SEM index term and exposes **no** `_eq` / `_ord` domains, so the value is encrypted at rest and decrypted by the proxy but is **not searchable server-side**. This is deliberate — a two-value column has so little cardinality that any searchable index (even HMAC equality) would trivially leak the plaintext distribution. Every comparison / containment / path operator reachable through domain fallback (`=`, `<>`, `<`, `<=`, `>`, `>=`, `@>`, `<@`, `->`, `->>`, …) is blocked (raises rather than silently routing to plaintext-`jsonb` semantics); the domain `CHECK` still requires the EQL envelope (`v`, `i`), the ciphertext (`c`), and pins the payload version (`VALUE->>'v' = '2'`). The encrypted payload is `{v,i,c}` only — no `hm` / `ob` / `bf` term. Why: lets callers encrypt a low-cardinality boolean column at rest without offering a server-side search surface that would leak it; the first **storage-only** member of the generated scalar encrypted-domain family. ([#295](https://github.com/cipherstash/encrypt-query-language/pull/295))
- **`eql_v3.float4` / `eql_v3.float8` encrypted-domain type families (ordered).** Four jsonb-backed domains each for encrypted `real` / `double precision` columns — `eql_v3.float4` / `eql_v3.float8` (storage-only), `eql_v3.<T>_eq` (`=` / `<>` via HMAC), and `eql_v3.<T>_ord` / `eql_v3.<T>_ord_ore` (also `<` `<=` `>` `>=`, `MIN` / `MAX` via 8-block ORE) — generated from the `float4` / `float8` rows in `eql-scalars::CATALOG` by the same materializer as the `eql_v3.int4` reference. Both widths encrypt through a single f64 crypto path (`Plaintext::Float`): a `real` is widened to f64 before encryption (exact and monotonic), so `float4` vs `float8` is purely a Postgres-surface distinction and the ciphertext / ORE term are byte-identical. Ordering is correct for all non-NaN values via the standard monotonic IEEE-754 byte mapping (`f64::ENCODED_LEN == 8`, same as `int8`); `-0.0` canonicalizes to `+0.0` and `±Inf` order correctly. NaN is unordered and unspecified in the encoder — it can be encrypted and stored but is not given a meaningful comparison guarantee (any NaN rejection is client-side). Index via a functional index on the `eql_v3.eq_term` / `eql_v3.ord_term` extractors, not an operator class on the domain. Why: a type-safe, per-capability encrypted IEEE-754 float column, closing the gap for `real` / `double` columns that had no v3 equivalent (the v3 `numeric` family is arbitrary-precision decimal, not binary float). ([#299](https://github.com/cipherstash/encrypt-query-language/pull/299))
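
The "standard monotonic IEEE-754 byte mapping" named in the float entry above can be sketched as follows. This is a minimal illustration of the general technique, not the encoder this PR ships: the function name `encode_f64_ordered` is hypothetical, and the real ORE path may differ in detail. It shows how `-0.0` can be canonicalized to `+0.0` and why `±Inf` fall out in the right order.

```rust
/// Hypothetical helper (not part of this PR): map an `f64` to 8 bytes whose
/// big-endian lexicographic order matches numeric order for non-NaN inputs.
fn encode_f64_ordered(x: f64) -> [u8; 8] {
    // Canonicalize -0.0 to +0.0 so both zeros share one encoding.
    // NaN fails the comparison and passes through unchanged (unspecified order).
    let x = if x == 0.0 { 0.0 } else { x };
    let bits = x.to_bits();
    let mapped = if bits >> 63 == 1 {
        // Negative: flip every bit so larger magnitudes sort first.
        !bits
    } else {
        // Non-negative: set the sign bit so these sort after all negatives.
        bits | (1u64 << 63)
    };
    mapped.to_be_bytes()
}
```

Comparing two encodings with the ordinary `<` on `[u8; 8]` then agrees with `<` on the original floats, which is the property an order-revealing term needs from its input bytes.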

### Changed

1 change: 1 addition & 0 deletions crates/eql-scalars/src/fixture.rs
@@ -34,6 +34,7 @@ impl Fixture {
| Fixture::Jsonb(_)
| Fixture::Date(_)
| Fixture::Timestamptz(_)
| Fixture::Float(_)
| Fixture::Bool(_) => None,
}
}
12 changes: 12 additions & 0 deletions crates/eql-scalars/src/kind.rs
@@ -72,6 +72,8 @@ impl ScalarKind {
| ScalarKind::Text
| ScalarKind::Jsonb
| ScalarKind::Bool
| ScalarKind::F32
| ScalarKind::F64
| ScalarKind::Date
| ScalarKind::Timestamptz => None,
}
@@ -97,6 +99,14 @@ impl ScalarKind {
matches!(self, ScalarKind::Text)
}

/// True for the IEEE-754 float kinds (`F32`, `F64`) — ordered, non-integer,
/// string-backed-fixture scalars whose `impl ScalarType` is hand-written in
/// `scalar_domains.rs` (like `text`/`numeric`). Keeps float classification in
/// the catalog crate alongside `is_int`/`is_temporal`/`is_text`.
pub const fn is_float(self) -> bool {
matches!(self, ScalarKind::F32 | ScalarKind::F64)
}

/// A debug/identifier string for the kind: the canonical Rust plaintext type
/// name (`"i32"`, `"chrono::NaiveDate"`, `"rust_decimal::Decimal"`). `Jsonb`
/// has **no generated SQL surface** and no catalog row, so calling this on it
@@ -113,6 +123,8 @@ impl ScalarKind {
ScalarKind::Timestamptz => "chrono::DateTime<Utc>",
ScalarKind::Numeric => "rust_decimal::Decimal",
ScalarKind::Bool => "bool",
ScalarKind::F32 => "f32",
ScalarKind::F64 => "f64",
ScalarKind::Jsonb => {
panic!("ScalarKind::rust_type: jsonb has no generated surface yet")
}
77 changes: 76 additions & 1 deletion crates/eql-scalars/src/lib.rs
@@ -80,6 +80,16 @@ pub enum ScalarKind {
/// server-side. Like the other non-integer kinds, the bounded-numeric
/// accessors are unreachable for it by construction.
Bool,
/// 32-bit IEEE-754 binary float (`f32`, Postgres `real`/`float4`).
/// Ordered like the integer kinds via ORE, but with no i128 range
/// (`as_bounded_int()` returns `None`) and string-backed at the catalog
/// layer. Encrypts through the single f64 float crypto path
/// (`Plaintext::Float`) — the f32→f64 widening is exact and monotonic.
F32,
/// 64-bit IEEE-754 binary float (`f64`, Postgres `double precision`/
/// `float8`). The native width of the float crypto path (`F32` widens into
/// it); otherwise classified exactly like [`ScalarKind::F32`].
F64,
}
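
The doc comment on `F32` states that the f32→f64 widening is exact and monotonic. A small sketch of what that claim means in practice; the helper names `widen` and `widening_is_exact_and_monotonic` are hypothetical and exist only for illustration:

```rust
/// Widen an `f32` to `f64`. `f64::from` is a lossless conversion.
fn widen(x: f32) -> f64 {
    f64::from(x)
}

/// Hypothetical check: on a strictly ascending slice, widening must
/// round-trip every value and keep the strict order.
fn widening_is_exact_and_monotonic(sorted: &[f32]) -> bool {
    // Exact: narrowing the widened value gives back the original.
    let round_trips = sorted.iter().all(|&x| widen(x) as f32 == x);
    // Monotonic: neighbours stay strictly ordered after widening.
    let ordered = sorted.windows(2).all(|w| widen(w[0]) < widen(w[1]));
    round_trips && ordered
}
```

Because both properties hold for every non-NaN `f32`, a single f64 crypto path can serve both widths without changing comparison results.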

/// Always-present payload keys required by every generated domain CHECK,
@@ -178,6 +188,12 @@ pub enum Fixture {
/// storage-only, so this fixture is encrypted (ciphertext only, no index
/// term) and never participates in a comparison pivot. Distinct by value.
Bool(bool),
/// An IEEE-754 float plaintext rendered as a string (`"0.5"`, `"-inf"`).
/// The catalog stays zero-dep, so the string is parsed into `f32`/`f64` in
/// the SQLx harness, not here. Distinct by parsed value (the harness
/// `float_fixtures_are_distinct_by_value` guard enforces this). NaN and
/// `-0.0` are deliberately excluded; `±Inf` (`"inf"`/`"-inf"`) ARE fixtures.
Float(&'static str),
}

/// One generated public domain: a suffix appended to the type token and the
@@ -221,6 +237,7 @@ macro_rules! fixtures {
(date; $($s:literal),* $(,)?) => { &[$(Fixture::Date($s)),*] };
(timestamptz; $($s:literal),* $(,)?) => { &[$(Fixture::Timestamptz($s)),*] };
(bool; $($b:literal),* $(,)?) => { &[$(Fixture::Bool($b)),*] };
(float; $($s:literal),* $(,)?) => { &[$(Fixture::Float($s)),*] };
}

/// Domains shared by every ordered-integer scalar, in manifest file order:
@@ -488,9 +505,67 @@ pub const TEXT: ScalarSpec = ScalarSpec {
fixtures: TEXT_FIXTURES,
};

/// `float4` fixture plaintexts — IEEE-754 strings parsed into `f32` in the SQLx
/// harness (the catalog stays zero-dep). EVERY value is exactly representable in
/// f32 — each is a dyadic rational `n/2^k` (e.g. `2.25 = 9/4`, `0.25 = 1/4`,
/// `1024 = 2^10`), the value class `real` stores losslessly — so the `real`
/// round-trip is lossless and the f32→f64 widening before encryption is exact.
/// Keep new fixtures dyadic: a value like `0.1` is NOT f32-exact, and the
/// oracle's expected order (parsed `f32`) would then disagree with the value the
/// `real` column actually rounds to. The three pivots MUST be present
/// verbatim: `"-inf"` (min_pivot), `"0"` (origin/mid), `"inf"` (max_pivot).
/// NaN and `-0.0` are deliberately excluded (see the `float_special` suite).
/// Distinctness by parsed value is required by `Fixture::Float` (above) and
/// enforced by its guard test.
const FLOAT4_FIXTURES: &[Fixture] = fixtures!(float;
"-inf", "-1024", "-2.25", "-1", "-0.5", "-0.25",
"0", "0.25", "0.5", "1", "2.25", "1024", "inf");
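
The "keep new fixtures dyadic" rule above can be checked mechanically. The sketch below is an assumption-laden illustration, not code from this PR: `is_f32_exact` is a hypothetical helper, and it only detects inexactness down to f64 precision (a literal that already rounds when parsed as `f64` is outside its reach).

```rust
/// Hypothetical check: does this literal denote a value `f32` stores exactly?
/// Compares the widened `f32` parse against the `f64` parse of the same text.
fn is_f32_exact(literal: &str) -> bool {
    let narrow: f32 = literal.parse().expect("literal must parse as f32");
    let wide: f64 = literal.parse().expect("literal must parse as f64");
    // If f32 had to round, widening it cannot reproduce the f64 value.
    f64::from(narrow) == wide
}
```

Under this check `"2.25"` and `"inf"` pass while `"0.1"` fails, which is the mismatch between the oracle's parsed `f32` and the stored `real` that the doc comment warns about.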

/// `float8` fixture plaintexts — IEEE-754 strings parsed into `f64` in the SQLx
/// harness. The native width of the float crypto path; the finite values cover
/// both signs and magnitudes from 0.001 to 1e300, with no subnormals. The three
/// pivots MUST be present verbatim: `"-inf"` (min_pivot), `"0"` (origin/mid),
/// `"inf"` (max_pivot). NaN and `-0.0` are deliberately excluded.
const FLOAT8_FIXTURES: &[Fixture] = fixtures!(float;
"-inf", "-1e300", "-1000000", "-1.5", "-1", "-0.001",
"0", "0.001", "1", "1.5", "1000000", "1e300", "inf");

/// `float4` — an **ordered**, non-integer scalar (Postgres `real`). Reuses the
/// four-domain ordered shape (`ORDERED_INT_DOMAINS`); only kind and fixtures
/// differ. Both float widths encrypt through the SAME f64 crypto path
/// (`Plaintext::Float`), so `float4` vs `float8` is purely a Postgres-surface
/// distinction. Public (like `DATE`/`NUMERIC`) so the SQLx harness reads
/// `FLOAT4.fixtures` directly to parse the strings into `f32`.
pub const FLOAT4: ScalarSpec = ScalarSpec {
token: "float4",
kind: ScalarKind::F32,
domains: ORDERED_INT_DOMAINS,
fixtures: FLOAT4_FIXTURES,
};

/// `float8` — an **ordered**, non-integer scalar (Postgres `double precision`),
/// the native width of the float crypto path. Reuses the ordered shape. Public
/// so the SQLx harness reads `FLOAT8.fixtures` directly to parse into `f64`.
pub const FLOAT8: ScalarSpec = ScalarSpec {
token: "float8",
kind: ScalarKind::F64,
domains: ORDERED_INT_DOMAINS,
fixtures: FLOAT8_FIXTURES,
};

/// The scalar catalog — the single source of truth. Order is significant (it
/// drives generation order). New types are appended as their SQL surface lands.
pub const CATALOG: &[ScalarSpec] = &[INT4, INT2, INT8, DATE, TIMESTAMPTZ, NUMERIC, TEXT, BOOL];
pub const CATALOG: &[ScalarSpec] = &[
INT4,
INT2,
INT8,
DATE,
TIMESTAMPTZ,
NUMERIC,
TEXT,
BOOL,
FLOAT4,
FLOAT8,
];

/// Materialise an integer scalar's fixtures into a typed `&'static` slice at
/// compile time. This is the **single-sourced** plaintext list the SQLx test
4 changes: 3 additions & 1 deletion crates/eql-scalars/src/proptest_invariants.rs
@@ -13,7 +13,7 @@ fn any_term() -> impl Strategy<Value = Term> {
prop_oneof![Just(Term::Hm), Just(Term::Ore), Just(Term::Bloom)]
}

/// Strategy over the eight scalar kinds.
/// Strategy over the ten scalar kinds.
fn any_kind() -> impl Strategy<Value = ScalarKind> {
prop_oneof![
Just(ScalarKind::I16),
@@ -24,6 +24,8 @@ fn any_kind() -> impl Strategy<Value = ScalarKind> {
Just(ScalarKind::Jsonb),
Just(ScalarKind::Date),
Just(ScalarKind::Timestamptz),
Just(ScalarKind::F32),
Just(ScalarKind::F64),
]
}

116 changes: 113 additions & 3 deletions crates/eql-scalars/src/tests.rs
@@ -42,6 +42,8 @@ mod rust_tests {
assert_eq!(ScalarKind::Jsonb.as_bounded_int(), None);
assert_eq!(ScalarKind::Date.as_bounded_int(), None);
assert_eq!(ScalarKind::Timestamptz.as_bounded_int(), None);
assert_eq!(ScalarKind::F32.as_bounded_int(), None);
assert_eq!(ScalarKind::F64.as_bounded_int(), None);
}

#[test]
@@ -80,6 +82,8 @@ mod rust_tests {
assert!(!ScalarKind::Jsonb.is_int());
assert!(!ScalarKind::Date.is_int());
assert!(!ScalarKind::Timestamptz.is_int());
assert!(!ScalarKind::F32.is_int());
assert!(!ScalarKind::F64.is_int());
}

#[test]
@@ -93,6 +97,8 @@ mod rust_tests {
ScalarKind::Jsonb,
ScalarKind::Date,
ScalarKind::Timestamptz,
ScalarKind::F32,
ScalarKind::F64,
] {
assert!(!k.is_text());
}
@@ -171,6 +177,8 @@ mod rust_tests {
assert!(!ScalarKind::I16.is_temporal());
assert!(!ScalarKind::I32.is_temporal());
assert!(!ScalarKind::I64.is_temporal());
assert!(!ScalarKind::F32.is_temporal());
assert!(!ScalarKind::F64.is_temporal());
}

#[test]
@@ -523,7 +531,7 @@ mod catalog_tests {
}

#[test]
fn catalog_has_int4_int2_int8_date_timestamptz_numeric_text_bool_in_order() {
fn catalog_has_all_tokens_in_order() {
let tokens: Vec<&str> = CATALOG.iter().map(|s| s.token).collect();
assert_eq!(
tokens,
@@ -535,7 +543,9 @@ mod catalog_tests {
"timestamptz",
"numeric",
"text",
"bool"
"bool",
"float4",
"float8"
]
);
}
@@ -896,6 +906,102 @@ mod values_tests {
}
}

mod float_tests {
use crate::*;

fn scalar(token: &str) -> &'static ScalarSpec {
CATALOG
.iter()
.find(|s| s.token == token)
.unwrap_or_else(|| panic!("{token} missing from CATALOG"))
}

#[test]
fn float_specs_are_in_catalog_with_ordered_shape() {
for token in ["float4", "float8"] {
let s = scalar(token);
let suffixes: Vec<_> = s.domains.iter().map(|d| d.suffix).collect();
assert_eq!(suffixes, vec!["", "_eq", "_ord_ore", "_ord"]);
}
assert_eq!(scalar("float4").kind, ScalarKind::F32);
assert_eq!(scalar("float8").kind, ScalarKind::F64);
}

#[test]
fn float_kinds_are_not_bounded_int_temporal_or_text() {
for k in [ScalarKind::F32, ScalarKind::F64] {
assert_eq!(k.as_bounded_int(), None);
assert!(!k.is_int());
assert!(!k.is_temporal());
assert!(!k.is_text());
assert!(k.is_float());
}
}

#[test]
fn float_rust_types_are_f32_and_f64() {
assert_eq!(ScalarKind::F32.rust_type(), "f32");
assert_eq!(ScalarKind::F64.rust_type(), "f64");
}

/// NaN and -0.0 must never be fixtures: NaN is unordered/unspecified in the
/// encoder; -0.0 canonicalizes to +0.0 and would duplicate the +0.0 row.
/// ±Inf MUST be present (the boundary pivots).
#[test]
fn float_fixtures_exclude_nan_and_negative_zero_and_include_infinities() {
for token in ["float4", "float8"] {
let s = scalar(token);
let strings: Vec<&str> = s
.fixtures
.iter()
.map(|f| match f {
Fixture::Float(v) => *v,
other => panic!("{token} fixture must be Fixture::Float, got {other:?}"),
})
.collect();
for v in &strings {
let parsed: f64 = v
.parse()
.unwrap_or_else(|_| panic!("{token} fixture {v:?} must parse as f64"));
assert!(!parsed.is_nan(), "{token} fixture {v:?} is NaN");
assert!(
!(parsed == 0.0 && parsed.is_sign_negative()),
"{token} fixture {v:?} is -0.0"
);
}
assert!(strings.contains(&"inf"), "{token} must include +inf pivot");
assert!(strings.contains(&"-inf"), "{token} must include -inf pivot");
assert!(strings.contains(&"0"), "{token} must include 0 (origin)");
}
}

/// Distinct by parsed f64 value. The catalog invariant dedupes only by literal
/// string, but the fixture table keys on the value, so an aliasing pair (e.g.
/// `"1"` and `"1.0"`) would break `fetch_fixture_payload`'s `fetch_one`.
#[test]
fn float_fixtures_are_distinct_by_value() {
for token in ["float4", "float8"] {
let s = scalar(token);
let parsed: Vec<u64> = s
.fixtures
.iter()
.map(|f| match f {
Fixture::Float(v) => {
let x: f64 = v.parse().unwrap();
// Raw IEEE-754 bit pattern as the distinctness key. NaN and -0.0 are
// excluded by the test above, so distinct bits mean distinct values.
x.to_bits()
}
other => panic!("non-float fixture: {other:?}"),
})
.collect();
let mut sorted = parsed.clone();
sorted.sort_unstable();
sorted.dedup();
assert_eq!(sorted.len(), parsed.len(), "{token} has duplicate fixtures");
}
}
}
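
To see why the value-distinctness guard in `float_tests` is needed on top of literal-distinctness, consider literals that differ as strings but alias as floats. A minimal sketch, assuming the same bit-pattern key the test uses; `has_value_duplicates` is a hypothetical helper, not part of the crate:

```rust
use std::collections::HashSet;

/// Hypothetical helper: true if any two literals parse to the same `f64`
/// bit pattern, even when the literal strings themselves differ.
fn has_value_duplicates(literals: &[&str]) -> bool {
    let mut seen = HashSet::new();
    literals.iter().any(|s| {
        let bits = s
            .parse::<f64>()
            .expect("literal must parse as f64")
            .to_bits();
        // `insert` returns false when the bit pattern was already present.
        !seen.insert(bits)
    })
}
```

A pair such as `"1000000"` and `"1e6"` passes a string-level dedupe yet collides here, which is the aliasing case the guard rules out.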

mod invariant_tests {
use crate::*;
use std::collections::HashMap;
@@ -936,7 +1042,11 @@ mod invariant_tests {
| Fixture::Text(s)
| Fixture::Jsonb(s)
| Fixture::Date(s)
| Fixture::Timestamptz(s) => DistinctKey::Str(s),
| Fixture::Timestamptz(s)
// Float fixtures dedupe by their literal here, like the other
// string-backed kinds (every float literal is distinct; the harness
// `float_fixtures_are_distinct_by_value` guard pins value-distinctness).
| Fixture::Float(s) => DistinctKey::Str(s),
// `bool` is storage-only and string-backed for distinctness: the two
// values dedupe by their literal, like the other non-numeric kinds.
Fixture::Bool(b) => DistinctKey::Str(if b { "true" } else { "false" }),