|
| 1 | +# PP-15 Interface-Signal Audit — Verdict |
| 2 | + |
| 3 | +Reviewer: Opus PP-15 Interface-Signal Auditor |
| 4 | +Branch: `claude/pr-x4-splat-cascade-design` @ `5e266d19` |
| 5 | +Mindset: signal-in-the-interface, no materialization between cascade steps |
| 6 | + |
| 7 | +Scope: ~125 public surfaces across `src/hpc/linalg/`, `src/hpc/pillar/`, |
| 8 | +`src/hpc/ogit_bridge/`. Test 1 (Click P-1), Test 2 (signal-in-interface), |
| 9 | +Test 3 (cascade composition). |
| 10 | + |
| 11 | +The codebase has **two house dialects**: |
| 12 | + |
| 13 | +1. **carrier-method dialect** — `Spd3::sqrt(&self) -> Spd3`, |
| 14 | + `CovHighD::sandwich(&self, m: &Self) -> Self`, `MotionBand::sigma(self) -> f32`, |
| 15 | + `CognitiveBridge::nearest_basin(&self, …)`. Honors Click P-1. |
| 16 | +2. **flat-slice imperative dialect** — `fn op_f32(x: &mut [f32], gamma: &[f32], …)`. |
| 17 | + The vast majority of `hpc/linalg/*` adopts this; it is **structurally |
| 18 | + anti-Click-P-1** because `&[f32]` is not a carrier — every caller has to |
| 19 | + pre-materialize an `out` buffer and there is no return type to chain. |
| 20 | + |
| 21 | +The data-flow rule in `.claude/rules/data-flow.md` ("No `&mut self` during |
| 22 | +computation. Ever.") rules out the literal `x.layer_norm(...)` mutating-self |
| 23 | +Click-P-1 rewrite. The correct Click-P-1 form is therefore **compute-returns-new**: |
| 24 | +`fn layer_norm(&self, gamma, beta, eps) -> Self` on a typed carrier |
| 25 | +(`ArrayView1<f32>` extension trait or a thin `Tensor1<f32>` newtype). |
| 26 | + |
| 27 | +## Click P-1 violations (free-function-with-carrier-arg) |
| 28 | + |
| 29 | +| File:Line | Current signature | Click-P-1 rewrite | |
| 30 | +|---|---|---| |
| 31 | +| `src/hpc/linalg/attention.rs:100` | `attention_f32(q, k, v, &mut out, &cfg, b, s)` | `q.attend(k, v, &cfg) -> AttentionOut` where `AttentionOut: Deref<[f32]>` | |
| 32 | +| `src/hpc/linalg/attention.rs:221` | `flash_attention_f32(q, k, v, &mut out, &cfg, b, s, block)` | `q.flash_attend(k, v, &cfg, block) -> AttentionOut` | |
| 33 | +| `src/hpc/linalg/batched.rs:60` | `batched_gemm_f32(x, y, &mut out, b, m, k, n, α, β)` | `BatchedMat{x,m,k,batch}.matmul(&BatchedMat{y,k,n,batch}, α, β) -> BatchedMat` — shape is part of the type, not loose args | |
| 34 | +| `src/hpc/linalg/batched.rs:132` | `batched_gemm_4d_f32(...)` | `BatchedMat4D.matmul(&Self, α, β) -> BatchedMat4D` | |
| 35 | +| `src/hpc/linalg/conv.rs:60` | `conv1d_f32(input, kernel, stride, pad, &mut out)` | `Signal1D.conv(&Kernel1D, stride, pad) -> Signal1D` | |
| 36 | +| `src/hpc/linalg/conv.rs:123,202,285,392` | `conv2d{,_3x3,_5x5,_im2col}_f32(input, in_shape, kernel, kshape, stride, pad, &mut out)` | `Image.conv2d(&Kernel, stride, pad) -> Image` — shapes ride inside `Image`/`Kernel` | |
| 37 | +| `src/hpc/linalg/norm.rs:54` | `layer_norm_f32(&mut x, γ, β, ε)` | `x.layer_norm(γ, β, ε) -> Tensor1` (NOT `&mut self`) | |
| 38 | +| `src/hpc/linalg/norm.rs:100` | `rms_norm_f32(&mut x, γ, ε)` | `x.rms_norm(γ, ε) -> Tensor1` | |
| 39 | +| `src/hpc/linalg/norm.rs:143` | `group_norm_f32(&mut x, γ, β, groups, ε)` | `x.group_norm(γ, β, groups, ε) -> Tensor1` | |
| 40 | +| `src/hpc/linalg/activations_ext.rs:67,93,117,143,168` | `{gelu,gelu_tanh,silu,swish,mish}_f32(&mut x[, β])` | `x.gelu()`, `x.silu()`, … each returning `Tensor1` | |
| 41 | +| `src/hpc/linalg/loss.rs:244` | `softmax_xent_backward_f32(logits, targets, &mut grad_out, b, v)` | `logits.softmax_xent_backward(targets) -> GradTensor` | |
| 42 | +| `src/hpc/linalg/loss.rs:155` | `cross_entropy_with_logits_batched_f32(logits, targets, b, v) -> f32` | `logits.xent_batched(targets) -> f32` — return-shape is fine, but carrier-method form keeps it composable | |
| 43 | +| `src/hpc/linalg/wasserstein.rs:35` | `sinkhorn_knopp_f32(cost, m, n, a, b, ε, iter, tol) -> Vec<f32>` | `CostMatrix.sinkhorn(&a, &b, ε, iter, tol) -> TransportPlan` | |
| 44 | +| `src/hpc/linalg/wasserstein.rs:287` | `wasserstein_1_f32(cost, plan, m, n) -> f32` | `plan.cost_against(&CostMatrix) -> f32` | |
| 45 | +| `src/hpc/linalg/wasserstein.rs:112` | `hungarian_f32(cost, m) -> Vec<u32>` | `CostMatrix.hungarian() -> Assignment` | |
| 46 | +| `src/hpc/linalg/rope.rs:125` | `RopeCache::apply_qk_f32(&self, &mut q, &mut k, positions, b, s, h)` | `q.with_rope(&cache, positions) -> Tensor` — current shape is half-Click-P-1 (carrier IS `self`) but it mutates two *other* slices, so the receiver is wrong | |
| 47 | +| `src/hpc/pillar/temporal_sandwich.rs:165` | `sandwich_update_3x3(σ, m) -> [[f32;3];3]` | `Spd3.sandwich(&self, m: &Spd3) -> Spd3` — the typed version *already exists* in `pillar/cov_high_d.rs:124`; this 3×3 hardcoded copy is the violation | |
| 48 | +| `src/hpc/pillar/temporal_sandwich.rs:201` | `is_spd_3x3(m) -> bool` | `Spd3.is_spd(&self) -> bool` | |
| 49 | +| `src/hpc/pillar/koestenberger.rs:78,116` | `path1_direct_sandwich(σ, m) -> Spd3`, `path2_spectral(σ, m) -> Spd3` | `σ.sandwich_direct(&m)`, `σ.sandwich_spectral(&m)` | |
| 50 | +| `src/hpc/pillar/koestenberger.rs:194` | `max_abs_error_spd3(a, b) -> f64` | `Spd3.max_abs_error(&self, other) -> f64` | |
| 51 | +| `src/hpc/pillar/signature.rs:91,188,211` | `signature_d2_deg3(path, n) -> [f32;…]`, `sigker_hl(p, q) -> f32`, `brownian_path_d2(rng, n) -> Vec<f32>` | `Path2D.signature_deg3() -> Signature`, `sig_p.kernel(&sig_q) -> f32`, `rng.brownian_d2(n) -> Path2D` | |
| 52 | + |
| 53 | +## Materialization-forced (out: &mut Buffer args that should be typed return) |
| 54 | + |
| 55 | +Every `_f32` function in `linalg/{attention, batched, conv, norm, |
| 56 | +activations_ext, loss}` takes `out: &mut [f32]` (or mutates `x` in place). |
| 57 | +That is **15 out of ~22 sprint-introduced surfaces** forcing the caller to |
| 58 | +pre-size and materialize a buffer for the next step. |
| 59 | + |
| 60 | +Concretely, the cascade `attention → layer_norm → gelu → batched_gemm` |
| 61 | +requires 4 pre-allocated buffers and 4 manual length-arithmetic asserts at |
| 62 | +the call site, instead of `q.attend(k,v,&cfg).layer_norm(γ,β,ε).gelu().matmul(&w)`. |
| 63 | + |
| 64 | +**Severity heuristic**: this is the dominant failure mode of the sprint |
| 65 | +(15/22 surfaces), and the reason consumers will be forced to write the |
| 66 | +materialization-glue PP-15 is supposed to prevent. |
| 67 | + |
| 68 | +## Cascade composition gaps (where A.method() doesn't fit B's receiver) |
| 69 | + |
| 70 | +1. **Splat tick** — `TileBinning::from_projected(&ProjectedBatch, &Camera) -> Self` |
| 71 | + is good (`tile.rs:105`), and `binning.tile_instances(tx,ty) -> &[TileInstance]` |
| 72 | + is good. But `rasterize_tile` and `rasterize_frame` (`raster.rs:71,213`) |
| 73 | + are free functions taking the binning + a `&mut framebuffer` — they |
| 74 | + should be `binning.rasterize(&projected, &camera, bg) -> Framebuffer`. |
| 75 | + Net: 3 chain steps that *don't* compose: `binning → rasterize_*(out)`. |
| 76 | +2. **NARS revision** — `nars_revision(a, b)`, `nars_deduction`, `nars_abduction` |
| 77 | + (`nars.rs:322,342,362`) all take **two `NarsTruth` by value, return |
| 78 | + `NarsTruth`** — *no materialization*, *no buffers*, and the return type IS |
| 79 | + the next call's receiver. Test 2 and Test 3 pass. Test 1 fails on the |
| 80 | + surface (free fn, not method), but this is the strongest form of |
| 81 | + "signal-in-the-interface" in the audit. Trivial method rewrite: |
| 82 | + `a.revise(b) -> NarsTruth`. |
| 83 | +3. **Cognitive cell encode** — `bridge.nearest_basin(cell_value, hint) -> u16` |
| 84 | + (`cognitive_bridge.rs:335`). The bare `u16` is a stringly-typed key. The |
| 85 | + downstream `codec::rdo_cell(basin, …)` (TTL-referenced; not yet |
| 86 | + implemented in this branch) will accept a `u16` or `usize` and the |
| 87 | + compiler will not catch a basin/family/codebook-index mixup. Return type |
| 88 | + should be `BasinHandle(u16)` so the next click in the cascade is |
| 89 | + `bridge.family_of(handle)` (the method already takes `u16` at line 311 — |
| 90 | + change to `BasinHandle` and the entire ogit_bridge surface becomes |
| 91 | + type-safe). |
| 92 | +4. **`PillarReport`** — `prove_pillar_7() -> PillarReport` honors return-type. |
| 93 | + But `PillarReport` has only one method (`print(&self)`, line 172 of |
| 94 | + `prove_runner.rs`); there is no `report.assert_passed() -> &Self` or |
| 95 | + `.merge(other) -> PillarReport`. So cascading the eleven pillar probes |
| 96 | + into a single `PillarSuite` requires manual `Vec<PillarReport>` glue |
| 97 | + (and indeed `prove_pillar_8()` at `temporal_sandwich.rs:323` returns |
| 98 | + `Vec<PillarReport>`, breaking the chain shape vs. its peers). |
| 99 | + |
| 100 | +## Click P-1 honors (sprint-introduced surfaces that do it right) |
| 101 | + |
| 102 | +- `Spd3::sqrt`, `Spd3::sandwich`, `Spd3::eig`, etc. in `linalg/matrix.rs` — methods on the SPD carrier. |
| 103 | +- `CovHighD::sandwich(&self, m: &Self) -> Self` at `pillar/cov_high_d.rs:124` — **textbook Click P-1**: typed carrier, typed args, typed return, composes (`a.sandwich(&b).sandwich(&c).frobenius_sq()`). |
| 104 | +- `CovHighD::log_spd(&self) -> Self` at `cov_high_d.rs:202` — ditto. |
| 105 | +- `MotionBand::sigma(self) -> f32` at `temporal_sandwich.rs:111` — enum-as-carrier. |
| 106 | +- `TileBinning::from_projected(&ProjectedBatch, &Camera) -> Self` and `.tile_instances(...) -> &[TileInstance]` — typed-surface returns, no out-buffer. |
| 107 | +- `CognitiveBridge::load_embedded() -> Result<Self, OgitError>`, `.codebook() -> &CamCodebook`, `.family_of(idx) -> &FamilyBitmap` — clean carrier-methods. Only `nearest_basin`'s bare `u16` return type lets the chain down. |
| 108 | +- `nars_revision/deduction/abduction(a, b) -> NarsTruth` — value-in, value-out, zero materialization; the *cleanest* signal-in-interface in the audit even though the surface is a free fn. |
| 109 | + |
| 110 | +## Net call |
| 111 | + |
| 112 | +Of ~22 sprint-introduced public surfaces in `linalg/{attention, batched, |
| 113 | +conv, norm, activations_ext, loss, wasserstein, rope}`, **roughly 18 fail |
| 114 | +Test 1 (free-fn-with-carrier-arg) and 15 fail Test 2 (materialization-forced |
| 115 | +`out: &mut`)**. Pillar code is split: `pillar/cov_high_d.rs` and |
| 116 | +`pillar/ewa_sandwich_3d.rs` honor Click P-1; `pillar/{temporal_sandwich, |
| 117 | +koestenberger, signature}` are free-function-on-bare-arrays. |
| 118 | +`ogit_bridge` is mostly clean — single fix needed: lift `u16` to `BasinHandle`. |
| 119 | + |
| 120 | +**Severity**: high in count, low-to-medium in difficulty. None of the |
| 121 | +violations are algorithmically wrong; they are all skin-deep signature |
| 122 | +rewrites where the kernel body is reusable verbatim. A same-day cleanup |
| 123 | +sprint could mechanically wrap each `op_f32(&[f32], …, &mut [f32])` in an |
| 124 | +extension-trait method on `ArrayView1<f32>` (or a thin `Tensor1` newtype) |
| 125 | +returning `Tensor1`. The data-flow rule "No `&mut self` during computation" |
| 126 | +forces the new-allocation Click-P-1 form anyway — and that allocation is |
| 127 | +already happening at every consumer site, just expressed as |
| 128 | +`let mut out = vec![0.0; n]` instead of inside the method. |
| 129 | + |
| 130 | +The **structural** decision needed before the cleanup: is the carrier |
| 131 | +`ArrayView1<f32>` (ndarray-native), a new `Tensor1<f32>` newtype, or a |
| 132 | +shape-aware `Tensor<D>`? Without that decision the cleanup will recreate |
| 133 | +the inconsistency in a different shape. Recommend: pick `Tensor1` / |
| 134 | +`Tensor2` / `Tensor4` thin newtypes around `Vec<f32>` + shape tuple, with |
| 135 | +`Deref<Target=[f32]>` for SIMD escape-hatch. This matches the |
| 136 | +`BatchedMat`/`Image`/`Signal1D` carriers suggested in the violation table. |
| 137 | + |
| 138 | +**Recommended sequence**: |
| 139 | +1. Land `BasinHandle(u16)` — 1-hour change, type-safety dividend across `ogit_bridge`. |
| 140 | +2. Decide carrier shape (`Tensor1`/`Tensor2`/`Tensor4`). |
| 141 | +3. Add extension-trait methods that wrap each `*_f32` free fn — keep the free fns as `#[doc(hidden)]` shims for one release. |
| 142 | +4. Migrate `pillar/temporal_sandwich.rs` and `pillar/koestenberger.rs` to call methods on the existing `Spd3` carrier (the typed version already exists in `matrix.rs` — these modules are reimplementing what they could be using). |
| 143 | + |
| 144 | +This is a **follow-on cleanup sprint** (one to two days for a single |
| 145 | +worker), not a same-day patch — the carrier-type decision is load-bearing. |
| 146 | + |
| 147 | +## Sentinel: pp15-interface-signal-completed |
0 commit comments