CI fmt: ran `cargo fmt --all` (edge_residue_probe / golden_helix_probe were
committed unformatted).
Correctness (codex / coderabbit):
* simd_int_ops::gemm_u8_i8 — VNNI dispatch was compile-time `#[cfg(target_feature)]`,
so the default x86-64-v3 GitHub build stripped both VNNI arms → scalar on
Ice Lake / SPR / Zen 4 silicon (codex P2 regression). Now RUNTIME
`is_x86_feature_detected!` (avx512vnni → avxvnni → scalar); compiles + reaches
VNNI under v3, and removes the pre-existing `needless_return` clippy warning.
* simd_avx2.rs U16x16 `shr`/`shl` — returned ZERO for any shift ∉{1,2,4,8};
now `_mm256_srl_epi16`/`_mm256_sll_epi16` with a runtime shift count (all shifts).
* amx_matmul::for_dpbusd — tile 1/2 shapes now match the operand contract
(tmm1 = VNNI kb/4×64, tmm2 = plain 16×kb); identical at kb=64 (tests
unaffected), correct for kb<64.
* backend::native gemv_f32/f64 — early-return on m==0 (don't slice `x[..n]`
when there are no rows; matches the scalar reference no-op).
* test_tile_zero_and_release — minimal config rewritten on the corrected
XTILECFG offsets (colsb=4 @16 / rows=1 @48), with an explanatory note.
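For reference, the compile-time vs. runtime dispatch difference in the gemm_u8_i8 item can be sketched as below. This is a minimal illustration, not the code in simd_int_ops: the function names are hypothetical, and it uses `avx2` as a stand-in tier (the commit's actual order is avx512vnni → avxvnni → scalar). The point is that the check happens when the function is called, so a baseline build still reaches the accelerated path on capable hardware.

```rust
/// Scalar reference: dot product of u8 and i8 slices, accumulated in i32.
/// (Hypothetical helper, named for this sketch only.)
pub fn dot_u8_i8_scalar(a: &[u8], b: &[i8]) -> i32 {
    a.iter().zip(b).map(|(&x, &y)| i32::from(x) * i32::from(y)).sum()
}

/// Same arithmetic, but compiled with AVX2 enabled for this one function,
/// regardless of the crate-wide target features. Stand-in for a VNNI arm.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn dot_u8_i8_avx2(a: &[u8], b: &[i8]) -> i32 {
    a.iter().zip(b).map(|(&x, &y)| i32::from(x) * i32::from(y)).sum()
}

/// Dispatcher: picks the arm at runtime instead of via `#[cfg(target_feature)]`,
/// so the accelerated arm is not stripped from a baseline build.
pub fn dot_u8_i8(a: &[u8], b: &[i8]) -> i32 {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was confirmed on this CPU just above.
            return unsafe { dot_u8_i8_avx2(a, b) };
        }
    }
    // Fallback for CPUs without the feature and for non-x86 targets.
    dot_u8_i8_scalar(a, b)
}
```

Whichever arm runs, the result must match the scalar reference, which is what makes the scalar path usable as a test oracle.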
Probes / docs:
* amx_probe matmul_f32 validator — true relative-L2 + max-abs (the old
`|e|.max(1.0)` denominator reduced it to an absolute-error test whenever |e|<1).
* amx_rb_probe rb_32 — assert K % 64 == 0 (was silently truncating the tail).
* doc `# Examples` (ignore) on the new public APIs: TileConfig::for_dpbusd_8,
tile_dpbusd_2x2, F32x8::mul_add, F32x8::cmp_gt_mask.
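A sketch of what a relative-L2 plus max-abs check could look like, for readers comparing against the old clamped denominator. This is not the amx_probe code: `error_metrics` is a hypothetical name, accumulating in f64 is a choice made here, and the zero-reference fallback is an assumption of this sketch.

```rust
/// Returns (relative L2 error, max absolute error) of `got` against `expected`.
/// Hypothetical helper for illustration; accumulates in f64 to limit rounding.
pub fn error_metrics(got: &[f32], expected: &[f32]) -> (f64, f64) {
    assert_eq!(got.len(), expected.len(), "length mismatch");
    let mut diff_sq = 0.0f64; // sum of squared differences
    let mut ref_sq = 0.0f64; // sum of squared reference values
    let mut max_abs = 0.0f64; // largest single-element deviation
    for (&g, &e) in got.iter().zip(expected) {
        let d = f64::from(g) - f64::from(e);
        diff_sq += d * d;
        ref_sq += f64::from(e) * f64::from(e);
        max_abs = max_abs.max(d.abs());
    }
    // Relative L2 = ||got - expected|| / ||expected||.
    // Assumption: if the reference is all zeros, report the absolute L2 norm.
    let rel_l2 = if ref_sq > 0.0 {
        (diff_sq / ref_sq).sqrt()
    } else {
        diff_sq.sqrt()
    };
    (rel_l2, max_abs)
}
```

Because the denominator is the reference norm and is not clamped at 1.0, an all-zero output against a small-magnitude reference still reports a relative error near 1 instead of passing as a tiny absolute error.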
Validated under x86-64-v3 (GitHub target): clippy clean, `cargo build
--examples` Finished; native AMX probes still all CORRECT.
https://claude.ai/code/session_01D2WSmezQBNC3bUdHuGfGmo