Commit 38122a0

feat(simd): re-export AMX/VNNI int8 GEMM (matmul_i8_to_i32) through simd.rs
Surface hpc::amx_matmul::{matmul_i8_to_i32, amx_available} via the canonical ndarray::simd::* consumer entry (W1a "all SIMD from ndarray::simd"), std-gated. This lets a consumer reach the full int8 dispatch ladder -- AMX TDPBUSD tile (byte-asm, 16384 MAC/instr, Sapphire Rapids+) -> AVX-512 VPDPBUSD -> AVX-VNNI -> scalar, bit-identical across tiers -- without dipping into hpc::amx_matmul directly.

Additive re-export only; no behaviour change.

Consumed by turbovec's ndarray::simd-routed polyfill scan (lance-graph-turbovec), which scores TurboQuant as a batched int8 GEMM so the SIMD/AMX backend selection lives in ndarray, not the consumer.

https://claude.ai/code/session_01D2WSmezQBNC3bUdHuGfGmo
1 parent cb77a31 commit 38122a0

1 file changed

Lines changed: 10 additions & 0 deletions

src/simd.rs

@@ -570,6 +570,16 @@ pub use crate::hpc::cam_pq::{kmeans, squared_l2};
 
 pub use crate::hpc::heel_f64x8::cosine_f32_to_f64_simd;
 
+// Dispatched integer matmul — the polyfill entry for batched int8 scoring.
+// `matmul_i8_to_i32` runtime-selects AMX `TDPBUSD` tiles (byte-asm, 16384
+// MAC/instr, Sapphire Rapids+) → AVX-512 VPDPBUSD → AVX-VNNI → scalar, and
+// is bit-identical across tiers. Surfaced here so a consumer reaches the
+// whole AMX ladder through the canonical `ndarray::simd::*` import (W1a)
+// without dipping into `crate::hpc::amx_matmul` directly. `amx_available()`
+// exposes the runtime tier check for reporting.
+#[cfg(feature = "std")]
+pub use crate::hpc::amx_matmul::{amx_available, matmul_i8_to_i32};
+
 // Elementwise slice ops — polyfill-dispatched (F32x16/F64x8 chunks + scalar tail).
 #[cfg(feature = "std")]
 pub use crate::simd_ops::{
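
The comment's two numeric claims (16384 MAC per tile instruction, and bit-identical results across tiers) can be checked with a small standalone sketch. Everything below is hypothetical: the function names, the row-major layout, and the signed-by-signed operands are assumptions for illustration, and none of it is the signature or implementation of `matmul_i8_to_i32`, which this diff does not show. It also ignores that the `VPDPBUSD`/`TDPBUSD` family multiplies unsigned by signed bytes. The point is only that integer accumulation with wrapping adds does not depend on grouping order, which is what makes a bit-identical ladder possible.

```rust
// A full AMX tile is 16 rows x 64 bytes. One tile dot-product instruction
// updates a 16 x 16 block of i32 accumulators, each with 64 byte products.
const AMX_TILE_MACS: usize = 16 * 16 * 64;

/// Hypothetical scalar reference: row-major `a` (m x k) times row-major
/// `b` (k x n), accumulated into i32. Not the crate's real signature.
fn ref_matmul_i8_to_i32(a: &[i8], b: &[i8], m: usize, k: usize, n: usize) -> Vec<i32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    let mut c = vec![0i32; m * n];
    for i in 0..m {
        for p in 0..k {
            let a_ip = a[i * k + p] as i32;
            for j in 0..n {
                let prod = a_ip * b[p * n + j] as i32; // |prod| <= 16384, no overflow
                c[i * n + j] = c[i * n + j].wrapping_add(prod);
            }
        }
    }
    c
}

/// Same product, but K is folded in groups of four before touching the
/// accumulator, loosely mirroring the four-bytes-per-i32-lane grouping of
/// VNNI-style dot-product instructions.
fn grouped_matmul_i8_to_i32(a: &[i8], b: &[i8], m: usize, k: usize, n: usize) -> Vec<i32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    let mut c = vec![0i32; m * n];
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0i32;
            for p0 in (0..k).step_by(4) {
                let mut group = 0i32; // at most 4 * 16384, fits in i32
                for p in p0..(p0 + 4).min(k) {
                    group += a[i * k + p] as i32 * b[p * n + j] as i32;
                }
                acc = acc.wrapping_add(group);
            }
            c[i * n + j] = acc;
        }
    }
    c
}

fn main() {
    let a: Vec<i8> = vec![1, 2, 3, 4, 5, 6]; // 2 x 3
    let b: Vec<i8> = vec![7, 8, 9, 10, 11, 12]; // 3 x 2
    let reference = ref_matmul_i8_to_i32(&a, &b, 2, 3, 2);
    let grouped = grouped_matmul_i8_to_i32(&a, &b, 2, 3, 2);
    assert_eq!(reference, grouped);
    println!("{:?} (MACs per full tile op: {})", reference, AMX_TILE_MACS);
}
```

A floating-point GEMM could not make the same promise without fixing the summation order, because float addition is not associative. Wrapping i32 addition is, so each tier is free to pick its own blocking.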
