refactor(hpc): bf16_tile_gemm fallback delegates to the polyfill (dedup of #222) #223
PR #222 added ndarray::simd::bf16_tile_gemm_16x16 by copying the F32x16 kernel out of hpc::bf16_tile_gemm::fallback_path, leaving the same kernel in two places. Collapse it: the polyfill fn is the single source of truth; the hpc AMX wrapper's fallback now calls crate::simd::bf16_tile_gemm_16x16, with the AMX TDPBF16PS tile path still layered on top. Drops the now-unused F32x16 / bf16_to_f32_batch import.

Both suites pass (hpc fallback + simd_ops parity); clippy -D warnings + fmt clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GJ4NVBSjq1w5h7RmTbVafb
Follow-up to #222.
#222 added `ndarray::simd::bf16_tile_gemm_16x16` by copying the `F32x16` kernel out of `hpc::bf16_tile_gemm::fallback_path`, leaving the same BF16 GEMM in two places.

This collapses it to one source of truth:

- `simd::bf16_tile_gemm_16x16` (the polyfill primitive) is the single kernel.
- `hpc::bf16_tile_gemm`'s fallback now just calls `crate::simd::bf16_tile_gemm_16x16`; its AMX `TDPBF16PS` tile path stays layered on top for AMX hosts.
- Drops the now-unused `F32x16` / `bf16_to_f32_batch` import.

Net: +1 caller, −30 duplicated lines. Both test suites pass (`hpc::bf16_tile_gemm::tests::fallback_matches_scalar_reference_k64` + `simd_ops::bf16_tile_gemm_tests::{matches_scalar_reference_k64, accumulates_into_c}`); `clippy -D warnings` + fmt clean on rustc 1.95.
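For readers who have not opened the diff, the delegation described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function signatures, the raw-`u16` BF16 representation, the scalar loop standing in for the `F32x16` kernel, and the `amx_available` helper are all assumptions made for the example.

```rust
// Hypothetical stand-ins; the real signatures in the crate may differ.
mod simd {
    /// Widen one BF16 value (raw u16 bits) to f32 by shifting it into the high half.
    pub fn bf16_to_f32(x: u16) -> f32 {
        f32::from_bits((x as u32) << 16)
    }

    /// The single kernel: C (16x16, row-major) += A (16xK) * B (Kx16).
    /// A scalar loop is used here for clarity; the PR's polyfill is described
    /// as an F32x16 kernel.
    pub fn bf16_tile_gemm_16x16(a: &[u16], b: &[u16], c: &mut [f32], k: usize) {
        assert_eq!(a.len(), 16 * k);
        assert_eq!(b.len(), k * 16);
        assert_eq!(c.len(), 16 * 16);
        for i in 0..16 {
            for p in 0..k {
                let a_ip = bf16_to_f32(a[i * k + p]);
                for j in 0..16 {
                    c[i * 16 + j] += a_ip * bf16_to_f32(b[p * 16 + j]);
                }
            }
        }
    }
}

mod hpc {
    /// Placeholder for runtime AMX detection (assumed helper); always false here.
    fn amx_available() -> bool {
        false
    }

    /// Wrapper: the AMX tile path is layered on top, and everything else
    /// delegates to the shared polyfill kernel instead of keeping a copy.
    pub fn bf16_tile_gemm(a: &[u16], b: &[u16], c: &mut [f32], k: usize) {
        if amx_available() {
            // The AMX TDPBF16PS tile path would run here (omitted in this sketch).
            unimplemented!("AMX path not modelled in this sketch");
        }
        // The PR calls `crate::simd::bf16_tile_gemm_16x16`; `super::` keeps
        // this sketch independent of where the modules are placed.
        super::simd::bf16_tile_gemm_16x16(a, b, c, k);
    }
}
```

With this shape, a parity test only needs to exercise `hpc::bf16_tile_gemm` on a host without AMX to cover the shared kernel, which is presumably why both existing suites keep passing unchanged.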
https://claude.ai/code/session_01GJ4NVBSjq1w5h7RmTbVafb
Generated by Claude Code