Add cargo-udeps dead code detection #20

Open
Jackson57279 wants to merge 2 commits into master from ci/dead-code-detection

@Jackson57279 Jackson57279 commented Jun 18, 2026

Contributor

Summary

  • Adds cargo-udeps to detect unused Cargo dependencies across the Rust workspace
  • Introduces scripts/check-udeps.sh and make udeps for local runs (requires nightly)
  • Adds a dedicated dead-code CI job on Ubuntu
  • Includes make udeps in make ci and documents the check in CONTRIBUTING.md

Motivation

Agent Readiness flagged missing dead-code detection tooling. This wires in cargo-udeps, a widely used Rust tool for detecting unused dependencies, so they are caught in CI and locally before merge.

Testing

  • make udeps — passes on master (All deps seem to have been used.)
  • scripts/check-udeps.sh — executable, installs cargo-udeps when missing

Follow-ups

  • Optional: add cargo-udeps to pre-commit (pre-push stage) once .pre-commit-config.yaml lands on master

Made with Cursor


Summary by cubic

Adds workspace-wide cargo-udeps checks (CI + make udeps via scripts/check-udeps.sh on nightly) and refactors hot dot/GEMM paths with AVX‑512/AVX2 kernels to speed up attention and matmuls.

  • Refactors
    • AVX‑512/AVX2 multi-accumulator dot kernels; used in attention and Q4_K/Q6_K GEMM with runtime dispatch.
    • Vectorized axpy; gemm_f32_cpu now uses SIMD dot; set codegen-units = 1 for better inlining/LTO.

Written for commit 8ba9ad3. Summary will update on new commits.


Wire unused-dependency checks into make, CI, and contributor docs so
unused Cargo deps are caught before merge.

Co-authored-by: Cursor <cursoragent@cursor.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 4 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="scripts/check-udeps.sh">

<violation number="1" location="scripts/check-udeps.sh:19">
P1: Add `--all-features` to the udeps invocation; otherwise feature-gated dependencies are skipped and can be misreported as unused.</violation>
</file>


Comment thread scripts/check-udeps.sh
cargo install cargo-udeps --locked
fi

cargo +nightly udeps --workspace --all-targets "$@"


P1: Add --all-features to the udeps invocation; otherwise feature-gated dependencies are skipped and can be misreported as unused.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At scripts/check-udeps.sh, line 19:

<comment>Add `--all-features` to the udeps invocation; otherwise feature-gated dependencies are skipped and can be misreported as unused.</comment>

<file context>
@@ -0,0 +1,19 @@
+  cargo install cargo-udeps --locked
+fi
+
+cargo +nightly udeps --workspace --all-targets "$@"
</file context>
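
To illustrate the reviewer's point, here is a minimal Rust sketch of how a dependency can look unused when a feature is off. The `checksums` feature and the `fake_dep` module are hypothetical: `fake_dep` stands in for an external crate listed unconditionally under `[dependencies]`, and nothing here is taken from this repository.

```rust
// Stand-in for an external crate that Cargo.toml lists unconditionally.
// (Hypothetical: a real project would `use some_crate::...` instead.)
#[allow(dead_code)]
mod fake_dep {
    pub fn checksum(bytes: &[u8]) -> u32 {
        bytes
            .iter()
            .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u32))
    }
}

// The only reference to the dependency sits behind a feature gate, so a
// build that does not enable `checksums` never compiles this function.
#[cfg(feature = "checksums")]
pub fn digest(data: &[u8]) -> u32 {
    fake_dep::checksum(data)
}

/// Reports whether this build contains any code path that touches `fake_dep`.
pub fn dependency_is_referenced() -> bool {
    cfg!(feature = "checksums")
}
```

In a default-feature build `dependency_is_referenced()` is `false`, which is the situation in which an unused-dependency check could flag the crate even though a feature-enabled build does use it.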

Whole-engine perf sweep. Verified against existing bit-exact GEMM/
attention tests (595 pass; the 2 failing tests — kv_cache dtype sizing
and a Makefile/wasm check — are pre-existing and unrelated).

- flash_attention: dot_product_f32 avx512/avx2 use 4 independent
  accumulators + a 16/8-wide remainder loop, breaking the single-chain
  FMA latency bottleneck on short head_dim (64/96/128) loops; f16 dot
  uses 2 accumulators.
- flash_attention: vectorize f32 KvElem::axpy (decode V-accumulation)
  with AVX-512/AVX2 FMA instead of a scalar loop.
- tensor/kernels: add dot4_f32_avx512 / dot_f32_avx512 (16-wide) and
  dispatch the Q4_K/Q6_K decode-once GEMM and dot_f32_fast to them when
  avx512f+vl are present. Doubles dot lanes on the hottest quantized
  matmul path for AVX-512 hardware (Skylake-SP target).
- tensor/kernels: gemm_f32_cpu inner loop: replace the
  autovectorization-blocking black_box "prefetch" with SIMD
  dot_product_f32 over the contiguous transposed column.
- Cargo.toml: codegen-units = 1 for whole-crate inlining/LTO scope.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
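
The multi-accumulator idea described in the commit message can be sketched roughly as follows. This is an illustrative AVX2/FMA version with a scalar fallback, not the code from this PR: the function names `dot_scalar`, `dot_avx2_4acc`, and `dot` are assumptions, and the real kernels (including the AVX-512 and f16 variants) may be structured differently.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Scalar reference implementation, also used as the portable fallback.
pub fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Hypothetical AVX2+FMA kernel with four independent accumulators.
/// Safety: the caller must have verified that `avx2` and `fma` are available.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn dot_avx2_4acc(a: &[f32], b: &[f32]) -> f32 {
    let n = a.len().min(b.len());
    let (pa, pb) = (a.as_ptr(), b.as_ptr());
    let mut acc0 = _mm256_setzero_ps();
    let mut acc1 = _mm256_setzero_ps();
    let mut acc2 = _mm256_setzero_ps();
    let mut acc3 = _mm256_setzero_ps();
    let mut i = 0;
    // Main loop: 4 x 8 lanes per iteration. Each FMA feeds a different
    // accumulator, so consecutive FMAs do not wait on one another.
    while i + 32 <= n {
        acc0 = _mm256_fmadd_ps(_mm256_loadu_ps(pa.add(i)), _mm256_loadu_ps(pb.add(i)), acc0);
        acc1 = _mm256_fmadd_ps(_mm256_loadu_ps(pa.add(i + 8)), _mm256_loadu_ps(pb.add(i + 8)), acc1);
        acc2 = _mm256_fmadd_ps(_mm256_loadu_ps(pa.add(i + 16)), _mm256_loadu_ps(pb.add(i + 16)), acc2);
        acc3 = _mm256_fmadd_ps(_mm256_loadu_ps(pa.add(i + 24)), _mm256_loadu_ps(pb.add(i + 24)), acc3);
        i += 32;
    }
    // 8-wide remainder loop.
    while i + 8 <= n {
        acc0 = _mm256_fmadd_ps(_mm256_loadu_ps(pa.add(i)), _mm256_loadu_ps(pb.add(i)), acc0);
        i += 8;
    }
    // Combine the four accumulators, then reduce the 8 lanes to one value.
    let sum = _mm256_add_ps(_mm256_add_ps(acc0, acc1), _mm256_add_ps(acc2, acc3));
    let mut lanes = [0.0f32; 8];
    _mm256_storeu_ps(lanes.as_mut_ptr(), sum);
    let mut total: f32 = lanes.iter().sum();
    // Scalar tail for the last (n % 8) elements.
    for k in i..n {
        total += a[k] * b[k];
    }
    total
}

/// Runtime dispatch: use the SIMD kernel only when the CPU supports it.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            // SAFETY: both features required by the kernel were just detected.
            return unsafe { dot_avx2_4acc(a, b) };
        }
    }
    dot_scalar(a, b)
}
```

Because the accumulators are summed in a different order than a plain scalar loop, results can differ in the last bits for general inputs, so comparisons against a scalar reference usually need a tolerance unless the inputs are chosen to be exactly representable.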

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 3 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="oxidize-core/src/compute/tensor/kernels.rs">

<violation number="1" location="oxidize-core/src/compute/tensor/kernels.rs:559">
P1: AVX-512 dispatch check is weaker than callee target-feature requirements. Add `avx2` and `fma` to the runtime gate (or drop them from target_feature) to avoid illegal-instruction UB.</violation>
</file>


debug_assert_eq!(a.len(), b.len());
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
{
if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {


P1: AVX-512 dispatch check is weaker than callee target-feature requirements. Add avx2 and fma to the runtime gate (or drop them from target_feature) to avoid illegal-instruction UB.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At oxidize-core/src/compute/tensor/kernels.rs, line 559:

<comment>AVX-512 dispatch check is weaker than callee target-feature requirements. Add `avx2` and `fma` to the runtime gate (or drop them from target_feature) to avoid illegal-instruction UB.</comment>

<file context>
@@ -478,11 +478,87 @@ unsafe fn dot4_f32_avx2(
     debug_assert_eq!(a.len(), b.len());
     #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
     {
+        if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
+            return unsafe { dot_f32_avx512(a.as_ptr(), b.as_ptr(), a.len()) };
+        }
</file context>
Suggested change
- if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
+ if is_x86_feature_detected!("avx512f")
+     && is_x86_feature_detected!("avx512vl")
+     && is_x86_feature_detected!("avx2")
+     && is_x86_feature_detected!("fma")
+ {
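
The rule behind this suggestion is that the runtime gate should cover every feature named in the callee's `#[target_feature]` attribute. The sketch below models that with a small, hypothetical `CpuFeatures` struct. It assumes, as the review comment implies but the visible diff does not show, that the callee enables `avx512f`, `avx512vl`, `avx2`, and `fma`.

```rust
/// Hypothetical snapshot of the CPU features relevant to the AVX-512 kernel.
#[derive(Clone, Copy, Debug, Default)]
pub struct CpuFeatures {
    pub avx512f: bool,
    pub avx512vl: bool,
    pub avx2: bool,
    pub fma: bool,
}

impl CpuFeatures {
    /// Queries the host CPU; every field stays false on non-x86 targets.
    pub fn detect() -> Self {
        #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
        let features = Self {
            avx512f: is_x86_feature_detected!("avx512f"),
            avx512vl: is_x86_feature_detected!("avx512vl"),
            avx2: is_x86_feature_detected!("avx2"),
            fma: is_x86_feature_detected!("fma"),
        };
        #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
        let features = Self::default();
        features
    }

    /// The gate as written in the diff: checks only the AVX-512 features.
    pub fn weak_gate(&self) -> bool {
        self.avx512f && self.avx512vl
    }

    /// The gate as suggested: checks every feature the callee is assumed
    /// to enable via `#[target_feature]`.
    pub fn strict_gate(&self) -> bool {
        self.avx512f && self.avx512vl && self.avx2 && self.fma
    }
}
```

Shipping CPUs with AVX-512F typically also support AVX2 and FMA, so the two gates are expected to agree on real hardware. The stricter form simply makes the call site's safety argument match the callee's declared requirements.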
