A learning-focused SystemVerilog NPU prototype with Verilator-based regression tests.
Validated by CI/tests in sim/verilator:
- RTL modules for MAC, systolic array, controller/memory/engines scaffolding
- Verilator regression binaries:
  - test_mac_unit
  - test_systolic_array
  - test_npu_smoke
  - test_integration
  - test_gpt2_block
- Deterministic output harness (make benchmark-deterministic)
Not yet validated (planned or in progress):
- Full end-to-end LLM inference fidelity/performance
- Complete microcode/engine feature parity with architecture spec
- FPGA timing/resource closure and hardware bring-up
- Expanded lint/warning cleanup across all RTL modules
See docs/ARCHITECTURE.md and roadmap issues for details.
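The README does not describe what the deterministic output harness compares internally. As a generic illustration of a determinism check, the sketch below runs a command twice and reports whether its standard output is byte-identical. The helper name `is_deterministic` is hypothetical and not part of the repository, and the approach assumes the command prints no timestamps or timing figures.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): run a command twice and
# succeed only if its standard output is byte-identical across both runs.
is_deterministic() {
    first=$(mktemp) || return 2
    second=$(mktemp) || { rm -f "$first"; return 2; }

    # Capture stdout of each run; stderr is discarded for the comparison.
    "$@" > "$first" 2>/dev/null
    "$@" > "$second" 2>/dev/null

    # cmp -s is silent and returns 0 only when the files are identical.
    if cmp -s "$first" "$second"; then
        status=0
    else
        status=1
    fi

    rm -f "$first" "$second"
    return "$status"
}
```

For example, `is_deterministic make benchmark-deterministic && echo "output is stable"` would apply the same check to the harness target, provided its output meets the assumption above.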
# Build simulation
cmake -S sim/verilator -B sim/verilator/build
cmake --build sim/verilator/build -j$(nproc)
# Run all tests
ctest --test-dir sim/verilator/build --output-on-failure
# Deterministic baseline harness
make benchmark-deterministic

GitHub Actions runs three checks:
- stable-regression
- full-ctest
- lint
Branch protection hardening guide: docs/CI_BRANCH_PROTECTION.md
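When iterating on a single module, it can be convenient to run only a few tests rather than the whole suite. The sketch below narrows the ctest invocation with ctest's `-R` regex filter. It assumes that the CTest test names match the regression binary names (for example test_mac_unit), which this README does not confirm, and that the build directory already exists. The helper names `ctest_regex` and `run_tests` are hypothetical.

```shell
#!/bin/sh
# Sketch: run only selected regression tests from an existing build.
# Assumption: CTest test names match the regression binary names.

BUILD_DIR="sim/verilator/build"

# Join test names into an anchored regex for `ctest -R`,
# so that "a b" becomes "^(a|b)$".
ctest_regex() {
    joined=""
    for name in "$@"; do
        if [ -z "$joined" ]; then
            joined="$name"
        else
            joined="$joined|$name"
        fi
    done
    printf '^(%s)$\n' "$joined"
}

# Run the named tests; every argument is treated as a test name.
run_tests() {
    ctest --test-dir "$BUILD_DIR" --output-on-failure -R "$(ctest_regex "$@")"
}
```

A call such as `run_tests test_mac_unit test_npu_smoke` would then run just those two tests. Anchoring the regex with `^` and `$` avoids accidentally matching longer names that share a prefix.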
tiny-npu/
├── rtl/ # SystemVerilog RTL
├── sim/verilator/ # Verilator testbenches + CMake
├── python/ # Model/data helper tooling
├── docs/ # Architecture + process docs
├── benchmarks/ # Deterministic benchmark harness + baseline
└── .github/workflows/ # CI
Read CONTRIBUTING.md before opening a PR.
MIT License - See LICENSE