A specialized CPU that executes Brainfuck natively in hardware. This project implements a custom Instruction Set Architecture (ISA) through a hardware/software co-design approach, featuring a SystemVerilog CPU core and a dedicated C-based assembler.
This project originated as an effort to understand CPU design at a fundamental level. During early research, I observed that Brainfuck's extremely small and well-defined instruction set makes it a natural candidate for a minimal, custom ISA.
Its simplicity exposes control-flow and memory semantics that are often hidden in higher-level architectures, while its large existing ecosystem of programs enables realistic and repeatable testing. These properties made Brainfuck an ideal vehicle for exploring hardware–software co-design.
The processor is a synchronous 8-bit machine utilizing a two-stage pipeline (FETCH, EXECUTE). Unlike software interpreters that rely on expensive runtime pointer-scanning for control flow, this architecture implements Constant-Time Control Flow by pre-calculating jump targets during the assembly phase.
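The pre-calculation step can be sketched with a small stack-based pass. This is a minimal illustration, not the project's actual assembler: the function name `resolve_jumps`, the `MAX_DEPTH` limit, and the choice to index targets by source position (rather than by emitted instruction address) are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPTH 256  /* assumed nesting limit for this sketch */

/* For every '[' or ']' in src, store the index of its matching bracket
 * in targets[]; all other entries are set to 0.
 * Returns 0 on success, -1 on unbalanced brackets or excessive nesting.
 * Indices must fit in 16 bits to match the 16-bit jump target field. */
int resolve_jumps(const char *src, size_t len, uint16_t *targets)
{
    size_t stack[MAX_DEPTH];  /* positions of currently open '[' */
    size_t depth = 0;

    for (size_t i = 0; i < len; i++) {
        targets[i] = 0;
        if (src[i] == '[') {
            if (depth == MAX_DEPTH)
                return -1;            /* nested too deeply */
            stack[depth++] = i;
        } else if (src[i] == ']') {
            if (depth == 0)
                return -1;            /* ']' without a matching '[' */
            size_t open = stack[--depth];
            targets[open] = (uint16_t)i;     /* '[' jumps forward to ']' */
            targets[i]    = (uint16_t)open;  /* ']' jumps back to '['   */
        }
    }
    return depth == 0 ? 0 : -1;       /* leftover '[' means unbalanced */
}
```

The pass visits each character once, so the assembler pays the linear cost a single time, and every jump at run time is a direct load of a stored address.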
For simplicity, this CPU implements a Harvard Architecture, characterized by physically separate storage and signal pathways for instructions and data.
- Hw/Sw Co-Design: Integrated toolchain that maps high-level logic to a bit-packed 20-bit instruction format.
- Hardware Loop Acceleration: The CPU performs conditional jumps in a single cycle by utilizing pre-computed jump addresses, reducing $O(n)$ bracket searches to $O(1)$ execution.
- Cycle-Accurate Simulation: Verified using Verilator to compile SystemVerilog into a high-performance C++ model, bridging the gap between hardware description and software execution.
- Parametric Memory Mapping: Configurable tape size and instruction memory depth via SystemVerilog parameters.
The CPU implements a custom encoding scheme. Each instruction is 20 bits wide: 4 bits for the Opcode and 16 bits for the immediate jump target (used by [ and ]).
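As a rough illustration of the 20-bit format, the helpers below pack and unpack an instruction word. The bit placement is an assumption (opcode in bits 19:16, jump target in bits 15:0); the source only states the field widths, and the helper names `instr_pack`, `instr_opcode`, and `instr_target` are hypothetical.

```c
#include <stdint.h>

/* Assumed layout: opcode in bits [19:16], jump target in bits [15:0]. */
#define OPCODE_SHIFT 16
#define OPCODE_MASK  0xFu
#define TARGET_MASK  0xFFFFu

/* Combine a 4-bit opcode and a 16-bit target into one 20-bit word. */
static uint32_t instr_pack(uint8_t opcode, uint16_t target)
{
    return ((uint32_t)(opcode & OPCODE_MASK) << OPCODE_SHIFT) | target;
}

/* Extract the 4-bit opcode field. */
static uint8_t instr_opcode(uint32_t word)
{
    return (uint8_t)((word >> OPCODE_SHIFT) & OPCODE_MASK);
}

/* Extract the 16-bit immediate jump target. */
static uint16_t instr_target(uint32_t word)
{
    return (uint16_t)(word & TARGET_MASK);
}
```

Under this assumed layout, `instr_pack(0x7, 0x0123)` yields `0x70123`. For instructions other than the two bracket opcodes, the target field is unused and can be left at zero.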
| Mnemonic | Opcode | Description |
|---|---|---|
| `OP_END` | `0x0` | Halts the CPU |
| `OP_INC` | `0x1` | Increment value at current pointer |
| `OP_DEC` | `0x2` | Decrement value at current pointer |
| `OP_NXT` | `0x3` | Move tape pointer forward |
| `OP_PRV` | `0x4` | Move tape pointer backward |
| `OP_OUT` | `0x5` | Synchronous data output (STDOUT) |
| `OP_INP` | `0x6` | Synchronous data input (STDIN) |
| `OP_FOR` | `0x7` | Jump to target if current cell is 0 |
| `OP_BAC` | `0x8` | Jump to target if current cell is non-zero |
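
The opcode semantics above can be mirrored by a small software reference model, which is handy for cross-checking simulation output. This is a sketch under stated assumptions, not the RTL's behavior: `struct instr`, `run`, the 30000-cell tape, pointer wrap-around, and returning 0 on exhausted input are all choices made for the example. Instruction and data storage are kept in separate arrays, in the spirit of the Harvard layout.

```c
#include <stddef.h>
#include <stdint.h>

enum opcode {
    OP_END = 0x0, OP_INC = 0x1, OP_DEC = 0x2, OP_NXT = 0x3, OP_PRV = 0x4,
    OP_OUT = 0x5, OP_INP = 0x6, OP_FOR = 0x7, OP_BAC = 0x8
};

/* Decoded instruction: opcode plus pre-computed jump target. */
struct instr { uint8_t op; uint16_t target; };

#define TAPE_SIZE 30000  /* assumed; the hardware tape size is a parameter */

/* Executes prog until OP_END (or an unknown opcode).
 * Input bytes come from in[], output bytes go to out[] (up to out_cap).
 * Returns the number of bytes written to out[]. */
size_t run(const struct instr *prog, const uint8_t *in, size_t in_len,
           uint8_t *out, size_t out_cap)
{
    uint8_t tape[TAPE_SIZE] = {0};   /* data memory, 8-bit cells */
    size_t ptr = 0, pc = 0, in_pos = 0, out_len = 0;

    for (;;) {
        struct instr i = prog[pc];
        size_t next = pc + 1;        /* default: fall through */

        switch (i.op) {
        case OP_INC: tape[ptr]++; break;                   /* wraps at 255 */
        case OP_DEC: tape[ptr]--; break;                   /* wraps at 0   */
        case OP_NXT: ptr = (ptr + 1) % TAPE_SIZE; break;
        case OP_PRV: ptr = (ptr + TAPE_SIZE - 1) % TAPE_SIZE; break;
        case OP_OUT: if (out_len < out_cap) out[out_len++] = tape[ptr]; break;
        case OP_INP: tape[ptr] = (in_pos < in_len) ? in[in_pos++] : 0; break;
        case OP_FOR: if (tape[ptr] == 0) next = i.target; break;  /* O(1) */
        case OP_BAC: if (tape[ptr] != 0) next = i.target; break;  /* O(1) */
        case OP_END:
        default:     return out_len;
        }
        pc = next;
    }
}
```

The model works whether the assembler stores the address of the matching bracket or the address just past it; in the first case a taken jump simply lands on a bracket whose own condition is false and falls through on the next step.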
During verification, the system was tested with complex computational workloads, including the Mandelbrot set generator.
- Simulation Overhead: While the hardware executes instructions with high efficiency, the Verilator simulation is cycle-accurate. This gives a faithful, cycle-by-cycle representation of hardware behavior at the cost of execution speed compared to abstract software interpreters.
- Resource Utilization: By utilizing a 20-bit instruction word, the design prioritizes execution speed and simplicity of the control unit over instruction density.
The design was verified using a cycle-accurate Verilator simulation harness.
- Cycles Executed: 1,000,000,000 (1 Billion)
- Simulated Frequency: 3.76 MHz
- Wall-Clock Time: 266.21 seconds
While simulation is limited by the host machine's serial modeling of RTL, the design is fully synthesizable. On a standard FPGA (e.g., Xilinx Artix-7), the estimated performance is:
- Target Frequency: 100 MHz
- Execution Time (1B Cycles): 10.0 seconds
- Speedup: ~26.6x faster than simulation (266.21 s / 10.0 s)
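
For reference, the figures above follow from simple ratios of the measured cycle count and wall-clock time; the 100 MHz clock is the stated target, not a measured result:

```math
f_{\text{sim}} = \frac{10^{9}\ \text{cycles}}{266.21\ \text{s}} \approx 3.76\ \text{MHz},
\qquad
t_{\text{FPGA}} = \frac{10^{9}\ \text{cycles}}{100 \times 10^{6}\ \text{cycles/s}} = 10.0\ \text{s},
\qquad
\frac{t_{\text{sim}}}{t_{\text{FPGA}}} = \frac{266.21\ \text{s}}{10.0\ \text{s}} \approx 26.6
```
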
- Verilator (Hardware simulation)
- GCC/G++ (Toolchain and harness)
- Make
- Compile the Toolchain: `make`
- Assemble Source to Machine Code: `./bf_asm path/to/program.bf > program.hex`
- Execute Simulation: `make run`