Vela is a statically typed, compiled programming language targeting a custom 16-bit ARM-like CPU architecture. It is a companion project to the DE1 CPU ISA: Vela compiles .vl source files into .de1 assembly, which the CPU's encoder assembles into machine code for simulation on the Verilog hardware model.
- Quick Start
- Project Structure
- Language Features
- Standard Library
- Compiler Pipeline
- Code Generation
- Optimization Passes
- Examples
- Testing
Requirements: Python 3.12+
# Compile a Vela program
python -m src.main examples/hello.vl -o examples/hello.de1
# Run the generated assembly on the CPU simulator (requires DE1 CPU project)
cd ../CPU
python run.py ../Vela/examples/hello.de1
Output on success: [velac] Compiled examples/hello.vl -> examples/hello.de1
Vela/
├── src/ # Compiler implementation
│ ├── main.py # CLI entry point
│ ├── errors.py # Error types and reporting
│ ├── lexer/ # Tokenization
│ │ ├── lexer.py
│ │ └── tokens.py # Token kinds and Token dataclass
│ ├── parser/ # AST construction
│ │ ├── parser.py # Recursive-descent parser
│ │ └── ast_nodes.py # AST node classes
│ ├── semantic/ # Type checking and symbol resolution
│ │ ├── type_checker.py # Single-pass type checker
│ │ ├── types.py # Type definitions (IntType, PtrType_, ClassType, ...)
│ │ ├── scope.py # Scope stack and symbol table
│ │ ├── tag_processor.py # [[get]]/[[set]] tag expansion
│ │ ├── module_resolver.py # Import resolution
│ │ └── inheritance.py # Inheritance validation and vtable construction
│ ├── ir/ # Intermediate representation
│ │ ├── instructions.py # IR instruction classes and IROp enum
│ │ ├── builder.py # AST -> IR lowering
│ │ └── virtual_register.py # Virtual register allocation
│ ├── optimizer/ # IR and ASM optimization passes
│ │ ├── constant_folder.py
│ │ ├── strength_reduction.py
│ │ ├── dead_code.py
│ │ ├── devirtualizer.py # VCALL -> CALL when type is statically known
│ │ ├── inliner.py # Inline small functions (≤32 instrs)
│ │ ├── escape_analysis.py # SROA: heap allocs -> scalar registers
│ │ └── peephole.py # ASM-level peephole optimizer
│ └── codegen/ # Assembly code emission
│ ├── asm_emitter.py # Main code emitter
│ ├── calling_convention.py # Prologue/epilogue and ABI
│ ├── register_allocator.py # Virtual -> physical register mapping
│ ├── memory_layout.py # Data section layout
│ └── runtime.py # Runtime library (__malloc, __free, __vdispatch, __syscall)
├── stdlib/ # Standard library (Vela source)
│ ├── core/
│ │ └── storeable.vl # Base class for all objects
│ ├── types/
│ │ ├── int.vl # Int wrapper class
│ │ ├── float.vl # Float wrapper class
│ │ ├── bool.vl # Bool wrapper class
│ │ ├── char.vl # Char wrapper class
│ │ ├── string.vl # String class (length-prefixed)
│ │ ├── array.vl # Dynamic array
│ │ ├── matrix.vl # 2D matrix (uses native ISA ops)
│ │ └── null.vl # NULL alias
│ └── math.vl # Math utilities (Abs, Min, Max, Pow, ...)
├── examples/ # Sample programs with compiled .de1 output
│ ├── hello.vl
│ ├── factorial.vl
│ ├── polymorphism.vl
│ ├── linked_list.vl
│ └── boxed_values.vl # Boxed wrapper tests (Int, Bool)
├── tests/ # Pytest test suite
│ ├── test_lexer.py
│ ├── test_parser.py
│ ├── test_semantic.py
│ ├── test_integration.py
│ ├── test_compiler.py
│ └── test_autobox_optimizer.py # Devirtualizer, inliner, escape analysis tests
└── pyproject.toml
| Type | Description | Size |
|---|---|---|
| U0 | Void | 0 bytes |
| U8 | Unsigned 8-bit integer | 1 byte |
| I8 | Signed 8-bit integer | 1 byte |
| U16 | Unsigned 16-bit integer | 2 bytes |
| I16 | Signed 16-bit integer | 2 bytes |
| F16 | IEEE-754 half-precision | 2 bytes |
Ptr<I16> p = &x; // Pointer to I16
Ptr<Circle> obj = null; // Pointer to class instance
if (x > 0) { ... } else { ... } // Condition must be Bool
while (i < 10) { i++; } // Condition must be Bool
for (I16 i = 0; i < n; i++) { ... }
ret value; // Return from function
Vela supports single inheritance, virtual dispatch via vtables, and automatic Storeable base class injection.
class Animal {
I16 legs;
OnAlloc(I16 l) {
legs = l;
}
OnFree() { }
I16 getLegCount() {
ret legs;
}
}
class Dog : Animal {
I16 barkVolume;
OnAlloc(I16 l, I16 vol) {
legs = l;
barkVolume = vol;
}
}
- OnAlloc - Constructor, called on Init<Class>(args...)
- OnFree - Destructor, called on Free(ptr). Falls back to the default Storeable implementation if omitted.
type Drawable {
skeleton U0 draw();
skeleton I16 getArea();
}
Fields can have [[get]], [[set]], or [[get,set]] tags that auto-generate PascalCase accessor methods:
class Point {
[[get,set]] I16 x;
[[get]] I16 y;
}
// Generates: GetX(), SetX(I16), GetY()
The compiler emits an error if you define a method that conflicts with a tag-generated accessor.
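The tag expansion and conflict check described above can be sketched in Python. This is an illustrative model only, not the code in tag_processor.py: the function names `accessor_names` and `find_conflicts` are hypothetical, and PascalCase conversion is assumed to mean capitalizing the first letter of the field name.

```python
def accessor_names(field_name: str, field_type: str, tags: set[str]) -> list[str]:
    """Return the accessor signatures a tagged field would generate.

    Assumption: PascalCase here just capitalizes the first letter.
    """
    pascal = field_name[0].upper() + field_name[1:]
    names = []
    if "get" in tags:
        names.append(f"Get{pascal}()")
    if "set" in tags:
        names.append(f"Set{pascal}({field_type})")
    return names


def find_conflicts(fields, user_methods: set[str]) -> list[str]:
    """List user-defined method names that clash with generated accessors."""
    generated = {
        signature.split("(")[0]
        for name, type_, tags in fields
        for signature in accessor_names(name, type_, tags)
    }
    return sorted(generated & user_methods)


# The Point class from the example: [[get,set]] I16 x; [[get]] I16 y;
point_fields = [("x", "I16", {"get", "set"}), ("y", "I16", {"get"})]
```

With these fields, a user-defined `GetX` method would be reported as a conflict, while an unrelated method such as `area` would not.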
Methods and field access are only available on boxed wrapper types, not on bare primitives. Using a class name as a variable type (e.g. Int x = 42) declares a heap-allocated wrapper that auto-boxes the primitive value:
Int x = 42; // Auto-boxes 42 into a heap-allocated Int wrapper
x.Abs(); // ✅ Works: Int has method Abs()
x.IsPositive(); // ✅ Works: Int has method IsPositive()
Free(x); // Free the wrapper when done
I16 y = 42;
y.Abs(); // ❌ Error: primitive type 'I16' has no methods;
// use the boxed type 'Int' instead
The same applies to all wrapper types:
Bool flag = true; // Auto-boxes into Bool (only true/false or comparisons allowed)
flag.Not(); // ✅ Works
Float f = 3.14; // Auto-boxes into Float
f.Negate(); // ✅ Works
Bool is a class (from stdlib/types/bool.vl), not a primitive alias. It is a distinct type at compile time with the same runtime footprint as U8. It is NOT implicitly convertible to or from integers:
Bool a = true; // ✅ OK: boolean literal
Bool b = (x > 0); // ✅ OK: comparison produces Bool
Bool c = 1; // ❌ Error: cannot initialise Bool from I16
I16 y = true; // ❌ Error: cannot assign Bool to I16
U8 x = 200;
if (x) { ... } // ❌ Error: if condition must be Bool, got U8
if (x != 0) { ... } // ✅ OK: comparison produces Bool
Note: The compiler's optimizer (devirtualizer -> inliner -> escape analysis) can eliminate the boxing overhead entirely, replacing heap allocations with scalar register operations.
Manual allocation with a free-list allocator:
Ptr<Circle> c = Init<Circle>(5); // Allocate + construct
I16 area = c.getArea();
Free(c); // Deallocate + destruct
import stdlib::types::{int}; // Import specific module
import stdlib::types::{int, float, array}; // Import multiple modules
import stdlib::types::{*}; // Wildcard: import all from package
import stdlib::math::{*}; // Math utilities
Resolution: import pkg::sub::{mod} -> <project_root>/pkg/sub/mod.vl
Storeable from stdlib/core/ is auto-imported into every compilation unit.
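As a rough illustration of the resolution rule above, the following Python sketch maps an import statement to candidate file paths. It is not the logic in module_resolver.py: `resolve_import` and the regular expression are hypothetical, and returning a glob pattern for the wildcard form is an assumption about how `{*}` might be handled.

```python
import re
from pathlib import PurePosixPath

# Matches: import pkg::sub::{a, b};  (hypothetical pattern for this sketch)
IMPORT_RE = re.compile(r"import\s+([\w:]+)::\{([^}]*)\}")


def resolve_import(stmt: str, project_root: str) -> list[str]:
    """Map `import pkg::sub::{mod}` to <project_root>/pkg/sub/mod.vl."""
    match = IMPORT_RE.match(stmt.strip())
    if match is None:
        raise ValueError(f"not an import statement: {stmt!r}")
    package_parts = match.group(1).split("::")
    modules = [name.strip() for name in match.group(2).split(",")]
    base = PurePosixPath(project_root, *package_parts)
    if modules == ["*"]:
        # Assumption: a wildcard expands to every .vl file in the package.
        return [str(base / "*.vl")]
    return [str(base / f"{name}.vl") for name in modules]
```

For example, `resolve_import("import stdlib::types::{int, float};", "/proj")` yields `/proj/stdlib/types/int.vl` and `/proj/stdlib/types/float.vl`.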
Ptr<T> // Pointer parameterized by type
Init<ClassName>() // Generic class instantiation
SizeOf(Type) // Compile-time size query
Cast<Type>(expr) // Explicit type cast
| Function | Description |
|---|---|
| Print(value) | Debug print (syscall) |
| Malloc(size) | Allocate size bytes on the heap |
| Free(ptr) | Free heap memory (calls OnFree if any) |
| Init<T>(args...) | Allocate + construct a class instance |
| SizeOf(Type) | Byte size of a type |
| Cast<T>(expr) | Explicit type cast |
I16 result;
ASM {
[[in]] R0 = x;
[[in]] R1 = y;
ADD R0, R0, R1
[[out]] result = R0;
}
| Category | Operators |
|---|---|
| Arithmetic | +, -, *, /, % |
| Comparison | ==, !=, <, >, <=, >= |
| Logical | &&, ||, ! |
| Assignment | =, +=, -=, *=, /= |
| Unary | ++, --, & (address-of) |
| Access | . (member), [] (index) |
Storeable (stdlib/core/storeable.vl) is the implicit base class for all objects:
- I16 GetSize() - object size in bytes
- I16 Pointer() - object address
- I16 Reference() - reference to object
| Class | Wraps | Key Methods |
|---|---|---|
| Int | I16 | Abs(), Negate(), IsPositive(), IsNegative(), IsZero(), Add(), Sub(), Mul(), Equals(), MinWith(), MaxWith(), Clamp() |
| Float | F16 | Abs(), Negate(), IsPositive(), IsNegative(), IsZero(), Add(), Sub(), Mul(), Div(), Equals(), GreaterThan(), LessThan() |
| Bool | BoolType | Not(), And(), Or(), Xor(), ToInt(), Equals() (distinct compile-time type; only accepts true/false/comparisons) |
| Char | U8 | IsAlpha(), IsDigit(), IsUpper(), IsLower(), IsSpace(), ToUpper(), ToLower(), ToInt(), Equals() |
| String | Ptr+len | IsEmpty(), CharAt(), Equals(), Contains(), IndexOf() |
| Array | heap | Get(), Set(), Push(), Pop(), IsEmpty(), First(), Last(), Contains(), IndexOf(), Fill(), Clear(), Sum() |
| Matrix | heap | Get(), Set(), Size(), IsSquare(), Fill(), Sum(), Trace(), MulWith(), AddScalar(), Scale() |
stdlib/math.vl provides module-level functions: Abs(), Min(), Max(), Clamp(), Pow(), Sign()
Source (.vl)
|
+-- Lexer ------------ Tokenization (keywords, operators, literals, types)
|
+-- Parser ----------- Recursive-descent -> AST
|
+-- Semantic Analysis Type checking, symbol resolution, import loading,
| vtable construction, tag expansion,
| boxed-type enforcement, Bool type safety
|
+-- IR Generation ---- AST -> three-address code (85+ IR operation types)
|
+-- Optimizer -------- constant_fold -> strength_reduce -> devirtualize ->
| inline -> constant_fold -> escape_analyze -> DCE
|
+-- Code Generation -- IR -> assembly with register allocation,
| calling convention, memory layout
|
+-- Peephole --------- ASM-level micro-optimizations
|
+-- Output (.de1)
The compiler targets the custom 16-bit CPU defined in DE1:
- 16-bit registers R0-R14 (R13 = SP, R14 = LR), separate PC
- 32-bit fixed-length instructions
- 64 KB byte-addressable RAM (little-endian for 16-bit stores)
- 12-bit immediate max (values > 4095 use a constant pool)
- No indexed addressing - computed via ADD + MOVM/SAVEM
- Flags set only by CMP, CMN, TST, TEQ, FCMP (no S suffix on ALU ops)
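The 12-bit immediate limit means the code generator has to decide, for each constant, whether it fits in an instruction or needs a constant-pool slot. A minimal Python sketch of that decision follows. The function `place_constant`, the pool representation, and the masking of values to 16 bits are assumptions made for illustration. The actual emitter may differ.

```python
MAX_IMMEDIATE = 0xFFF  # 12-bit immediate: 0..4095


def place_constant(value: int, pool: list[int]) -> tuple[str, int]:
    """Decide how a constant is materialized.

    Returns ("imm", value) when it fits in 12 bits, otherwise
    ("pool", index) where index is its slot in the constant pool.
    Assumption: values are masked to the 16-bit register width first.
    """
    value &= 0xFFFF
    if value <= MAX_IMMEDIATE:
        return ("imm", value)
    if value not in pool:
        pool.append(value)  # reuse an existing slot for repeated constants
    return ("pool", pool.index(value))
```

Reusing pool slots for repeated constants is a design choice of this sketch. It keeps the data section small at the cost of a linear search.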
| Register | Role |
|---|---|
| R0-R3 | Arguments and return value (caller-saved) |
| R4-R10 | Callee-saved |
| R11 | Frame pointer |
| R12 | Scratch / vtable dispatch |
| R13 | Stack pointer (starts at 0xFFFF, grows down) |
| R14 | Link register |
Stack frame layout (high to low address):
[R14 saved] [R11/FP saved] [callee-saved R4-R10] [locals] <- SP
The compiler emits four runtime functions appended after user code:
| Function | Purpose |
|---|---|
| __syscall | System call interface (R0 = param, R1 = syscall ID) |
| __vdispatch | Virtual method dispatch (MOV PC, R12) |
| __malloc | Free-list heap allocator with stack-heap collision detection |
| __free | Heap deallocation, coalesces adjacent free blocks |
Address 0x0000: Data section (heap_start, free_list_head, syscall vars, vtables, strings, constants)
Address N: main: (trampoline: init heap -> init vtables -> BL __entry_main -> B __program_end)
Address M: User functions
Address P: Runtime functions (__syscall, __vdispatch, __malloc, __free)
Address Q: __program_end: (null word -> CPU halts)
Constant folding evaluates constant expressions at compile time, e.g. 2 + 3 -> 5. The pass is control-flow aware: it clears known constants at label join points to avoid incorrect folding across branches.
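To make the label-reset rule concrete, here is a small Python sketch of a control-flow-aware folder over a toy IR. The tuple-based instruction format, the op names, and the 16-bit wraparound are assumptions for illustration and do not mirror the IR classes in src/ir/instructions.py.

```python
import operator

# Toy IR: ("const", dst, value), ("add"|"sub"|"mul", dst, a, b), ("label", name)
FOLDABLE = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def fold_constants(instrs):
    known = {}  # virtual register -> known constant value
    out = []
    for ins in instrs:
        op = ins[0]
        if op == "label":
            # Join point: another path may reach here with different
            # values, so forget everything known so far.
            known.clear()
            out.append(ins)
        elif op == "const":
            _, dst, value = ins
            known[dst] = value
            out.append(ins)
        elif op in FOLDABLE and ins[2] in known and ins[3] in known:
            _, dst, a, b = ins
            value = FOLDABLE[op](known[a], known[b]) & 0xFFFF  # assumed 16-bit wrap
            known[dst] = value
            out.append(("const", dst, value))
        else:
            if len(ins) > 1:
                known.pop(ins[1], None)  # destination is no longer a known constant
            out.append(ins)
    return out
```

In this sketch, an `add` of two known constants before a label becomes a `const`, while the same `add` after the label is left alone because the constant map was cleared.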
Strength reduction replaces expensive operations with cheaper equivalents:
- x * 4 -> x << 2
- x / 8 -> x >> 3
- x * 0 -> 0, x + 0 -> x, x * 1 -> x
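The rewrites listed above can be modeled in a few lines of Python. This is a sketch over a hypothetical tuple IR `(op, dst, src, k)` where `k` is an integer constant operand. It is not the code in strength_reduction.py. Note that replacing a division with a right shift only matches truncating division for non-negative operands, and the source does not say how the pass treats signed values, so the sketch assumes non-negative ones.

```python
def is_power_of_two(k: int) -> bool:
    return k > 1 and (k & (k - 1)) == 0


def reduce_strength(ins):
    """Rewrite one (op, dst, src, k) instruction into a cheaper form."""
    op, dst, src, k = ins
    if op == "mul":
        if k == 0:
            return ("const", dst, 0)  # x * 0 -> 0
        if k == 1:
            return ("copy", dst, src)  # x * 1 -> x
        if is_power_of_two(k):
            return ("shl", dst, src, k.bit_length() - 1)  # x * 4 -> x << 2
    elif op == "div" and is_power_of_two(k):
        # Assumes a non-negative dividend (see lead-in).
        return ("shr", dst, src, k.bit_length() - 1)  # x / 8 -> x >> 3
    elif op == "add" and k == 0:
        return ("copy", dst, src)  # x + 0 -> x
    return ins  # nothing cheaper known; leave unchanged
```

Multipliers that are not powers of two, such as `x * 3`, fall through unchanged in this sketch.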
The devirtualizer rewrites virtual calls (VCALL) to direct calls (CALL) when the receiver's concrete type can be statically inferred from the __malloc + vtable-store pattern.
The inliner inlines non-recursive functions with ≤32 instructions at their call sites, renaming registers and labels to avoid conflicts.
Escape analysis identifies heap-allocated objects whose pointers do not escape the current function. For non-escaping objects, it replaces field loads/stores with scalar register operations and eliminates the associated __malloc/__free calls. This is the key pass that eliminates auto-boxing overhead.
Dead code elimination (DCE) removes assignments whose results are never read.
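Dead code elimination is the simplest of these passes to illustrate. The Python sketch below handles only straight-line code with a single backward liveness scan. The tuple IR, the `SIDE_EFFECTS` set, and the `live_out` parameter are assumptions for illustration. A real pass such as dead_code.py would also need to account for branches.

```python
# Ops that must be kept even if their result is unused (assumed set).
SIDE_EFFECTS = {"call", "store"}


def eliminate_dead(instrs, live_out):
    """Drop assignments whose destination is never read afterwards.

    instrs: list of (op, dst, *srcs) tuples; integer srcs are literals.
    live_out: registers still needed after the last instruction.
    """
    live = set(live_out)
    kept = []
    for ins in reversed(instrs):
        op, dst, *srcs = ins
        if dst in live or op in SIDE_EFFECTS:
            kept.append(ins)
            live.discard(dst)  # this definition satisfies later reads
            live.update(s for s in srcs if isinstance(s, str))
        # otherwise: the result is never read, so the instruction is dropped
    kept.reverse()
    return kept
```

Scanning backwards means that removing one dead instruction can expose further dead instructions that only fed it, all in the same pass.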
- Self-move elimination: MOV Rx, Rx -> removed
- Immediate folding: MOV Rx, Vimm followed by use -> fold immediate
- MOV chain collapse: MOV Rx, src + MOV Ry, Rx -> MOV Ry, src
- Conditional execution (ARM-style predication): replaces branch-over patterns with predicated instructions, eliminating branches and pipeline flushes:
  - If-else diamond: Bcond; body1; B skip; target:; body2; skip: -> body1{inv_cond}; body2{cond} (zero branches)
  - If-only: Bcond target; body; target: -> body{inv_cond} (zero branches)
- MAX/MIN collapse: CMP Ra, Rb; MOVGT Rd, Ra; MOVLE Rd, Rb -> MAX Rd, Ra, Rb (1 instruction)
- Orphan label cleanup: removes unreferenced compiler-generated labels
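Two of the peephole rules above, self-move elimination and MOV chain collapse, can be sketched in Python as a single pass over a list of instruction tuples. The tuple encoding is hypothetical, and the chain collapse assumes the intermediate register Rx is a dead temporary after the second MOV. The real peephole.py would have to verify that before dropping the first instruction.

```python
def peephole(asm):
    """Apply self-move elimination and MOV chain collapse.

    asm: list of tuples such as ("MOV", dst, src) or ("ADD", dst, a, b).
    """
    out = []
    for ins in asm:
        if ins[0] == "MOV" and ins[1] == ins[2]:
            continue  # MOV Rx, Rx -> removed
        if (
            out
            and ins[0] == "MOV"
            and out[-1][0] == "MOV"
            and ins[2] == out[-1][1]
        ):
            # MOV Rx, src ; MOV Ry, Rx -> MOV Ry, src
            # (assumes Rx is not read again later)
            prev = out.pop()
            out.append(("MOV", ins[1], prev[2]))
            continue
        out.append(ins)
    return out
```

Because the collapsed MOV is pushed back onto the output, a longer chain of register-to-register moves is folded step by step within the same pass.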
module hello {
U0 main() {
I16 x = 42;
Print(x);
ret;
}
}
Simulator result: R0 = 42 ✓
module factorial {
I16 factorial(I16 n) {
if (n <= 1) { ret 1; }
ret n * factorial(n - 1);
}
I16 main() {
I16 result = factorial(5);
Print(result);
ret result;
}
}
Simulator result: R0 = 120 ✓
Demonstrates class inheritance, virtual dispatch via vtables, and the type system with skeleton methods. Two classes (Circle and Square) implement a Shape interface.
Simulator result: R0 = 91 ✓
Manual memory management with Init<> / Free(), pointer-based linked list traversal, and heap allocation.
Simulator result: R0 = 30 ✓
Tests auto-boxing with boxed wrapper types across Int (Abs, Add, IsZero, MaxWith, Clamp) and Bool (Not, using true/false literals). Demonstrates that the optimizer eliminates all heap allocations: the generated .de1 contains zero __malloc/__free calls in the main function body.
# Run all tests (285 tests)
pytest tests/
# Run specific test module
pytest tests/test_lexer.py -v
pytest tests/test_parser.py -v
pytest tests/test_semantic.py -v
pytest tests/test_integration.py -v
pytest tests/test_autobox_optimizer.py -v
Test modules cover: tokenization, AST construction, type checking, inheritance validation, import resolution, end-to-end compilation, and auto-boxing optimizer passes (devirtualization, inlining, escape analysis).