This module provides architecture-independent assembly support for HolyCross, following a clean interface pattern that allows easy addition of new target architectures.
┌─────────────────────────────────────┐
│         Assembler Interface         │
│   (Architecture-independent API)    │
├─────────────────────────────────────┤
│  - parse(source) -> Instructions    │
│  - encode(insts) -> MachineCode     │
│  - getRegisterName(id) -> String    │
│  - getRegisterId(name) -> ID        │
└─────────────────────────────────────┘
        ▲           ▲           ▲
        │           │           │
┌───────┴───┐   ┌───┴────┐   ┌──┴─────┐
│    x64    │   │ ARM64  │   │ RISC-V │
│ (current) │   │ (todo) │   │ (todo) │
└───────────┘   └────────┘   └────────┘
src/assembler/
├── assembler.zig # Interface definition
├── x64.zig # x64/AMD64 implementation
├── tests/
│ └── assembler_tests.zig # Test suite
└── README.md # This file
- Interface Layer: Complete abstraction for architecture-independent code
- x64 Skeleton: Register definitions, name/ID lookup
- Type System: Operand types (register, immediate, memory, label)
- Instruction Structure: Generic instruction representation
- Test Suite: Comprehensive tests for the architecture
- x64 Parser: Parse assembly text into Instructions
- x64 Encoder: Generate x64 machine code from Instructions
- Full x64 Support: All common instructions (MOV, ADD, SUB, etc.)
- ARM64: Add ARM64 architecture support
- RISC-V: Add RISC-V architecture support
- Optimization: Instruction selection and optimization passes
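The operand types and the generic instruction structure listed above could be modelled roughly as follows. This is a minimal sketch, not the definitions in assembler.zig: the names Operand, Instruction, and OperandSize, and every field name, are assumptions made for illustration.

```zig
const std = @import("std");

// Hypothetical types; the real definitions live in assembler.zig and may differ.
pub const OperandSize = enum { byte, word, dword, qword };

// One variant per operand kind named in the list: register, immediate, memory, label.
pub const Operand = union(enum) {
    register: struct { id: u8, size: OperandSize },
    immediate: i64,
    memory: struct { base: u8, displacement: i32 },
    label: []const u8,
};

// Architecture-neutral instruction: a mnemonic plus its operands.
pub const Instruction = struct {
    mnemonic: []const u8,
    operands: []const Operand,
};

test "MOV RAX, 42 as a generic instruction" {
    const ops = [_]Operand{
        .{ .register = .{ .id = 0, .size = .qword } },
        .{ .immediate = 42 },
    };
    const inst = Instruction{ .mnemonic = "MOV", .operands = &ops };
    try std.testing.expectEqual(@as(usize, 2), inst.operands.len);
    try std.testing.expect(inst.operands[0] == .register);
}
```

A tagged union lets an encoder switch over the operand kind exhaustively, so adding a new kind produces a compile error wherever it is not yet handled.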
const assembler = @import("assembler.zig");
// Create an x64 assembler
var x64 = assembler.X64Assembler.init(allocator);
defer x64.deinit();
// Get the architecture-independent interface
const asm_interface = x64.asAssembler();
// Use the interface (works for any architecture)
const reg_name = asm_interface.getRegisterName(0, .qword); // "RAX"
const reg_id = try asm_interface.getRegisterId("RBX"); // 3
// Parse assembly (future)
const instructions = try asm_interface.parse(
    "MOV RAX, RBX\nPUSH RCX\n",
    allocator,
);
// Encode to machine code (future)
const machine_code = try asm_interface.encode(instructions, allocator);

To add support for a new architecture (e.g., ARM64):
- Create the implementation file (arm64.zig)
- Define the register enum for your architecture
- Implement the required methods:
  - parse(source) - Parse assembly text
  - encode(instructions) - Generate machine code
  - getRegisterName(id, size) - Register ID to name
  - getRegisterId(name) - Register name to ID
  - deinit() - Clean up resources
- Add tests in tests/
- Export in ../assembler.zig
pub const ARM64Assembler = struct {
    allocator: std.mem.Allocator,

    pub fn init(allocator: std.mem.Allocator) ARM64Assembler {
        return .{ .allocator = allocator };
    }

    pub fn deinit(self: *ARM64Assembler) void {
        _ = self;
    }

    pub fn parse(self: *ARM64Assembler, source: []const u8, allocator: std.mem.Allocator) ![]Instruction {
        // Implement ARM64 assembly parsing
    }

    pub fn encode(self: *ARM64Assembler, instructions: []Instruction, allocator: std.mem.Allocator) ![]u8 {
        // Implement ARM64 machine code generation
    }

    pub fn getRegisterName(self: *ARM64Assembler, reg_id: u8, size: OperandSize) []const u8 {
        // Return ARM64 register names (X0-X30, W0-W30, etc.)
    }

    pub fn getRegisterId(self: *ARM64Assembler, name: []const u8) !u8 {
        // Parse ARM64 register names
    }

    pub fn asAssembler(self: *ARM64Assembler) Assembler {
        return assembler.assembler(self);
    }
};

- Architecture Independence: Core compiler code should work with any target architecture
- Clean Interfaces: VTable-based polymorphism for runtime architecture selection
- Zero-Cost Abstraction: Interface overhead only at parse/encode boundaries
- Extensibility: Easy to add new architectures without modifying existing code
- Type Safety: Strong typing for operands, registers, and sizes
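The VTable-based polymorphism mentioned above can be illustrated with a reduced interface. This is a sketch under assumptions, not the code in assembler.zig: the Assembler layout, the VTable struct, and the ToyX64 type are hypothetical, only register-name lookup is shown, and the cast syntax assumes Zig 0.11 or later.

```zig
const std = @import("std");

pub const OperandSize = enum { byte, word, dword, qword };

// Hypothetical interface: a type-erased pointer plus a table of function pointers.
pub const Assembler = struct {
    ptr: *anyopaque,
    vtable: *const VTable,

    pub const VTable = struct {
        getRegisterName: *const fn (ptr: *anyopaque, id: u8, size: OperandSize) []const u8,
    };

    // Callers use this wrapper and never see the concrete architecture type.
    pub fn getRegisterName(self: Assembler, id: u8, size: OperandSize) []const u8 {
        return self.vtable.getRegisterName(self.ptr, id, size);
    }
};

// Hypothetical stand-in for an architecture backend; knows only two registers.
pub const ToyX64 = struct {
    call_count: usize = 0,

    fn getRegisterName(ptr: *anyopaque, id: u8, size: OperandSize) []const u8 {
        // Recover the concrete type from the type-erased pointer.
        const self: *ToyX64 = @ptrCast(@alignCast(ptr));
        self.call_count += 1;
        if (size != .qword) return "?";
        return switch (id) {
            0 => "RAX",
            3 => "RBX",
            else => "?",
        };
    }

    const vtable = Assembler.VTable{ .getRegisterName = getRegisterName };

    pub fn asAssembler(self: *ToyX64) Assembler {
        return .{ .ptr = self, .vtable = &vtable };
    }
};

test "dispatch through the interface" {
    var x64 = ToyX64{};
    const iface = x64.asAssembler();
    try std.testing.expectEqualStrings("RAX", iface.getRegisterName(0, .qword));
    try std.testing.expectEqual(@as(usize, 1), x64.call_count);
}
```

Each call through the interface costs one indirect function call, which is why keeping dispatch at the parse/encode boundary, rather than per operand, keeps the overhead small.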
All x64 registers use a unified encoding scheme:
- ID 0-15: General-purpose registers (RAX-R15)
- Size determined by operand size:
  - .byte = 8-bit (AL, BL, etc.)
  - .word = 16-bit (AX, BX, etc.)
  - .dword = 32-bit (EAX, EBX, etc.)
  - .qword = 64-bit (RAX, RBX, etc.)
Example:
ID 0 + .qword = RAX
ID 0 + .dword = EAX
ID 0 + .word = AX
ID 0 + .byte = AL
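One way to realise the ID-plus-size scheme above is a 16 x 4 name table indexed by register ID and operand size. The sketch below is illustrative only: registerName and registerId are hypothetical free functions (the real methods live on the assembler types), the IDs follow the hardware register numbering, and the 8-bit names for IDs 4-7 are the REX forms (SPL, BPL, SIL, DIL). It assumes Zig 0.11 or later.

```zig
const std = @import("std");

pub const OperandSize = enum { byte, word, dword, qword };

// Rows are register IDs 0-15; columns follow the OperandSize declaration order.
const names = [16][4][]const u8{
    .{ "AL", "AX", "EAX", "RAX" },
    .{ "CL", "CX", "ECX", "RCX" },
    .{ "DL", "DX", "EDX", "RDX" },
    .{ "BL", "BX", "EBX", "RBX" },
    .{ "SPL", "SP", "ESP", "RSP" },
    .{ "BPL", "BP", "EBP", "RBP" },
    .{ "SIL", "SI", "ESI", "RSI" },
    .{ "DIL", "DI", "EDI", "RDI" },
    .{ "R8B", "R8W", "R8D", "R8" },
    .{ "R9B", "R9W", "R9D", "R9" },
    .{ "R10B", "R10W", "R10D", "R10" },
    .{ "R11B", "R11W", "R11D", "R11" },
    .{ "R12B", "R12W", "R12D", "R12" },
    .{ "R13B", "R13W", "R13D", "R13" },
    .{ "R14B", "R14W", "R14D", "R14" },
    .{ "R15B", "R15W", "R15D", "R15" },
};

// Hypothetical helper: returns null for IDs outside 0-15.
pub fn registerName(id: u8, size: OperandSize) ?[]const u8 {
    if (id >= names.len) return null;
    return names[id][@intFromEnum(size)];
}

// Hypothetical helper: case-insensitive reverse lookup across all four sizes.
pub fn registerId(name: []const u8) !u8 {
    var id: u8 = 0;
    while (id < names.len) : (id += 1) {
        for (names[id]) |candidate| {
            if (std.ascii.eqlIgnoreCase(candidate, name)) return id;
        }
    }
    return error.UnknownRegister;
}

test "ID and size map to a name and back" {
    try std.testing.expectEqualStrings("EAX", registerName(0, .dword).?);
    try std.testing.expectEqual(@as(u8, 3), try registerId("RBX"));
}
```

Note that the reverse lookup discards the size: "EBX" and "RBX" both map to ID 3, so a parser would need to record the operand size separately.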
Run the test suite:
zig build test

Tests cover:
- Interface abstraction
- Register name/ID lookups
- Architecture switching
- Parser and encoder functionality
The HolyCross compiler supports inline assembly using asm { ... } blocks, following TempleOS HolyC conventions.
I64 GetFortyTwo()
{
    asm {
        MOV RAX, 42
    }
    return 0; // Compiler-generated return (not using inline asm result)
}

Inline assembly is "raw" - it emits machine code bytes directly without integration with the compiler's register allocator or calling conventions. This matches TempleOS's inline assembly model.
What works:
- ✅ Parse assembly instructions (MOV, PUSH, POP, NOP, etc.)
- ✅ Encode to x64 machine code bytes
- ✅ Emit as .byte directives in generated assembly
- ✅ Execute inline assembly within functions
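To make the encode-then-emit path concrete, the sketch below encodes MOV RAX, 42 and renders the bytes as a .byte directive. It is an illustration, not HolyCross's encoder: encodeMovImm32 and formatByteDirective are hypothetical helpers, only register IDs 0-7 are handled, and the form chosen (REX.W + C7 /0 with a sign-extended 32-bit immediate) is one valid encoding among several. It assumes Zig 0.11 or later.

```zig
const std = @import("std");

/// Hypothetical helper: encodes `MOV r64, imm32` as REX.W + C7 /0 id.
/// Register IDs 8-15 would additionally need the REX.B bit and are not handled.
pub fn encodeMovImm32(reg_id: u3, imm: i32) [7]u8 {
    // x64 immediates are little-endian regardless of the host byte order.
    const imm_bytes = std.mem.toBytes(std.mem.nativeToLittle(i32, imm));
    return [7]u8{
        0x48, // REX prefix with W=1 (64-bit operand size)
        0xC7, // opcode: MOV r/m64, imm32
        0xC0 | @as(u8, reg_id), // ModRM: mod=11 (register), reg=0, rm=reg_id
        imm_bytes[0],
        imm_bytes[1],
        imm_bytes[2],
        imm_bytes[3],
    };
}

/// Hypothetical helper: writes ".byte 0x.., 0x.." into buf and returns the used part.
pub fn formatByteDirective(buf: []u8, bytes: []const u8) ![]const u8 {
    const head = try std.fmt.bufPrint(buf, ".byte ", .{});
    var len: usize = head.len;
    for (bytes, 0..) |b, i| {
        const sep: []const u8 = if (i == 0) "" else ", ";
        const part = try std.fmt.bufPrint(buf[len..], "{s}0x{X:0>2}", .{ sep, b });
        len += part.len;
    }
    return buf[0..len];
}

test "MOV RAX, 42 becomes a .byte directive" {
    var buf: [64]u8 = undefined;
    const code = encodeMovImm32(0, 42);
    const line = try formatByteDirective(&buf, &code);
    try std.testing.expectEqualStrings(".byte 0x48, 0xC7, 0xC0, 0x2A, 0x00, 0x00, 0x00", line);
}
```

Emitting raw bytes this way means the downstream assembler never sees the mnemonic, which is consistent with the "raw" model described here.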
Limitations:
- ⚠️ No register allocation coordination - The compiler doesn't know which registers you modify
- ⚠️ No automatic save/restore - You must preserve registers according to calling conventions
- ⚠️ No constraint syntax - Cannot specify inputs/outputs like GCC's extended asm
- ⚠️ RAX is return value - A value left in RAX is returned unless the compiler overwrites it afterwards (for example, with an explicit return statement)
- ⚠️ No compiler integration - Assembly is "opaque" to optimization and analysis
When writing inline assembly:
- Preserve callee-saved registers: RBX, R12-R15, RBP
- RAX is the return value - Last value in RAX will be the function's return
- Caller-saved registers can be freely modified: RAX, RCX, RDX, RSI, RDI, R8-R11
- Stack alignment: Maintain 16-byte alignment if calling functions
- No automatic spilling: The compiler won't save your values around the asm block
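Because the compiler does not track clobbers, a small helper can flag which registers need a manual PUSH/POP. The sketch below encodes the callee-saved list above as a predicate over register IDs; isCalleeSaved is a hypothetical name, and the IDs assume the hardware numbering (RBX = 3, RBP = 5, R12-R15 = 12-15).

```zig
const std = @import("std");

/// Hypothetical helper: true for the callee-saved registers listed above
/// (RBX, RBP, R12-R15), identified by hardware register ID.
pub fn isCalleeSaved(reg_id: u8) bool {
    return switch (reg_id) {
        3, 5, 12...15 => true,
        else => false,
    };
}

test "RBX must be preserved, RCX may be clobbered" {
    try std.testing.expect(isCalleeSaved(3)); // RBX
    try std.testing.expect(!isCalleeSaved(1)); // RCX
}
```

A check like this could run over the destination operands of a parsed asm block to warn about a missing save/restore; nothing in the current design does so automatically.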
✅ Good: Using scratch registers
I64 AddFortyTwo(I64 x)
{
    // x is passed in via calling convention (not visible in asm block)
    asm {
        MOV RCX, 42   // RCX is caller-saved, ok to modify
        ADD RAX, RCX  // RAX is caller-saved, ok to modify
    }
    return x + 42; // Compiler handles actual addition
}

❌ Bad: Clobbering without restore
I64 BadExample()
{
    asm {
        MOV RBX, 100 // Clobbers callee-saved RBX without restoring!
    }
    return 0;
}

✅ Good: Proper save/restore
I64 GoodExample()
{
    asm {
        PUSH RBX     // Save callee-saved register
        MOV RBX, 100 // Use it
        POP RBX      // Restore before return
    }
    return 0;
}

For full compiler integration (like GCC's extended inline assembly), we would need:
- Input/output constraints: asm("mov %0, %1" : "=r"(output) : "r"(input))
- Clobber lists: Tell compiler which registers are modified
- Register allocation: Compiler assigns registers automatically
- Optimization awareness: Compiler can reason about asm blocks
These features are not currently planned, as the raw assembly model matches TempleOS conventions and is simpler to implement and understand.