Strings are stored in a centralized pool to eliminate redundancy and alignment padding overhead. They are referenced by StringId throughout the file.
StringId (u16): Zero-based index into the String Table.
StringId(0) is reserved and contains an easter egg: "Beauty will save the world" (Dostoevsky, The Idiot).
This reservation has a practical purpose: since Match instructions use 0 to indicate "no constraint" (wildcard), StringId(0) can never appear in unlinked bytecode instructions. User strings start at index 1.
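A minimal sketch of how a reader might interpret that convention. The `StringId` newtype and the `decode_match_operand` helper are hypothetical names used only for illustration, and the sketch assumes the operand is a raw u16 in which 0 means wildcard:

```rust
/// Zero-based index into the String Table (a u16, per the definition above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StringId(pub u16);

/// Hypothetical helper: interpret the raw u16 operand of a Match instruction.
/// A value of 0 means "no constraint" (wildcard); any other value refers to
/// a user string, since user strings start at index 1.
pub fn decode_match_operand(raw: u16) -> Option<StringId> {
    if raw == 0 {
        None
    } else {
        Some(StringId(raw))
    }
}
```

Mapping 0 to `None` keeps the wildcard case out of the string lookup path, so the reserved entry is never fetched by accident.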
The String Blob contains the raw UTF-8 bytes of all strings, concatenated together.
- Section Offset: Computed (first section after header, at offset 64)
- Size: header.str_blob_size
- Content: Raw bytes. Strings are not null-terminated.
- Padding: The section is padded to a 64-byte boundary at the end.
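The padding rule can be sketched as a round-up to the next multiple of 64. The constant and function names here are assumptions for illustration. Only the 64-byte alignment and the blob's starting offset of 64 come from the text above:

```rust
/// Section alignment in bytes, as stated above.
const SECTION_ALIGN: usize = 64;

/// Offset of the String Blob: the first section after the header.
const STRING_BLOB_OFFSET: usize = 64;

/// Round `offset` up to the next multiple of SECTION_ALIGN.
/// Works because SECTION_ALIGN is a power of two.
pub fn align_up(offset: usize) -> usize {
    (offset + SECTION_ALIGN - 1) & !(SECTION_ALIGN - 1)
}

/// Hypothetical helper: the first aligned offset after the padded String Blob,
/// given the unpadded size from header.str_blob_size.
pub fn offset_after_string_blob(str_blob_size: usize) -> usize {
    align_up(STRING_BLOB_OFFSET + str_blob_size)
}
```

For a 5-byte blob this yields offset 128: the blob occupies bytes 64..69 and the remaining 59 bytes are padding.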
The String Table is a lookup table mapping each StringId to a byte offset within the String Blob.
- Section Offset: Computed (follows RegexBlob, 64-byte aligned)
- Record Size: 4 bytes (u32).
- Capacity: header.str_table_count + 1 entries.
  - The table contains one extra entry at the end representing the total size of the unpadded blob.
To retrieve string i (where 0 <= i < header.str_table_count):
- Read start = table[i]
- Read end = table[i+1]
- Length = end - start
- Data = blob[start..end]
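The retrieval steps could be sketched as follows. The `get_string` name and its signature are assumptions, and the sketch takes the table as an already-decoded slice of u32 offsets. It returns `None` rather than panicking when the index is out of range, the offsets are inconsistent, or the bytes are not valid UTF-8:

```rust
/// Look up string `i` using the offset table and the unpadded blob.
/// `table` must hold str_table_count + 1 entries, as described above.
pub fn get_string<'a>(blob: &'a [u8], table: &[u32], i: usize) -> Option<&'a str> {
    // start = table[i], end = table[i + 1]
    let start = *table.get(i)? as usize;
    let end = *table.get(i + 1)? as usize;
    // blob[start..end]; `get` rejects start > end and out-of-bounds ranges.
    let bytes = blob.get(start..end)?;
    // Strings are raw UTF-8 with no null terminator.
    std::str::from_utf8(bytes).ok()
}
```

Because the length comes from the difference between adjacent offsets, no per-string length field or terminator is needed.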
// Logical layout (not a single struct)
struct StringTable {
    offsets: [u32; header.str_table_count + 1],
}

Limit: Maximum str_table_count is 65,534 (0xFFFE). The table requires count + 1 entries for length calculation, and the extra entry must fit in addressable space.
Stored strings: "id", "foo"
String Blob:
0x00: 'i', 'd', 'f', 'o', 'o'
... padding to 64 bytes ...
String Table (str_table_count = 2):
0x00: 0 (Offset of "id")
0x04: 2 (Offset of "foo")
0x08: 5 (End of blob, used to calculate length of "foo")
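The example above can be reproduced with a small writer-side sketch. The `build_string_section` function and the `MAX_STR_TABLE_COUNT` constant are hypothetical names. The sketch produces the unpadded blob and the offset table only, and leaves the 64-byte padding and any reserved entries to the caller:

```rust
/// Maximum str_table_count, per the limit stated above.
const MAX_STR_TABLE_COUNT: usize = 0xFFFE;

/// Concatenate `strings` into a blob and record each start offset,
/// plus one trailing entry holding the total unpadded blob size.
/// Returns None if the count or the blob size exceeds the format's limits.
pub fn build_string_section(strings: &[&str]) -> Option<(Vec<u8>, Vec<u32>)> {
    if strings.len() > MAX_STR_TABLE_COUNT {
        return None;
    }
    let mut blob: Vec<u8> = Vec::new();
    let mut table: Vec<u32> = Vec::with_capacity(strings.len() + 1);
    for s in strings {
        // The offset of this string is the current end of the blob.
        table.push(u32::try_from(blob.len()).ok()?);
        blob.extend_from_slice(s.as_bytes());
    }
    // Extra entry: end of blob, used to compute the last string's length.
    table.push(u32::try_from(blob.len()).ok()?);
    Some((blob, table))
}
```

Calling it with "id" and "foo" gives the 5-byte blob and the offsets 0, 2, 5 shown in the example.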