Skip to content

SteganoSage/PersistX

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

13 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

PersistX

A production-grade educational storage engine built in modern C++17

C++17 CMake ACID ARIES B+ Tree License

PersistX demonstrates how real-world databases (PostgreSQL, MySQL/InnoDB, SQLite) manage persistent storage β€” from raw disk I/O to crash-safe ACID transactions β€” all in ~12,000 lines of hand-written C++.


✨ Highlights

  • Full ACID Transactions β€” begin, commit, and abort with strict atomicity and durability guarantees
  • ARIES Crash Recovery β€” three-phase protocol (Analysis β†’ Redo β†’ Undo) with CLR support for crash-safe aborts
  • B+ Tree Index β€” O(log N) point lookups, insertions, deletions, and range scans with leaf-linked-list traversal
  • Write-Ahead Logging β€” all modifications are logged before data pages are flushed; log is force-flushed on commit
  • Buffer Pool Manager β€” fixed-size in-memory page cache with LRU eviction and WAL-enforced dirty page writeback
  • Slotted Page Layout β€” variable-length records with stable RID = (page_id, slot_id) identifiers that survive compaction
  • Fuzzy Checkpointing β€” non-blocking checkpoint protocol that snapshots the Active Transaction Table (ATT) and Dirty Page Table (DPT)
  • Interactive Shell β€” feature-rich REPL with ANSI-colored output, box-drawn tables, B+ tree visualization, and benchmarking tools

πŸ—οΈ Architecture

PersistX follows a strict layered architecture with single-responsibility components and clean interface boundaries β€” the same design philosophy used by PostgreSQL, InnoDB, and BusTub.

                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚   Interactive Shell      β”‚  REPL interface
                    β”‚   (ANSI terminal UI)     β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                 β”‚
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚   Transaction Manager    β”‚  BEGIN / COMMIT / ABORT
                    β”‚   (ATT, CLR undo chain) β”‚  Crash-safe abort via CLRs
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                 β”‚
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚                  β”‚                  β”‚
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚   B+ Tree Index      β”‚ β”‚  WAL / Log     β”‚ β”‚  Checkpoint       β”‚
  β”‚   (search, insert,   β”‚ β”‚  Manager       β”‚ β”‚  Manager          β”‚
  β”‚    delete, range)    β”‚ β”‚  (append, flush)β”‚ β”‚  (ATT + DPT snap) β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
              β”‚                  β”‚
              β”‚      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              └──────►   Buffer Manager      β”‚  In-memory page cache
                     β”‚   (LRU eviction,      β”‚  WAL enforcement
                     β”‚    pin/unpin, flush)   β”‚
                     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                 β”‚
                     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                     β”‚   Disk Manager        β”‚  Raw page I/O
                     β”‚   (read / write /     β”‚  File management
                     β”‚    allocate pages)    β”‚
                     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                 β”‚
                     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                     β”‚   persistx.db         β”‚  Persistent storage
                     β”‚   persistx.log        β”‚  WAL file
                     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Component Summary

Component File(s) Responsibility
Slotted Page page.hpp/cpp Fixed 4 KB page layout with variable-length records, tombstone-aware slot directory, and O(1) space checks
Disk Manager disk_manager.hpp/cpp Thread-safe raw page I/O to a single database file
Buffer Manager buffer_manager.hpp/cpp, replacer.hpp/cpp In-memory page cache with LRU eviction, pin counting, and WAL-enforced writeback
Log Manager log_manager.hpp/cpp, log_record.hpp Append-only WAL with in-memory buffering, group flush, and LSN-indexed record lookup
Transaction Manager transaction_manager.hpp/cpp Full transaction lifecycle β€” BEGIN, COMMIT (force-flush), ABORT (CLR undo chain)
Checkpoint Manager checkpoint_manager.hpp/cpp Fuzzy checkpoints β€” snapshots ATT and DPT to the WAL without blocking transactions
Recovery Manager recovery_manager.hpp/cpp ARIES three-phase recovery: Analysis, Redo, Undo with CLR skip logic
B+ Tree Index btree_index.hpp/cpp, btree_page.hpp/cpp Ordered index with leaf/internal page layouts, recursive split, and linked-leaf range scan
Shell shell.hpp/cpp, shell_commands.cpp Interactive REPL with 20+ commands, ANSI rendering, and benchmark tools
Terminal terminal.hpp/cpp ANSI color helpers, box-drawing, table rendering, and Windows VT100 setup

πŸš€ Getting Started

Prerequisites

  • C++17 compiler (GCC 9+, Clang 10+, or MSVC 2019+)
  • CMake 3.20 or newer
  • pthreads (Linux/macOS) or Windows threads

Build

# Clone the repository
git clone https://github.com/yourusername/PersistX.git
cd PersistX

# Configure and build
cmake -S . -B build -G "MinGW Makefiles"    # Windows with MinGW
# cmake -S . -B build                       # Linux / macOS
cmake --build build

# Run the interactive shell
./build/persistx.exe          # Windows
# ./build/persistx            # Linux / macOS

Run Tests

PersistX ships with a comprehensive test suite covering every layer:

./build/test_page.exe              # Slotted page layout tests
./build/test_disk_manager.exe      # Disk I/O tests
./build/test_buffer_manager.exe    # Buffer pool + eviction tests
./build/test_wal.exe               # Write-ahead logging tests
./build/test_transaction.exe       # Transaction ACID tests
./build/test_checkpoint.exe        # Checkpoint protocol tests
./build/test_recovery.exe          # ARIES crash recovery tests
./build/test_btree.exe             # B+ tree index tests
./build/overall_tests.exe          # Full integration test suite

πŸ’» Interactive Shell

PersistX includes a beautifully rendered interactive REPL that exposes every layer of the storage engine:

  ╔════════════════════════════════════════════════════════════════════╗
  β•‘       β—†  P E R S I S T X   S t o r a g e   E n g i n e  β—†      β•‘
  β•‘                  v0.1.0  |  ACID  |  ARIES  |  B+Tree            β•‘
  β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

  Type help for available commands.

  persistx>

Command Reference

Data Operations

Command Description
put <key> <value> Insert a key-value pair into the B+ tree
get <key> Look up a value by key
del <key> Delete a key-value pair
update <key> <value> Update the value for an existing key
scan <begin> <end> Range scan β€” returns all keys in [begin, end]

Transactions

Command Description
begin Start an explicit transaction
commit Commit the current transaction (force-flush WAL)
abort Abort and rollback via CLR undo chain

Diagnostics

Command Description
status System overview β€” buffer pool, WAL, B+ tree stats
tree Visualize the B+ tree structure level by level
buffer Inspect buffer pool frame table
wal [n] Show last N WAL log entries
page <id> Inspect a specific page (data or index)

System

Command Description
flush Flush all dirty pages to disk
checkpoint Trigger a fuzzy checkpoint
fill <n> Bulk insert N sequential keys (benchmarking)
bench <n> Benchmark N insert + lookup operations
crash Simulate a crash (_Exit(1) β€” no flush!)
recover Run ARIES recovery + rebuild B+ tree index
clear Clear the terminal screen
exit Shut down gracefully

Example Session

  persistx> put 1 "Hello, World!"
  βœ“ INSERT  key=1  β†’  RID(2,0)  [LSN=1]

  persistx> put 2 "Database Internals"
  βœ“ INSERT  key=2  β†’  RID(2,1)  [LSN=3]

  persistx> get 1
  1 β†’ Hello, World!

  persistx> begin
  β”‚ BEGIN  txn#2

  persistx[txn#2]> put 3 "ACID transactions"
  βœ“ INSERT  key=3  β†’  RID(2,2)  [LSN=5]

  persistx[txn#2]> abort
  βœ— ABORT  txn#2

  persistx> get 3
  Key 3 not found

  persistx> crash
  πŸ’₯ CRASH β€” killing process without flushing!

  persistx> recover
  πŸ”„ Running ARIES recovery...
  βœ“ ARIES complete (2.35ms)
  πŸ”§ Rebuilding B+ tree index...
  βœ“ Index rebuilt (2 keys recovered)
  βœ“ Recovery complete (4.12ms total)

🧠 Technical Deep Dive

Page Layout (Slotted Pages)

Every page is exactly 4096 bytes, aligned for direct I/O:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Header (21 bytes)                                    β”‚
β”‚  page_id (4) β”‚ type (1) β”‚ slots (2) β”‚ tombstones (2) β”‚
β”‚  free_ptr (4) β”‚ page_lsn (8)                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Record Data (grows ↓ from top)                       β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”                   β”‚
β”‚  β”‚ Rec0 β”‚ β”‚   Rec1     β”‚ β”‚ Rec2 β”‚ ...               β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”˜                   β”‚
β”‚                    ↕ free space ↕                    β”‚
β”‚  [Slot2][Slot1][Slot0]  ← Slot Directory (grows ↑)  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  • Tombstone-aware: deleted slots are marked (not removed), enabling O(1) can_insert() checks and stable RIDs
  • Compaction: live records are rewritten contiguously while preserving slot indices β€” RIDs remain valid

B+ Tree Index

  • Leaf pages: sorted array of (int64_t key, RID) entries β€” max 291 entries/leaf
  • Internal pages: child[0] | key[0] | child[1] | ... | child[N] β€” max 340 keys/node
  • Leaf linked list: leaves are connected via next_leaf_id for efficient range scans
  • Lens pattern: BTreeLeafPage and BTreeInternalPage are zero-cost wrappers over raw Page* buffers β€” no extra allocation
  • No merge on delete: same policy as PostgreSQL β€” deleted entries are removed without rebalancing

Write-Ahead Logging (WAL)

Every modification follows the LOG β†’ FLUSH β†’ DATA WRITE protocol:

  1. Serialize the log record (before-image + after-image) into the in-memory buffer
  2. Force-flush the log to disk before any dirty data page can be written
  3. The BufferManager::enforce_wal() method guarantees this invariant on every eviction

Log record types: BEGIN, UPDATE, COMMIT, ABORT, TXN_END, CLR, CHECKPOINT_BEGIN, CHECKPOINT_END

ARIES Recovery

On every startup (clean or crash), PersistX runs the full ARIES protocol:

Phase What It Does
Analysis Scan WAL from the last checkpoint to reconstruct the ATT and DPT
Redo Replay all logged changes β€” idempotent thanks to LSN comparison
Undo Roll back uncommitted transactions using the prevLSN chain, writing CLRs for crash safety

After ARIES, the B+ tree index is rebuilt by scanning recovered data pages β€” ensuring total consistency even after an unclean shutdown.

Transaction Abort (CLR Chain)

Abort walks the undo chain backward (prevLSN pointers), restoring before-images:

UPDATE(LSN=5) ← prevLSN ← UPDATE(LSN=3) ← prevLSN ← BEGIN(LSN=1)
    β”‚                          β”‚
    β–Ό                          β–Ό
CLR(LSN=7, undo_next=3)   CLR(LSN=8, undo_next=1)

Each CLR records which step was already undone, so a crash mid-abort can safely skip completed undo steps.


πŸ“ Project Structure

PersistX/
β”œβ”€β”€ include/                    # Header files (public API)
β”‚   β”œβ”€β”€ common.hpp              # Shared types, constants, RID
β”‚   β”œβ”€β”€ page.hpp                # Slotted page layout
β”‚   β”œβ”€β”€ disk_manager.hpp        # Raw page I/O
β”‚   β”œβ”€β”€ replacer.hpp            # LRU eviction policy
β”‚   β”œβ”€β”€ buffer_manager.hpp      # In-memory page cache
β”‚   β”œβ”€β”€ log_manager.hpp         # WAL management
β”‚   β”œβ”€β”€ log_record.hpp          # Log record serialization
β”‚   β”œβ”€β”€ transaction.hpp         # Transaction state
β”‚   β”œβ”€β”€ transaction_manager.hpp # Transaction lifecycle
β”‚   β”œβ”€β”€ checkpoint_manager.hpp  # Fuzzy checkpoints
β”‚   β”œβ”€β”€ recovery_manager.hpp    # ARIES recovery
β”‚   β”œβ”€β”€ btree_page.hpp          # B+ tree node layouts
β”‚   β”œβ”€β”€ btree_index.hpp         # B+ tree public API
β”‚   β”œβ”€β”€ shell.hpp               # Interactive shell
β”‚   └── terminal.hpp            # ANSI terminal rendering
β”œβ”€β”€ src/                        # Implementation files
β”‚   β”œβ”€β”€ main.cpp                # Entry point
β”‚   β”œβ”€β”€ page.cpp                # Slotted page operations
β”‚   β”œβ”€β”€ disk_manager.cpp        # File I/O
β”‚   β”œβ”€β”€ replacer.cpp            # LRU eviction
β”‚   β”œβ”€β”€ buffer_manager.cpp      # Buffer pool management
β”‚   β”œβ”€β”€ log_manager.cpp         # WAL append / flush / read
β”‚   β”œβ”€β”€ transaction_manager.cpp # BEGIN / COMMIT / ABORT + CLR
β”‚   β”œβ”€β”€ checkpoint_manager.cpp  # Checkpoint protocol
β”‚   β”œβ”€β”€ recovery_manager.cpp    # Analysis / Redo / Undo
β”‚   β”œβ”€β”€ btree_page.cpp          # Leaf & internal page ops
β”‚   β”œβ”€β”€ btree_index.cpp         # Tree search / insert / split
β”‚   β”œβ”€β”€ shell.cpp               # REPL loop + data commands
β”‚   β”œβ”€β”€ shell_commands.cpp      # Diagnostics & system commands
β”‚   └── terminal.cpp            # ANSI rendering utilities
β”œβ”€β”€ tests/                      # Test suite
β”‚   β”œβ”€β”€ test_page.cpp           # Page layout correctness
β”‚   β”œβ”€β”€ test_disk_manager.cpp   # Disk I/O + allocation
β”‚   β”œβ”€β”€ test_buffer_manager.cpp # Eviction, pinning, WAL
β”‚   β”œβ”€β”€ test_wal.cpp            # Log append / flush / read
β”‚   β”œβ”€β”€ test_transaction.cpp    # ACID transaction tests
β”‚   β”œβ”€β”€ test_checkpoint.cpp     # Checkpoint correctness
β”‚   β”œβ”€β”€ test_recovery.cpp       # Crash recovery scenarios
β”‚   β”œβ”€β”€ test_btree.cpp          # B+ tree operations
β”‚   └── overall_tests.cpp       # End-to-end integration
β”œβ”€β”€ CMakeLists.txt              # Build configuration
└── README.md

🎯 Design Principles

  1. Page-based storage β€” all data lives in fixed 4 KB pages for O_DIRECT compatibility
  2. Separation of concerns β€” strict layered architecture with zero upward dependencies
  3. Write-ahead logging β€” LOG β†’ FLUSH β†’ DATA WRITE, always
  4. Crash-first design β€” the system is always recoverable to a consistent state
  5. Stable record identifiers β€” RID = (page_id, slot_id) survives compaction and page reorganization
  6. Zero-copy lenses β€” B+ tree pages are interpreted in-place, no deserialization overhead
  7. Thread safety β€” mutex-protected critical sections with documented latch ordering: TxnMgr β†’ BufferMgr β†’ LogMgr
  8. No UB β€” all wire format access uses memcpy instead of reinterpret_cast to avoid strict-aliasing violations

πŸ“š Inspired By


πŸ“„ License

This project is for educational purposes. Built as part of the IEEE NITK Envision 2026 program.

About

A storage engine built from scratch

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors