Kagura-StackWalker

.pdata Stack Frame Reconstruction Engine

Kagura (神楽): sacred dance of the Shinto gods.
The stack walk is a dance the EDR performs frame by frame.
Kagura teaches you to replicate that choreography perfectly.


Live reference tool for malware developers and red teamers. It loads Windows system DLLs, parses their .pdata sections, decodes every UNWIND_INFO, and reconstructs the exact stack frame layout for any function, so you know precisely what your fake stack needs to look like.

Why Kagura?

I built Kagura because I needed to see what the stacks I was trying to reconstruct actually looked like. While developing indirect syscall and callstack spoofing techniques, I spent hours in WinDbg running .fnent, parsing dumpbin /unwindinfo output, and doing manual hex arithmetic to figure out frame sizes and register offsets. Every mistake meant a dead process with no clear error message from CrowdStrike.

The problem is simple: when you forge a fake stack, you need to know the exact layout that the EDR's stack walker expects. CrowdStrike Falcon, Microsoft Defender for Endpoint, SentinelOne: they all replay the .pdata unwind codes frame by frame to validate every return address, every saved register, every RSP alignment. Get a single slot wrong and your stack gets flagged.

Kagura makes that layout visible. It loads the target modules, decodes every RUNTIME_FUNCTION and its UNWIND_INFO, and shows you the precise slot-by-slot layout with forge instructions. For every function. Across 7 system DLLs. In a searchable interactive TUI.

No more guessing. No more manual parsing. No more trial and error.

Screenshots

  KAGURA-STACKWALKER v1.0.0                          [7 modules | 20718 funcs]

  kernelbase.dll!CreateFileA (0x000045AB0)
  Size: 0x80 (128B) | Prolog: 31B | Unwind codes: 12 | Chain depth: 0
  ============================================================================

  RSP+0x0098 | Saved R12 (MOV @0x98)
  RSP+0x0090 | Saved RDI (MOV @0x90)
  RSP+0x0088 | Saved RSI (MOV @0x88)
  RSP+0x0080 | Saved RBX (MOV @0x80)
  RSP+0x0078 | Return address (@0x78)              <- caller RIP
  RSP+0x0070 | Saved RBP (PUSH)
  RSP+0x0068 | Saved R14 (PUSH)
  RSP+0x0060 | Saved R15 (PUSH)
  RSP+0x0000 | Local/Shadow (0x60)
  RSP+0x0000 | [current RSP]

  +-- Registers saved ----------------------+
  | R12    @ RSP+0x0098  (save)             |
  | RDI    @ RSP+0x0090  (save)             |
  | RSI    @ RSP+0x0088  (save)             |
  | RBX    @ RSP+0x0080  (save)             |
  | R15    @ RSP+0x0060  (push)             |
  | R14    @ RSP+0x0068  (push)             |
  | RBP    @ RSP+0x0070  (push)             |
  +-----------------------------------------+

  Exception handler: No

Each slot shows:

  • Offset from RSP: exact position in the frame
  • Content type: return address, saved register, local/shadow space, XMM save
  • Save method: PUSH vs MOV (critical for accurate fake stack reconstruction)

Installation

Prerequisites

  • Windows 10 1809+ or Windows 11 (x64 only)
  • Visual Studio 2022 with the "Desktop development with C++" workload
  • CMake 3.20+ (included with Visual Studio 2022)
  • Git with submodule support

Step 1: Clone the repository

git clone --recursive https://github.com/En3nr4/Kagura-StackWalker.git
cd Kagura-StackWalker

If you already cloned without --recursive, initialize submodules manually:

git submodule update --init --recursive

Step 2: Open a Developer PowerShell

Open "Developer PowerShell for VS 2022" from the Start menu. This ensures cmake and cl.exe are in your PATH.

Alternatively, add CMake to your current PowerShell session:

$env:PATH += ";C:\Program Files\Microsoft Visual Studio\2022\Community\Common7\IDE\CommonExtensions\Microsoft\CMake\CMake\bin"

Step 3: Build

Release build (recommended):

cmake --preset release
cmake --build build/release --config Release

The binary is at build\release\Release\kagura.exe.

Debug build:

cmake --preset debug
cmake --build build/debug --config Debug

Step 4: Run

.\build\release\Release\kagura.exe

You can copy kagura.exe anywhere: it's a single standalone binary with no external dependencies (CRT is statically linked).

Usage

Interactive TUI (default)

kagura.exe

Launches the interactive terminal UI. Navigate modules, search functions, inspect frame layouts.

CLI Batch Mode

# Dump all frames from ntdll to console
kagura.exe --dump --module ntdll.dll

# Dump a specific module
kagura.exe --dump --module kernel32.dll

# Export a specific function to JSON
kagura.exe --export json --module ntdll.dll --func NtCreateFile -o ntcreate.json

# Export entire module to JSON
kagura.exe --export json --module ntdll.dll -o ntdll_frames.json

# Export all modules to JSON
kagura.exe --export json -o all_frames.json

# Show help
kagura.exe --help

TUI Navigation

Key         | Action
------------|----------------------------------------
UP / DOWN   | Navigate list
Enter       | Select module / Open function detail
/           | Search (incremental, case-insensitive)
Esc         | Back / Cancel search
Backspace   | Back to previous view
PgUp / PgDn | Scroll fast (20 items)
e           | Export current frame to JSON file
q           | Quit

Search Tips

  • Type /NtCreate to find all NtCreate* functions
  • Type /0x followed by an RVA to search by address
  • Search is case-insensitive and matches anywhere in the function name
  • Press Esc to clear the search filter
  • Press Enter to keep the filter active and navigate results
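
The matching rule described above (case-insensitive, anywhere in the name) can be sketched in portable C. This is only an illustration of the rule, not Kagura's actual search code; the function name `name_matches` is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `query` occurs anywhere in `name`,
 * ignoring ASCII case, and 0 otherwise. An empty query matches everything. */
int name_matches(const char *name, const char *query)
{
    assert(name != NULL && query != NULL);
    if (*query == '\0')
        return 1;
    for (; *name != '\0'; ++name) {
        const char *n = name;
        const char *q = query;
        /* Compare character by character from this starting position. */
        while (*n != '\0' && *q != '\0' &&
               tolower((unsigned char)*n) == tolower((unsigned char)*q)) {
            ++n;
            ++q;
        }
        if (*q == '\0')
            return 1;   /* consumed the whole query: match */
    }
    return 0;
}
```

With this rule, a query of `ntcreate` matches `NtCreateFile`, and `CREATE` also matches `ZwCreateFile` because the match may start mid-name.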

Modules Loaded

Kagura automatically loads and analyzes these system DLLs at startup:

Module         | Typical Use in Maldev
---------------|------------------------------------------------------
ntdll.dll      | Syscall stubs, Nt/Zw functions, Rtl* utilities
kernel32.dll   | Win32 API layer, BaseThreadInitThunk (terminal frame)
kernelbase.dll | Actual implementations behind kernel32
user32.dll     | Window/message functions
win32u.dll     | Win32k syscall stubs
advapi32.dll   | Security, registry, service functions
msvcrt.dll     | C runtime (common in legitimate stacks)

Understanding the Output

Frame Layout

The frame layout shows the exact memory layout of a function's stack frame as seen by RtlVirtualUnwind. This is what the EDR validates during a stack walk.

RSP+0x0078 | Return address (@0x78)        <- Where the caller's RIP is stored
RSP+0x0070 | Saved RBP (PUSH)              <- push rbp instruction
RSP+0x0068 | Saved R14 (PUSH)              <- push r14 instruction  
RSP+0x0060 | Saved R15 (PUSH)              <- push r15 instruction
RSP+0x0000 | Local/Shadow (0x60)           <- sub rsp, 0x60
RSP+0x0000 | [current RSP]                 <- Stack pointer after prologue

PUSH vs MOV

This distinction is critical for fake stack construction:

  • PUSH: The register was saved via push reg. The offset is sequential (each push adds 8 bytes to the frame).
  • MOV (save): The register was saved via mov [rsp+offset], reg. The offset is explicit and can be anywhere in the frame.

When forging a stack, PUSH-saved registers must appear in the correct sequential order, while MOV-saved registers just need to be at the right offset.
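
The offsets in the sample CreateFileA frame follow from simple arithmetic, sketched below in C. The helper names are hypothetical, and the prolog order (push rbp, push r14, push r15, then sub rsp, 0x60) is an assumption inferred from the sample offsets, not read from the binary. The sketch also assumes a frame without a frame pointer register.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of a PUSH-saved register relative to the post-prolog RSP.
 * push_index 0 is the first push executed in the prolog, so it ends up
 * at the highest address, just below the return address. */
uint32_t push_slot_offset(uint32_t alloc_bytes, uint32_t push_count,
                          uint32_t push_index)
{
    assert(push_index < push_count);
    return alloc_bytes + 8u * (push_count - 1u - push_index);
}

/* The return address sits directly above all pushed registers. */
uint32_t return_address_offset(uint32_t alloc_bytes, uint32_t push_count)
{
    return alloc_bytes + 8u * push_count;
}

/* UWOP_SAVE_NONVOL stores its offset divided by 8 in the next code slot,
 * so the explicit MOV offset is recovered by scaling the operand. */
uint32_t mov_save_offset(uint16_t scaled_operand)
{
    return (uint32_t)scaled_operand * 8u;
}
```

For the sample frame (0x60 bytes allocated, three pushes), this gives RBP at 0x70, R14 at 0x68, R15 at 0x60 and the return address at 0x78. A MOV save with operand 0x10 lands at 0x80, above the return address in the caller-provided home space.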

Syscall Stubs

Functions like NtCreateFile, NtAllocateVirtualMemory etc. in ntdll are syscall stubs with minimal or zero unwind codes. Their frame is typically just 8 bytes (return address only). This is correct: these stubs boil down to mov r10, rcx; mov eax, SSN; syscall; ret and never adjust RSP, so there is no stack frame to describe.

JSON Output Format

{
  "module": "kernelbase.dll",
  "base_address": "0x7FFE7F200000",
  "function": {
    "name": "CreateFileA",
    "display_name": "CreateFileA (0x000045AB0)",
    "rva": "0x000045AB0",
    "size": 128,
    "prolog_size": 31,
    "frame": {
      "total_size": 128,
      "return_addr_offset": "0x78",
      "frame_register": null,
      "frame_reg_offset": 0,
      "slots": [
        {"offset": "0x0", "size": 8, "type": "LOCAL_ALLOC", "label": "Local/Shadow (0x60)"},
        {"offset": "0x60", "size": 8, "type": "SAVED_REG", "label": "Saved R15 (PUSH)"},
        {"offset": "0x68", "size": 8, "type": "SAVED_REG", "label": "Saved R14 (PUSH)"},
        {"offset": "0x70", "size": 8, "type": "SAVED_REG", "label": "Saved RBP (PUSH)"},
        {"offset": "0x78", "size": 8, "type": "RETURN_ADDR", "label": "Return address (@0x78)"},
        {"offset": "0x80", "size": 8, "type": "SAVED_REG", "label": "Saved RBX (MOV @0x80)"},
        {"offset": "0x88", "size": 8, "type": "SAVED_REG", "label": "Saved RSI (MOV @0x88)"},
        {"offset": "0x90", "size": 8, "type": "SAVED_REG", "label": "Saved RDI (MOV @0x90)"},
        {"offset": "0x98", "size": 8, "type": "SAVED_REG", "label": "Saved R12 (MOV @0x98)"}
      ],
      "saved_registers": [
        {"reg": "R12", "offset": "0x98", "method": "MOV"},
        {"reg": "RDI", "offset": "0x90", "method": "MOV"},
        {"reg": "RSI", "offset": "0x88", "method": "MOV"},
        {"reg": "RBX", "offset": "0x80", "method": "MOV"},
        {"reg": "R15", "offset": "0x60", "method": "PUSH"},
        {"reg": "R14", "offset": "0x68", "method": "PUSH"},
        {"reg": "RBP", "offset": "0x70", "method": "PUSH"}
      ],
      "unwind": {
        "version": 1,
        "flags": 0,
        "code_count": 12,
        "has_exception_handler": false,
        "has_termination_handler": false,
        "chain_depth": 0
      }
    }
  }
}
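
As a rough illustration of how one entry of the "slots" array maps onto a C structure, the sketch below formats a slot in the same shape as the sample output. The struct and function names are hypothetical and this is not Kagura's actual export code; it also assumes the strings need no JSON escaping.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one "slots" entry. */
typedef struct {
    uint32_t    offset;  /* offset from post-prolog RSP */
    uint32_t    size;    /* slot size in bytes */
    const char *type;    /* e.g. "SAVED_REG", "RETURN_ADDR" */
    const char *label;   /* human-readable description */
} frame_slot_t;

/* Returns 1 if the slot holds the return address. */
int slot_is_return_addr(const frame_slot_t *s)
{
    return strcmp(s->type, "RETURN_ADDR") == 0;
}

/* Writes the slot as a JSON object into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
int format_slot_json(char *buf, size_t cap, const frame_slot_t *s)
{
    int n = snprintf(buf, cap,
                     "{\"offset\": \"0x%X\", \"size\": %u, "
                     "\"type\": \"%s\", \"label\": \"%s\"}",
                     (unsigned)s->offset, (unsigned)s->size,
                     s->type, s->label);
    return (n >= 0 && (size_t)n < cap) ? n : -1;
}
```

Checking the return value matters here because snprintf truncates silently when the buffer is too small.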

Technical Details

How It Works

  1. Module loading: Each DLL is loaded via LoadLibraryExW with DONT_RESOLVE_DLL_REFERENCES (no DllMain execution, no import resolution)
  2. Export parsing: The PE export table is walked to resolve function names. Exports are sorted by RVA for nearest-name binary search.
  3. .pdata parsing: The exception directory (DataDirectory[3]) contains the RUNTIME_FUNCTION array. Each entry maps a function's RVA range to its UNWIND_INFO.
  4. Frame reconstruction: Each UNWIND_INFO's UNWIND_CODE array is replayed to determine the exact stack frame layout. Every opcode listed under Supported UNWIND_CODE Operations is handled.
  5. CHAININFO: Chained unwind info (UNW_FLAG_CHAININFO) is followed recursively (up to 32 levels deep).
  6. Display: Results are indexed in memory and presented via an ANSI/VT100 TUI or exported as JSON.
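
Steps 3 to 5 can be sketched in portable C without windows.h. The struct layouts mirror the documented RUNTIME_FUNCTION and UNWIND_INFO formats, but every type and function name here is hypothetical and this is not Kagura's actual code. It assumes the RUNTIME_FUNCTION table is sorted by begin address, as the PE format requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as RUNTIME_FUNCTION: three 32-bit RVAs. */
typedef struct {
    uint32_t begin_rva;
    uint32_t end_rva;     /* exclusive */
    uint32_t unwind_rva;  /* RVA of the UNWIND_INFO */
} runtime_function_t;

/* Decoded form of the 4-byte UNWIND_INFO header. */
typedef struct {
    unsigned version, flags, prolog_size, code_count;
    unsigned frame_reg, frame_offset;
} unwind_header_t;

#define UNW_FLAG_CHAININFO 0x4u

/* Binary search for the entry covering `rva`; NULL for leaf functions or gaps. */
const runtime_function_t *find_runtime_function(const runtime_function_t *table,
                                                size_t count, uint32_t rva)
{
    size_t lo = 0, hi = count;
    assert(table != NULL || count == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rva < table[mid].begin_rva)
            hi = mid;
        else if (rva >= table[mid].end_rva)
            lo = mid + 1;
        else
            return &table[mid];
    }
    return NULL;
}

/* Split the packed header bytes into their bit fields. */
unwind_header_t decode_unwind_header(const uint8_t b[4])
{
    unwind_header_t h;
    h.version      = b[0] & 0x07u;         /* low 3 bits */
    h.flags        = b[0] >> 3;            /* high 5 bits */
    h.prolog_size  = b[1];
    h.code_count   = b[2];
    h.frame_reg    = b[3] & 0x0Fu;
    h.frame_offset = (b[3] >> 4) * 16u;    /* stored scaled by 16 */
    return h;
}

/* Byte offset, from the start of UNWIND_INFO, of the chained RUNTIME_FUNCTION
 * (present when UNW_FLAG_CHAININFO is set). The code array is padded to an
 * even number of 2-byte slots. */
size_t chained_entry_offset(unsigned code_count)
{
    return 4u + 2u * ((code_count + 1u) & ~1u);
}
```

A caller following chains would decode the header, test `flags & UNW_FLAG_CHAININFO`, read the RUNTIME_FUNCTION at `chained_entry_offset(code_count)`, and repeat with a depth limit such as the 32 levels mentioned above.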

Supported UNWIND_CODE Operations

OpCode | Operation            | Frame Effect
-------|----------------------|--------------------------------------------
0      | UWOP_PUSH_NONVOL     | push reg: RSP -= 8
1      | UWOP_ALLOC_LARGE     | sub rsp, N: large allocation
2      | UWOP_ALLOC_SMALL     | sub rsp, (info*8)+8: small allocation
3      | UWOP_SET_FPREG       | Set frame pointer register
4      | UWOP_SAVE_NONVOL     | mov [rsp+off], reg: save at explicit offset
5      | UWOP_SAVE_NONVOL_FAR | Same, 32-bit offset
8      | UWOP_SAVE_XMM128     | Save XMM register (16 bytes)
9      | UWOP_SAVE_XMM128_FAR | Same, 32-bit offset
10     | UWOP_PUSH_MACHFRAME  | CPU trap frame (interrupts)
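
To show how the opcodes combine, here is a minimal sketch in portable C that walks an UNWIND_CODE array and totals how far the prolog lowers RSP. It is not Kagura's implementation: all names are hypothetical, UWOP_PUSH_MACHFRAME is deliberately left unhandled, and `example_codes` is a hand-built array modeled on the CreateFileA sample with made-up prolog offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode numbers as listed in the table above. */
enum {
    UWOP_PUSH_NONVOL = 0, UWOP_ALLOC_LARGE = 1, UWOP_ALLOC_SMALL = 2,
    UWOP_SET_FPREG = 3, UWOP_SAVE_NONVOL = 4, UWOP_SAVE_NONVOL_FAR = 5,
    UWOP_SAVE_XMM128 = 8, UWOP_SAVE_XMM128_FAR = 9
};

/* One 16-bit slot: prolog offset byte, then op (low nibble) | info (high nibble). */
typedef struct {
    uint8_t code_offset;
    uint8_t op_info;
} unwind_code_t;

/* Reinterpret a slot as the little-endian 16-bit operand of the previous opcode. */
static uint32_t slot_u16(unwind_code_t c)
{
    return (uint32_t)c.code_offset | ((uint32_t)c.op_info << 8);
}

/* Bytes by which the prolog lowers RSP (pushes plus fixed allocations),
 * or -1 if the array is truncated or contains an opcode not handled here. */
int64_t frame_alloc_bytes(const unwind_code_t *codes, size_t count)
{
    int64_t total = 0;
    size_t i = 0;
    assert(codes != NULL || count == 0);
    while (i < count) {
        unsigned op = codes[i].op_info & 0x0Fu;
        unsigned info = codes[i].op_info >> 4;
        size_t slots = 1;
        switch (op) {
        case UWOP_PUSH_NONVOL: total += 8; break;
        case UWOP_ALLOC_SMALL: total += (int64_t)info * 8 + 8; break;
        case UWOP_ALLOC_LARGE:
            slots = (info == 0) ? 2 : 3;
            if (info > 1 || i + slots > count) return -1;
            if (info == 0)   /* 16-bit size, scaled by 8 */
                total += (int64_t)slot_u16(codes[i + 1]) * 8;
            else             /* unscaled 32-bit size in two slots */
                total += (int64_t)(slot_u16(codes[i + 1]) |
                                   (slot_u16(codes[i + 2]) << 16));
            break;
        case UWOP_SET_FPREG: break;                 /* no RSP change */
        case UWOP_SAVE_NONVOL:
        case UWOP_SAVE_XMM128: slots = 2; break;    /* MOV saves: no RSP change */
        case UWOP_SAVE_NONVOL_FAR:
        case UWOP_SAVE_XMM128_FAR: slots = 3; break;
        default: return -1;
        }
        if (i + slots > count) return -1;
        i += slots;
    }
    return total;
}

#define OP(off, op, info) { (off), (uint8_t)(((info) << 4) | (op)) }
#define U16(v)            { (uint8_t)((v) & 0xFF), (uint8_t)((v) >> 8) }

/* Hypothetical 12-slot array modeled on the sample frame (register numbers:
 * RBX=3, RBP=5, RSI=6, RDI=7, R12=12, R14=14, R15=15). */
static const unwind_code_t example_codes[12] = {
    OP(0x1D, UWOP_ALLOC_SMALL, 11),               /* 11*8+8 = 0x60 */
    OP(0x19, UWOP_PUSH_NONVOL, 15),
    OP(0x17, UWOP_PUSH_NONVOL, 14),
    OP(0x15, UWOP_PUSH_NONVOL, 5),
    OP(0x14, UWOP_SAVE_NONVOL, 12), U16(0x98 / 8),
    OP(0x0F, UWOP_SAVE_NONVOL, 7),  U16(0x90 / 8),
    OP(0x0A, UWOP_SAVE_NONVOL, 6),  U16(0x88 / 8),
    OP(0x05, UWOP_SAVE_NONVOL, 3),  U16(0x80 / 8)
};
```

For a frame with no frame pointer and no dynamic allocation, the value returned is also the return address offset: 0x60 + 3*8 = 0x78 for the sample array, which uses 12 code slots in total.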

Architecture

kagura-stackwalker/
├── include/
│   └── kagura.h            # Shared types and constants
├── src/
│   ├── main.c              # Entry point, CLI parsing
│   ├── kagura_data.c       # Global constant arrays
│   ├── pe_parser.c/.h      # PE loading, export parsing, .pdata extraction
│   ├── frame_recon.c/.h    # UNWIND_INFO decoding, frame reconstruction
│   ├── module_db.c/.h      # Module database, search indexing
│   ├── tui.c/.h            # ANSI/VT100 rendering engine
│   ├── tui_views.c/.h      # Module list, function list, frame detail views
│   ├── output.c/.h         # JSON and console export
│   └── utils.c/.h          # String utilities
├── external/
│   └── zydis/              # Zydis disassembler (git submodule)
├── CMakeLists.txt
└── CMakePresets.json

Roadmap (v2)

  • Gadget finder: score functions for stack spoofing compatibility
  • Chain builder: automatically assemble multi-frame fake stacks
  • Stack forge generator: output ready-to-use C structs / ASM
  • Walk simulator: validate forged stacks against EDR-style checks
  • Prologue disassembly via Zydis in frame detail view
  • Export to C headers and MASM/NASM
  • Custom module loading via CLI argument

Dependencies

  • Zydis: Fast x86/x64 disassembler library (MIT license, included as git submodule)

License

MIT

Credits

ENENRA
