Agent Intelligence: Reflexion & Episodic Memory #51

## Description
Implement a cross-session learning system in which the agent generates structured reflections after failures and stores complete engagement episodes. In future sessions, relevant reflections and past episodes are retrieved and injected into the agent's context, so it is less likely to repeat a mistake it has already made.

## Why this matters
Today, every session starts from zero. The agent retains none of the lessons an earlier session would have taught it, such as:
- "Last time I tried CVE-2021-41773 on Apache 2.4.49, it failed because mod_cgi was disabled — I should check mod_cgi first"
- "Nuclei template xyz always false-positives on Cloudflare targets — skip it"
- "SSH brute force with admin:admin worked on this target last week — try that first"
- "sqlmap with --tamper=space2comment bypassed the WAF on similar PHP apps"

A human pentester builds this institutional knowledge over years. The agent discards it after every session. Reflexion + episodic memory closes this gap.

## Two components

### 1. Reflexion (failure-triggered)
After a tool execution fails, the agent generates a structured reflection:
- **Context**: what was the situation
- **Action taken**: what the agent did
- **Root cause**: why it failed (not just "error occurred" but the actual reason)
- **Lesson**: what to do differently next time
- **Tags**: keywords for retrieval (e.g., ["apache", "CVE-2021-41773", "mod_cgi"])

Reflections are retrieved by tag overlap when the agent encounters similar situations and injected as "LESSONS FROM PAST EXPERIENCE" in the system prompt.
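
A minimal sketch of the reflection record and tag-overlap retrieval described above. It uses a stdlib dataclass to stay dependency-free, although the task list calls for a Pydantic model. The function names (`normalize`, `retrieve_by_tags`, `format_lessons`), the `top_k` parameter, and ranking by raw overlap count are assumptions for illustration, not part of the existing codebase.

```python
from dataclasses import dataclass, field


@dataclass
class ReflectionEntry:
    """One failure-triggered reflection; fields mirror the list above."""
    context: str
    action: str
    root_cause: str
    lesson: str
    tags: list[str] = field(default_factory=list)


def normalize(tags):
    """Lowercase and strip tags so 'Apache' and 'apache ' compare equal."""
    return {t.strip().lower() for t in tags if t.strip()}


def retrieve_by_tags(reflections, query_tags, top_k=3):
    """Return up to top_k reflections that share at least one tag with the
    query, ordered by how many tags they share (assumed ranking rule)."""
    query = normalize(query_tags)
    scored = []
    for reflection in reflections:
        overlap = len(query & normalize(reflection.tags))
        if overlap > 0:
            scored.append((overlap, reflection))
    # sort() is stable, so ties keep their original storage order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [reflection for _, reflection in scored[:top_k]]


def format_lessons(reflections):
    """Render retrieved reflections as a block for the system prompt."""
    if not reflections:
        return ""
    lines = ["LESSONS FROM PAST EXPERIENCE"]
    for r in reflections:
        lines.append(f"- {r.lesson} (root cause: {r.root_cause})")
    return "\n".join(lines)
```

Raw overlap count favors reflections with many tags. Dividing by the size of the tag union would be one way to correct for that if it turns out to matter.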

### 2. Episodic Memory (session-level)
After each engagement, store a complete episode:
- Target technology stack
- Strategy used (attack path type)
- Tools sequence
- Outcome (success/partial/failure)
- Key findings
- Duration (iterations)

Episodes are retrieved by technology stack similarity — "I've seen Apache 2.4 + MySQL + WordPress before, here's what worked."
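
One way to make "technology stack similarity" concrete is Jaccard similarity over the sets of technology names. The sketch below assumes that metric; the issue does not specify one. The `Episode` field names, `similar_episodes`, `top_k`, and the `min_score` threshold are likewise hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    """One complete engagement record; fields mirror the list above."""
    tech_stack: list[str]
    strategy: str
    tools: list[str]
    outcome: str  # "success", "partial", or "failure"
    key_findings: list[str] = field(default_factory=list)
    iterations: int = 0


def jaccard(a, b):
    """Jaccard similarity of two name lists: |A & B| / |A | B|."""
    set_a = {x.strip().lower() for x in a}
    set_b = {x.strip().lower() for x in b}
    if not set_a and not set_b:
        return 0.0  # two empty stacks carry no evidence of similarity
    return len(set_a & set_b) / len(set_a | set_b)


def similar_episodes(episodes, target_stack, top_k=3, min_score=0.25):
    """Return (score, episode) pairs at or above min_score, best first."""
    scored = [(jaccard(e.tech_stack, target_stack), e) for e in episodes]
    scored = [(score, e) for score, e in scored if score >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Exact string matching treats "Apache 2.4" and "Apache 2.4.49" as different technologies, so a real implementation would probably need to normalize product names and versions before comparing.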

## What already exists
- Neo4j graph database (can store episodes as nodes with technology relationships)
- `AgentState.execution_trace` captures every tool call and result
- `AgentState.target_info` accumulates technology/service data
- Basic failure detection (3 consecutive failures)
- Tavily web search for external knowledge
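
As a rough illustration of how episodes could sit in the existing Neo4j graph, the sketch below builds Cypher statements and their parameters without touching a database. The node labels `Episode` and `Technology`, the `USES` relationship, the property names, and the helper `episode_params` are all assumptions; the actual schema may differ.

```python
import uuid

# Upsert one episode node and link it to one node per technology.
STORE_EPISODE_CYPHER = """
MERGE (e:Episode {id: $id})
SET e.strategy = $strategy,
    e.outcome = $outcome,
    e.tools = $tools,
    e.iterations = $iterations
WITH e
UNWIND $technologies AS tech
MERGE (t:Technology {name: tech})
MERGE (e)-[:USES]->(t)
"""

# Rank stored episodes by how many technologies they share with the target.
FIND_SIMILAR_CYPHER = """
MATCH (e:Episode)-[:USES]->(t:Technology)
WHERE t.name IN $technologies
RETURN e.id AS id, e.strategy AS strategy, e.outcome AS outcome,
       count(t) AS shared
ORDER BY shared DESC
LIMIT $limit
"""


def episode_params(technologies, strategy, outcome, tools, iterations,
                   episode_id=None):
    """Build the parameter dict for STORE_EPISODE_CYPHER.

    Technology names are lowercased and de-duplicated so that MERGE
    reuses a single Technology node per name.
    """
    return {
        "id": episode_id or str(uuid.uuid4()),
        "strategy": strategy,
        "outcome": outcome,
        "tools": list(tools),
        "iterations": iterations,
        "technologies": sorted({t.strip().lower() for t in technologies}),
    }
```

The query string and the dict returned by `episode_params` can then be handed to whatever Neo4j session wrapper the project already uses.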

## What needs to be built
- [ ] `ReflectionEntry` Pydantic model (context, action, root_cause, lesson, tags)
- [ ] `ReflectionMemory` class with generate/store/retrieve methods
- [ ] Reflection generation prompt (triggered after tool failure)
- [ ] Tag-based retrieval with injection into think node system prompt
- [ ] `Episode` model for complete engagement records
- [ ] `EpisodicMemory` class with Neo4j storage (leverages existing graph infra)
- [ ] Technology-based similarity retrieval for past episodes
- [ ] Integration into think node: retrieve relevant reflections + episodes before each LLM call
- [ ] Storage backend: Neo4j for episodes, JSON or PostgreSQL for reflections
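
The think-node integration could look roughly like the sketch below, which folds already-retrieved reflections and episodes into the system prompt before the LLM call. It uses plain dicts to stay self-contained. The function names, the dict keys, the `max_items` cap, and the "SIMILAR PAST ENGAGEMENTS" header are assumptions; only the "LESSONS FROM PAST EXPERIENCE" header comes from this issue.

```python
def build_memory_context(reflections, episodes, max_items=3):
    """Render retrieved memory as text; returns "" when nothing was found.

    reflections: dicts with a "lesson" key.
    episodes: dicts with "tech_stack", "strategy", "tools", "outcome" keys.
    """
    sections = []
    if reflections:
        lines = ["LESSONS FROM PAST EXPERIENCE"]
        lines += [f"- {r['lesson']}" for r in reflections[:max_items]]
        sections.append("\n".join(lines))
    if episodes:
        lines = ["SIMILAR PAST ENGAGEMENTS"]
        for e in episodes[:max_items]:
            stack = " + ".join(e["tech_stack"])
            tools = " -> ".join(e["tools"])
            lines.append(
                f"- {stack}: {e['strategy']} via {tools} ({e['outcome']})"
            )
        sections.append("\n".join(lines))
    return "\n\n".join(sections)


def augment_system_prompt(base_prompt, reflections, episodes):
    """Append the memory block to the base prompt, or return it unchanged."""
    memory = build_memory_context(reflections, episodes)
    return f"{base_prompt}\n\n{memory}" if memory else base_prompt
```

Capping the number of injected items keeps the prompt from growing without bound as the memory store fills up. The right cap depends on the model's context budget.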
