Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 22 additions & 10 deletions .github/ISSUE_TEMPLATE/feature_request.md
Original file line number Diff line number Diff line change
@@ -1,31 +1,43 @@
---
name: Feature Request
about: Propose a new feature, metric, or memory backend
about: Propose a new feature, LLM memory backend, evaluation metric, or benchmark scenario for MemoryLens
title: "[FEAT] "
labels: enhancement
assignees: ''
---

## What problem does this solve?

<!-- Describe the gap. What can't you do with MemoryLens right now? -->
<!-- Describe the gap in LLM memory evaluation. What can't MemoryLens measure right now?
Examples:
- "There's no way to benchmark memory across sessions (persistent memory)"
- "Chunked RAG doesn't model importance-weighted eviction"
- "The EdTech scenario is missing — I want to benchmark student/teacher memory decay"
-->

## Proposed solution

<!-- How would you implement this? New backend? New metric? Dashboard change? -->
<!-- How would you implement this? New memory backend? New metric? New benchmark scenario? Dashboard change? -->

## Which layer does this touch?

- [ ] `simulator/` — conversation generation or fact injection
- [ ] `memory/` — new or improved memory backend
- [ ] `evaluation/` — new metric or benchmark change
- [ ] `dashboard.py` — visualisation
- [ ] `simulator/` — conversation generation, fact injection, or new domain scenario
- [ ] `memory/` — new or improved memory backend (LLM memory architecture)
- [ ] `evaluation/` — new metric or multi-seed benchmark change
- [ ] `utils/providers.py` — new LLM provider
- [ ] `dashboard.py` — visualisation (Streamlit + Plotly)
- [ ] `main.py` / CLI
- [ ] Documentation
- [ ] Documentation / research paper

## Expected impact on recall or efficiency

<!-- If this is a new memory backend, what recall@T behavior do you expect?
If this is a new metric, what does it capture that Recall@T doesn't?
-->

## Alternatives considered

<!-- Have you tried working around this? What else could solve the problem? -->
<!-- Have you tried working around this? Which existing backends or metrics come closest? -->

## Are you willing to implement this?

Expand All @@ -35,4 +47,4 @@ assignees: ''

## Additional context

<!-- Links, papers, related work, mockups, etc. -->
<!-- Links to papers (MemGPT, A-MEM, RAGAS, etc.), related LLM memory work, mockups, etc. -->
42 changes: 30 additions & 12 deletions .github/ISSUE_TEMPLATE/new_backend.md
Original file line number Diff line number Diff line change
@@ -1,41 +1,59 @@
---
name: New Memory Backend
about: Propose or claim a new memory backend implementation
about: Propose or claim a new LLM memory architecture implementation for the MemoryLens benchmark
title: "[BACKEND] "
labels: enhancement, new-backend
assignees: ''
---

## Backend name

<!-- e.g. SummaryMemory, EntityMemory, RedisMemory -->
<!-- e.g. EntityMemory, GraphMemory, QdrantMemory, RedisMemory, SlidingWindowMemory -->

## What strategy does it use?
## Memory strategy

<!-- Describe in 2-3 sentences how this backend stores and retrieves messages -->
<!-- Describe in 2-3 sentences how this backend stores and retrieves messages.
Examples:
- "Extract named entities into a key-value store; retrieve by entity name matching"
- "Use Qdrant vector DB with HNSW index; evict chunks older than N days"
- "Knowledge graph: entities as nodes, relationships as edges; multi-hop retrieval"
-->

## Why is it interesting to benchmark?
## Research hypothesis

<!-- What hypothesis does this backend test? What trade-off does it make? -->
<!-- What does this backend test? What trade-off does it make?
The MemoryLens benchmark exists to measure LLM memory decay — what specific decay
behavior do you expect this backend to show vs Naive / RAG / Cascading?
-->

## Expected Recall@T curve

<!-- e.g. "Should match Cascading at T=50 but outperform at T=100 due to structured storage" -->

## Implementation sketch

```python
class YourMemory(BaseMemory):
name = "your_backend"
name = "your_backend" # used in --backends flag

def add_message(self, role, content, turn): ...
def get_context(self, query, current_turn): ...
def reset(self): ...
def add_message(self, role: str, content: str, turn: int) -> None: ...
def get_context(self, query: str, current_turn: int) -> List[Dict]: ...
def reset(self) -> None: ...
```

## Dependencies required

<!-- Any new pip packages? Keep them minimal. -->
<!-- Any new pip packages? Keep them optional (add to pyproject.toml extras). -->

## Related work

<!-- Papers, existing implementations, or LLM memory systems this is based on:
e.g. MemGPT (Packer 2023), A-MEM (Xu 2024), LangChain EntityMemory
-->

## Are you claiming this to implement?

- [ ] Yes — I'll open a PR within 2 weeks
- [ ] No — leaving it open for the community

<!-- See CONTRIBUTING.md for the full backend implementation guide -->
<!-- See docs/adding-a-new-backend.md for the full implementation guide -->
50 changes: 50 additions & 0 deletions CITATION.cff
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
cff-version: 1.2.0
message: "If you use MemoryLens in your research, please cite it as below."
type: software
title: "MemoryLens: A Temporal Decay Benchmark for LLM Memory Architectures"
abstract: >
MemoryLens is an open-source evaluation framework for measuring LLM memory decay
— how AI memory systems forget personal facts across long conversations. It implements
five memory architectures (Naive, Ideal RAG, Chunked RAG, Cascading Temporal, SummaryMemory),
five evaluation metrics (Recall@T, Precision@K, Temporal Drift, Memory Noise Ratio,
Cascade Efficiency), Ebbinghaus-grounded temporal decay with ablation, multi-seed
statistical validation across five diverse personas, and a dual evaluation pipeline
(content-based + LLM answer+judge) supporting five provider backends.
authors:
- family-names: Srivastava
given-names: Neal
alias: Neal006
orcid: ""
repository-code: "https://github.com/Neal006/memorylens"
url: "https://github.com/Neal006/memorylens"
license: MIT
version: 0.3.0
date-released: "2026-05-22"
keywords:
- LLM memory
- memory decay
- LLM evaluation
- RAG evaluation
- temporal decay
- Ebbinghaus forgetting curve
- conversational AI
- memory benchmark
- long-term memory
- retrieval-augmented generation
- cascading memory
- LLM benchmark
references:
- type: article
title: "RAGAS: Automated Evaluation of Retrieval Augmented Generation"
authors:
- family-names: Es
given-names: Shahul
year: 2023
url: "https://arxiv.org/abs/2309.15217"
- type: article
title: "MemGPT: Towards LLMs as Operating Systems"
authors:
- family-names: Packer
given-names: Charles
year: 2023
url: "https://arxiv.org/abs/2310.08560"
Loading
Loading