Neal006 · May 22, 2026
diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -1,31 +1,43 @@
 ---
 name: Feature Request
-about: Propose a new feature, metric, or memory backend
+about: Propose a new feature, LLM memory backend, evaluation metric, or benchmark scenario for MemoryLens
 title: "[FEAT] "
 labels: enhancement
 assignees: ''
 ---
 
 ## What problem does this solve?
 
-<!-- Describe the gap. What can't you do with MemoryLens right now? -->
+<!-- Describe the gap in LLM memory evaluation. What can't MemoryLens measure right now?
+     Examples:
+     - "There's no way to benchmark memory across sessions (persistent memory)"
+     - "Chunked RAG doesn't model importance-weighted eviction"
+     - "The EdTech scenario is missing — I want to benchmark student/teacher memory decay"
+-->
 
 ## Proposed solution
 
-<!-- How would you implement this? New backend? New metric? Dashboard change? -->
+<!-- How would you implement this? New memory backend? New metric? New benchmark scenario? Dashboard change? -->
 
 ## Which layer does this touch?
 
-- [ ] `simulator/` — conversation generation or fact injection
-- [ ] `memory/` — new or improved memory backend
-- [ ] `evaluation/` — new metric or benchmark change
-- [ ] `dashboard.py` — visualisation
+- [ ] `simulator/` — conversation generation, fact injection, or new domain scenario
+- [ ] `memory/` — new or improved memory backend (LLM memory architecture)
+- [ ] `evaluation/` — new metric or multi-seed benchmark change
+- [ ] `utils/providers.py` — new LLM provider
+- [ ] `dashboard.py` — visualisation (Streamlit + Plotly)
 - [ ] `main.py` / CLI
-- [ ] Documentation
+- [ ] Documentation / research paper
+
+## Expected impact on recall or efficiency
+
+<!-- If this is a new memory backend, what Recall@T behavior do you expect?
+     If this is a new metric, what does it capture that Recall@T doesn't?
+-->
 
 ## Alternatives considered
 
-<!-- Have you tried working around this? What else could solve the problem? -->
+<!-- Have you tried working around this? Which existing backends or metrics come closest? -->
 
 ## Are you willing to implement this?
 
@@ -35,4 +47,4 @@ assignees: ''
 
 ## Additional context
 
-<!-- Links, papers, related work, mockups, etc. -->
+<!-- Links to papers (MemGPT, A-MEM, RAGAS, etc.), related LLM memory work, mockups, etc. -->
diff --git a/.github/ISSUE_TEMPLATE/new_backend.md b/.github/ISSUE_TEMPLATE/new_backend.md
@@ -1,41 +1,59 @@
 ---
 name: New Memory Backend
-about: Propose or claim a new memory backend implementation
+about: Propose or claim a new LLM memory architecture implementation for the MemoryLens benchmark
 title: "[BACKEND] "
 labels: enhancement, new-backend
 assignees: ''
 ---
 
 ## Backend name
 
-<!-- e.g. SummaryMemory, EntityMemory, RedisMemory -->
+<!-- e.g. EntityMemory, GraphMemory, QdrantMemory, RedisMemory, SlidingWindowMemory -->
 
-## What strategy does it use?
+## Memory strategy
 
-<!-- Describe in 2-3 sentences how this backend stores and retrieves messages -->
+<!-- Describe in 2-3 sentences how this backend stores and retrieves messages.
+     Examples:
+     - "Extract named entities into a key-value store; retrieve by entity name matching"
+     - "Use Qdrant vector DB with HNSW index; evict chunks older than N days"
+     - "Knowledge graph: entities as nodes, relationships as edges; multi-hop retrieval"
+-->
 
-## Why is it interesting to benchmark?
+## Research hypothesis
 
-<!-- What hypothesis does this backend test? What trade-off does it make? -->
+<!-- What does this backend test? What trade-off does it make?
+     The MemoryLens benchmark exists to measure LLM memory decay — what specific decay
+     behavior do you expect this backend to show vs Naive / RAG / Cascading?
+-->
+
+## Expected Recall@T curve
+
+<!-- e.g. "Should match Cascading at T=50 but outperform at T=100 due to structured storage" -->
 
 ## Implementation sketch
 
 ```python
 class YourMemory(BaseMemory):
-    name = "your_backend"
+    name = "your_backend"   # used in --backends flag
 
-    def add_message(self, role, content, turn): ...
-    def get_context(self, query, current_turn): ...
-    def reset(self): ...
+    def add_message(self, role: str, content: str, turn: int) -> None: ...
+    def get_context(self, query: str, current_turn: int) -> List[Dict]: ...
+    def reset(self) -> None: ...
 ```
 
 ## Dependencies required
 
-<!-- Any new pip packages? Keep them minimal. -->
+<!-- Any new pip packages? Keep them optional (add to pyproject.toml extras). -->
+
+## Related work
+
+<!-- Papers, existing implementations, or LLM memory systems this is based on:
+     e.g. MemGPT (Packer et al. 2023), A-MEM (Xu et al. 2025), LangChain EntityMemory
+-->
 
 ## Are you claiming this to implement?
 
 - [ ] Yes — I'll open a PR within 2 weeks
 - [ ] No — leaving it open for the community
 
-<!-- See CONTRIBUTING.md for the full backend implementation guide -->
+<!-- See docs/adding-a-new-backend.md for the full implementation guide -->
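
Reviewer note on the implementation sketch in `new_backend.md`: a filled-in issue might look something like the minimal sketch below. It is illustrative only. The `BaseMemory` class here is a stand-in that declares the three methods from the template, not the project's real base class. `SlidingWindowMemory` borrows one of the example backend names from the template. The message dict keys (`role`, `content`, `turn`) and the `window` parameter are assumptions, since the template only specifies `List[Dict]`.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Deque, Dict, List


class BaseMemory(ABC):
    """Stand-in for the project's base class (assumed shape, not the real one)."""

    name = "base"

    @abstractmethod
    def add_message(self, role: str, content: str, turn: int) -> None: ...

    @abstractmethod
    def get_context(self, query: str, current_turn: int) -> List[Dict]: ...

    @abstractmethod
    def reset(self) -> None: ...


class SlidingWindowMemory(BaseMemory):
    """Keeps only the most recent `window` messages; older ones are evicted."""

    name = "sliding_window"  # hypothetical value for the --backends flag

    def __init__(self, window: int = 3) -> None:
        # deque(maxlen=...) drops the oldest entry automatically when full.
        self._messages: Deque[Dict] = deque(maxlen=window)

    def add_message(self, role: str, content: str, turn: int) -> None:
        self._messages.append({"role": role, "content": content, "turn": turn})

    def get_context(self, query: str, current_turn: int) -> List[Dict]:
        # The query is ignored: retrieval here is purely recency-based.
        return [m for m in self._messages if m["turn"] <= current_turn]

    def reset(self) -> None:
        self._messages.clear()


memory = SlidingWindowMemory(window=2)
memory.add_message("user", "My dog is named Biscuit.", turn=1)
memory.add_message("assistant", "Nice name!", turn=2)
memory.add_message("user", "What should I cook tonight?", turn=3)
context = memory.get_context("What is my dog's name?", current_turn=3)
```

With `window=2`, the fact stated at turn 1 has already been evicted by turn 3, which is the sort of decay the "Expected Recall@T curve" section of the template asks contributors to describe.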
diff --git a/CITATION.cff b/CITATION.cff
@@ -0,0 +1,50 @@
+cff-version: 1.2.0
+message: "If you use MemoryLens in your research, please cite it as below."
+type: software
+title: "MemoryLens: A Temporal Decay Benchmark for LLM Memory Architectures"
+abstract: >
+  MemoryLens is an open-source evaluation framework for measuring LLM memory decay
+  — how AI memory systems forget personal facts across long conversations. It implements
+  five memory architectures (Naive, Ideal RAG, Chunked RAG, Cascading Temporal, SummaryMemory),
+  five evaluation metrics (Recall@T, Precision@K, Temporal Drift, Memory Noise Ratio,
+  Cascade Efficiency), Ebbinghaus-grounded temporal decay with ablation, multi-seed
+  statistical validation across five diverse personas, and a dual evaluation pipeline
+  (content-based plus LLM answer-and-judge) supporting five provider backends.
+authors:
+  - family-names: Srivastava
+    given-names: Neal
+    alias: Neal006
+    # orcid: add the author's ORCID iD URL once available (an empty string fails CFF 1.2.0 schema validation)
+repository-code: "https://github.com/Neal006/memorylens"
+url: "https://github.com/Neal006/memorylens"
+license: MIT
+version: 0.3.0
+date-released: "2026-05-22"
+keywords:
+  - LLM memory
+  - memory decay
+  - LLM evaluation
+  - RAG evaluation
+  - temporal decay
+  - Ebbinghaus forgetting curve
+  - conversational AI
+  - memory benchmark
+  - long-term memory
+  - retrieval-augmented generation
+  - cascading memory
+  - LLM benchmark
+references:
+  - type: article
+    title: "RAGAS: Automated Evaluation of Retrieval Augmented Generation"
+    authors:
+      - family-names: Es
+        given-names: Shahul
+    year: 2023
+    url: "https://arxiv.org/abs/2309.15217"
+  - type: article
+    title: "MemGPT: Towards LLMs as Operating Systems"
+    authors:
+      - family-names: Packer
+        given-names: Charles
+    year: 2023
+    url: "https://arxiv.org/abs/2310.08560"