Background
Every new feature in v0.4 needs integration tests to prevent regressions and serve as living documentation. Tests live in tests/test_pipeline.py and follow well-established patterns. This issue covers 7 new tests across three areas: GraphMemory, contradiction_score, and the two new scenarios.
Note: These tests depend on the corresponding feature PRs. Add them either after those PRs are merged or in the same PR as the feature. Coordinate with the authors of Issues #20, #21, #23, and #24.
Key files:
tests/test_pipeline.py — all existing tests (24 total) and the _populate() helper
simulator/facts.py — BENCHMARK_FACTS used in most tests
evaluation/metrics.py — metric function signatures
Tests to add
Append all 7 tests to tests/test_pipeline.py. Follow the existing _populate() helper pattern:
def _populate(memory, facts, turns: int):
    """Helper: generate conversation and feed messages to memory."""
    events = generate_conversation(facts, turns)
    for ev in events:
        ack = "Got it." if ev["is_fact"] else "Sure."
        memory.add_message("user", ev["content"], ev["turn"])
        memory.add_message("assistant", ack, ev["turn"])
    return events
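To make the helper's contract concrete, here is a minimal self-contained sketch. `RecordingMemory` and the stubbed `generate_conversation` are hypothetical stand-ins, not the project's real implementations. The only parts taken from this issue are the event keys (`is_fact`, `content`, `turn`) and the `add_message(role, content, turn)` call shape.

```python
# Hypothetical stand-ins for illustration only; the real generate_conversation
# and memory classes live in the project and may behave differently.

def generate_conversation(facts, turns):
    """Stub: emit one event per turn, stating the facts first, then filler."""
    events = []
    for turn in range(1, turns + 1):
        if turn <= len(facts):
            events.append({"turn": turn, "content": facts[turn - 1], "is_fact": True})
        else:
            events.append({"turn": turn, "content": "filler", "is_fact": False})
    return events


class RecordingMemory:
    """Stub memory that simply records every message it receives."""

    def __init__(self):
        self.messages = []

    def add_message(self, role, content, turn):
        self.messages.append((role, content, turn))


def _populate(memory, facts, turns: int):
    """Same shape as the helper above: one user and one assistant message per event."""
    events = generate_conversation(facts, turns)
    for ev in events:
        ack = "Got it." if ev["is_fact"] else "Sure."
        memory.add_message("user", ev["content"], ev["turn"])
        memory.add_message("assistant", ack, ev["turn"])
    return events


mem = RecordingMemory()
events = _populate(mem, ["fact A", "fact B"], 5)
print(len(events), len(mem.messages))
```

Under these assumptions each turn produces exactly two messages, so a memory under test sees `2 * turns` calls to `add_message`.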
GraphMemory tests (3)
def test_graph_memory_recall_early():
    """GraphMemory should recall >= 75% of facts at T=15."""
    from memory.graph import GraphMemory
    mem = GraphMemory()
    _populate(mem, BENCHMARK_FACTS, 15)
    hits = sum(
        1 for f in BENCHMARK_FACTS
        if recall_at_t(mem, f, 15)["recalled"]
    )
    assert hits / len(BENCHMARK_FACTS) >= 0.75

def test_graph_memory_no_stale_after_update():
    """After a fact update at T=40, the old graph node should have valid=False."""
    from memory.graph import GraphMemory
    mem = GraphMemory()
    _populate(mem, BENCHMARK_FACTS, 50)
    # city updated at T=40: Bangalore -> Mumbai
    stale_nodes = [
        n for n, d in mem.graph.nodes(data=True)
        if d.get("subject") == "city" and d.get("value") == "Bangalore"
    ]
    # Every node still holding the old value must be flagged invalid.
    assert all(not mem.graph.nodes[n].get("valid", True) for n in stale_nodes)

def test_graph_benchmark_registration():
    """_make_memory('graph') should not raise ValueError."""
    from evaluation.benchmark import _make_memory
    mem = _make_memory("graph")
    assert mem is not None
    assert mem.name == "graph"
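One subtlety in the stale-node check is worth seeing in isolation: `all()` over an empty sequence is `True`, so the assertion also passes when no Bangalore node exists at all. The sketch below uses a plain dict as a hypothetical stand-in for the graph's node store. The attribute names (`subject`, `value`, `valid`) mirror the test; the node ids and the `Delhi` lookup are made up.

```python
# Hypothetical node store: node id -> attribute dict. This is a stand-in for
# mem.graph.nodes, not the real GraphMemory data structure.
nodes = {
    "city:1": {"subject": "city", "value": "Bangalore", "valid": False},
    "city:2": {"subject": "city", "value": "Mumbai", "valid": True},
}


def find_nodes(nodes, subject, value):
    """Return the ids of nodes matching a subject/value pair."""
    return [
        node_id for node_id, attrs in nodes.items()
        if attrs.get("subject") == subject and attrs.get("value") == value
    ]


def all_invalidated(nodes, node_ids):
    """True if every listed node has valid=False (a missing flag counts as valid)."""
    return all(not nodes[n].get("valid", True) for n in node_ids)


stale = find_nodes(nodes, "city", "Bangalore")
missing = find_nodes(nodes, "city", "Delhi")

print(all_invalidated(nodes, stale))    # old node exists and is flagged invalid
print(all_invalidated(nodes, missing))  # also True: nothing matched, so all() is vacuous
```

If GraphMemory is expected to retain superseded nodes, a preceding `assert stale_nodes` would rule out the vacuous case. Whether old nodes are kept or deleted is a design decision for the feature PR, so treat that extra assertion as optional.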
contradiction_score tests (2)
def test_contradiction_score_zero_before_update():
    """At T=10 (before any update), contradiction_score should return 0.0."""
    from evaluation.metrics import contradiction_score
    from memory.naive import NaiveMemory
    mem = NaiveMemory()
    _populate(mem, BENCHMARK_FACTS, 10)
    score = contradiction_score(mem, BENCHMARK_FACTS, 10)
    assert score == 0.0

def test_contradiction_score_after_update():
    """At T=55 (after the city update at T=40), score should be a float in [0, 1]."""
    from evaluation.metrics import contradiction_score
    from memory.naive import NaiveMemory
    mem = NaiveMemory()
    _populate(mem, BENCHMARK_FACTS, 55)
    score = contradiction_score(mem, BENCHMARK_FACTS, 55)
    assert 0.0 <= score <= 1.0
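The two assertions above are easier to reason about with a toy metric in hand. The sketch below is not the real `contradiction_score`; it assumes one plausible definition (the fraction of already-applied updates whose old value is still present in what the memory returns) purely to show why the score would be 0.0 before T=40 and bounded by [0, 1] afterwards. The function name, the `updates` structure, and the sample sentences are all hypothetical.

```python
def toy_contradiction_score(retrieved_text, updates, turn):
    """Fraction of already-applied updates whose old value still shows up.

    Hypothetical definition for illustration; the real metric in
    evaluation/metrics.py may be defined differently.
    """
    applied = [u for u in updates if u["turn"] <= turn]
    if not applied:
        return 0.0  # nothing has been updated yet, so nothing can contradict
    stale = sum(1 for u in applied if u["old"] in retrieved_text)
    return stale / len(applied)


# The city update at T=40 comes from the tests above; the structure is assumed.
updates = [{"subject": "city", "old": "Bangalore", "new": "Mumbai", "turn": 40}]

before = toy_contradiction_score("I live in Bangalore.", updates, 10)
after_naive = toy_contradiction_score("I live in Bangalore. I moved to Mumbai.", updates, 55)
after_clean = toy_contradiction_score("I moved to Mumbai.", updates, 55)
print(before, after_naive, after_clean)
```

Under this assumed definition a memory that keeps both values scores at the top of the range and one that drops the old value scores 0.0, which is why the second test only checks the bounds.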
Scenario structure tests (2)
def test_customer_support_scenario_structure():
    """CUSTOMER_SUPPORT_FACTS: 8 facts, 5 personas, >= 20 filler turns."""
    from simulator.scenarios.customer_support import (
        CUSTOMER_SUPPORT_FACTS,
        CUSTOMER_SUPPORT_PERSONA_POOL,
        CUSTOMER_SUPPORT_FILLER_TURNS,
    )
    assert len(CUSTOMER_SUPPORT_FACTS) == 8
    assert len(CUSTOMER_SUPPORT_PERSONA_POOL) == 5
    assert len(CUSTOMER_SUPPORT_FILLER_TURNS) >= 20

def test_medical_scenario_structure():
    """MEDICAL_FACTS: 8 facts, 5 personas, >= 20 filler turns."""
    from simulator.scenarios.medical import (
        MEDICAL_FACTS,
        MEDICAL_PERSONA_POOL,
        MEDICAL_FILLER_TURNS,
    )
    assert len(MEDICAL_FACTS) == 8
    assert len(MEDICAL_PERSONA_POOL) == 5
    assert len(MEDICAL_FILLER_TURNS) >= 20
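Both scenario tests check the same three size constraints, so the rule can be stated once. The following is a sketch of that shared rule using placeholder lists; `check_scenario_structure` is a hypothetical helper that does not exist in the repository, and the thresholds are the ones quoted in the docstrings above.

```python
def check_scenario_structure(facts, personas, fillers,
                             n_facts=8, n_personas=5, min_fillers=20):
    """Return a list of human-readable problems; an empty list means the shape is fine."""
    problems = []
    if len(facts) != n_facts:
        problems.append(f"expected {n_facts} facts, got {len(facts)}")
    if len(personas) != n_personas:
        problems.append(f"expected {n_personas} personas, got {len(personas)}")
    if len(fillers) < min_fillers:
        problems.append(f"expected at least {min_fillers} filler turns, got {len(fillers)}")
    return problems


# Placeholder data standing in for a scenario module's constants.
facts = [f"fact {i}" for i in range(8)]
personas = [f"persona {i}" for i in range(5)]
fillers = [f"filler {i}" for i in range(19)]  # one short on purpose

problems = check_scenario_structure(facts, personas, fillers)
print(problems)
```

Keeping the two tests separate, as written above, gives clearer failure names in the pytest report; a helper like this mainly pays off if more scenarios are added later.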
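Acceptance criteria
All 7 new tests are appended to tests/test_pipeline.py.
Tests that build a conversation use the _populate() helper, with no inline conversation generation.
pytest tests/ -v shows 31 tests passing (24 existing plus 7 new).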
Getting started
pytest tests/ -v # baseline: 24 tests should pass
# Run only the new tests once you've added them:
pytest tests/test_pipeline.py -v -k "graph or contradiction or customer or medical"
# Full suite:
pytest tests/ -v # should show 31 tests passing
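As a quick sanity check on the `-k` expression, the sketch below approximates pytest's keyword selection as a case-insensitive substring match on test names and applies it to the seven names from this issue. It is a simplification: pytest also matches against other keywords attached to a test, and any existing test whose name contains one of these substrings would be selected as well.

```python
new_tests = [
    "test_graph_memory_recall_early",
    "test_graph_memory_no_stale_after_update",
    "test_graph_benchmark_registration",
    "test_contradiction_score_zero_before_update",
    "test_contradiction_score_after_update",
    "test_customer_support_scenario_structure",
    "test_medical_scenario_structure",
]
keywords = ["graph", "contradiction", "customer", "medical"]


def selected(names, keywords):
    """Approximate `-k "a or b"`: keep names containing any keyword, ignoring case."""
    return [n for n in names if any(k.lower() in n.lower() for k in keywords)]


picked = selected(new_tests, keywords)
baseline = 24  # existing tests, per this issue
expected_total = baseline + len(new_tests)
print(len(picked), expected_total)
```

All seven new names match at least one keyword, and 24 existing tests plus 7 new ones gives the 31 expected in the full run.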