test: Add integration tests for all v0.4 features (GraphMemory, contradiction_score, scenarios) #31

@Neal006

Background

Every new feature in v0.4 needs integration tests to prevent regressions and serve as living documentation. Tests live in tests/test_pipeline.py and follow well-established patterns. This issue covers 7 new tests across three areas: GraphMemory, contradiction_score, and the two new scenarios.

Note: These tests can only be written once the corresponding feature PRs are merged (or in the same PR as the feature). Coordinate with the authors of Issues #20, #21, #23, and #24.

Key files:

  • tests/test_pipeline.py: all existing tests (24 total) and the _populate() helper
  • simulator/facts.py: BENCHMARK_FACTS, used in most tests
  • evaluation/metrics.py: metric function signatures

Tests to add

Append all 7 tests to tests/test_pipeline.py. Follow the existing _populate() helper pattern:

def _populate(memory, facts, turns: int):
    """Helper: generate conversation and feed messages to memory."""
    events = generate_conversation(facts, turns)
    for ev in events:
        ack = "Got it." if ev["is_fact"] else "Sure."
        memory.add_message("user", ev["content"], ev["turn"])
        memory.add_message("assistant", ack, ev["turn"])
    return events
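
To see what _populate() expects from a backend, here is a minimal self-contained sketch. FakeMemory and the stub generate_conversation are hypothetical stand-ins, not project code; the only things taken from the helper above are the add_message(role, content, turn) call shape and the event keys is_fact, content, and turn.

```python
class FakeMemory:
    """Hypothetical stand-in backend: records every add_message call."""

    def __init__(self):
        self.messages = []

    def add_message(self, role, content, turn):
        self.messages.append((role, content, turn))


def generate_conversation(facts, turns):
    """Stub (assumption): one event per turn, facts first, then filler."""
    events = []
    for turn in range(1, turns + 1):
        is_fact = turn <= len(facts)
        content = facts[turn - 1] if is_fact else f"filler {turn}"
        events.append({"turn": turn, "is_fact": is_fact, "content": content})
    return events


def _populate(memory, facts, turns: int):
    """Same logic as the helper above, repeated so the sketch runs alone."""
    events = generate_conversation(facts, turns)
    for ev in events:
        ack = "Got it." if ev["is_fact"] else "Sure."
        memory.add_message("user", ev["content"], ev["turn"])
        memory.add_message("assistant", ack, ev["turn"])
    return events


mem = FakeMemory()
events = _populate(mem, ["I live in Bangalore."], 3)
# Each event produces one user message and one assistant acknowledgement.
assert len(mem.messages) == 2 * len(events)
```

Any backend that exposes add_message with this call shape can be fed through the helper, which is why the new tests can reuse it unchanged.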

GraphMemory tests (3)

def test_graph_memory_recall_early():
    """GraphMemory should recall >= 75% of facts at T=15."""
    from memory.graph import GraphMemory
    mem = GraphMemory()
    _populate(mem, BENCHMARK_FACTS, 15)
    hits = sum(
        1 for f in BENCHMARK_FACTS
        if recall_at_t(mem, f, 15)["recalled"]
    )
    assert hits / len(BENCHMARK_FACTS) >= 0.75

def test_graph_memory_no_stale_after_update():
    """After a fact update at T=40, the old graph node should have valid=False."""
    from memory.graph import GraphMemory
    mem = GraphMemory()
    _populate(mem, BENCHMARK_FACTS, 50)
    # city updated at T=40: Bangalore -> Mumbai
    stale_attrs = [
        d for _, d in mem.graph.nodes(data=True)
        if d.get("subject") == "city" and d.get("value") == "Bangalore"
    ]
    # Every node that still holds the old value must be flagged invalid.
    assert all(not d.get("valid", True) for d in stale_attrs)
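
One thing to watch in the stale-node check: all() over an empty iterable returns True, so the assertion also passes when no Bangalore node is found at all. Whether that is acceptable depends on how GraphMemory handles updates (it may delete old nodes instead of flagging them), which this issue does not specify. A minimal sketch of the difference, using plain dicts as hypothetical stand-ins for node attributes:

```python
def stale_flags_ok(node_attrs, require_match=False):
    """Return True if every old 'city=Bangalore' node is marked invalid.

    node_attrs is a list of attribute dicts (stand-in for graph node data).
    With require_match=True, finding no matching node counts as a failure.
    """
    stale = [
        d for d in node_attrs
        if d.get("subject") == "city" and d.get("value") == "Bangalore"
    ]
    if require_match and not stale:
        return False
    return all(not d.get("valid", True) for d in stale)


flagged = [
    {"subject": "city", "value": "Bangalore", "valid": False},
    {"subject": "city", "value": "Mumbai", "valid": True},
]
deleted = [{"subject": "city", "value": "Mumbai", "valid": True}]

print(stale_flags_ok(flagged))                      # True
print(stale_flags_ok(deleted))                      # True (vacuously)
print(stale_flags_ok(deleted, require_match=True))  # False
```

If the feature PR guarantees that old nodes are kept and flagged, adding an explicit "at least one stale node exists" assertion would make the test stricter.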

def test_graph_benchmark_registration():
    """_make_memory('graph') should not raise ValueError."""
    from evaluation.benchmark import _make_memory
    mem = _make_memory("graph")
    assert mem is not None
    assert mem.name == "graph"

contradiction_score tests (2)

def test_contradiction_score_zero_before_update():
    """At T=10 (before any update), contradiction_score should return 0.0."""
    from evaluation.metrics import contradiction_score
    from memory.naive import NaiveMemory
    mem = NaiveMemory()
    _populate(mem, BENCHMARK_FACTS, 10)
    score = contradiction_score(mem, BENCHMARK_FACTS, 10)
    assert score == 0.0

def test_contradiction_score_after_update():
    """At T=55 (after city update at T=40), score should be float in [0, 1]."""
    from evaluation.metrics import contradiction_score
    from memory.naive import NaiveMemory
    mem = NaiveMemory()
    _populate(mem, BENCHMARK_FACTS, 55)
    score = contradiction_score(mem, BENCHMARK_FACTS, 55)
    assert 0.0 <= score <= 1.0
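
The issue does not define contradiction_score beyond the two properties tested above (0.0 before any update, a float in [0, 1] afterwards). Purely as an illustration of why those properties are reasonable, here is one possible definition. toy_contradiction_score, its arguments, and the update dict keys are all assumptions, not the signature in evaluation/metrics.py:

```python
def toy_contradiction_score(retrieved, updates, t):
    """Fraction of already-applied updates whose old value is still retrieved.

    retrieved: dict mapping subject -> text the memory returns for it
    updates:   list of dicts with keys subject, old, new, turn (hypothetical)
    t:         current turn; updates with turn > t have not happened yet
    """
    applied = [u for u in updates if u["turn"] <= t]
    if not applied:
        return 0.0  # nothing has changed yet, so nothing can contradict
    stale = sum(
        1 for u in applied
        if u["old"] in retrieved.get(u["subject"], "")
    )
    return stale / len(applied)


updates = [{"subject": "city", "old": "Bangalore", "new": "Mumbai", "turn": 40}]

# Before the update at T=40 the score is 0.0 by construction.
print(toy_contradiction_score({"city": "I live in Bangalore"}, updates, 10))  # 0.0
# After it, a memory that still surfaces the old value scores 1.0 here.
print(toy_contradiction_score({"city": "I live in Bangalore"}, updates, 55))  # 1.0
print(toy_contradiction_score({"city": "I moved to Mumbai"}, updates, 55))    # 0.0
```

Under a ratio-style definition like this one, the range check in the second test holds for any backend, while the exact value depends on how the backend retrieves facts.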

Scenario structure tests (2)

def test_customer_support_scenario_structure():
    """CUSTOMER_SUPPORT_FACTS: 8 facts, 5 personas, >= 20 filler turns."""
    from simulator.scenarios.customer_support import (
        CUSTOMER_SUPPORT_FACTS,
        CUSTOMER_SUPPORT_PERSONA_POOL,
        CUSTOMER_SUPPORT_FILLER_TURNS,
    )
    assert len(CUSTOMER_SUPPORT_FACTS) == 8
    assert len(CUSTOMER_SUPPORT_PERSONA_POOL) == 5
    assert len(CUSTOMER_SUPPORT_FILLER_TURNS) >= 20

def test_medical_scenario_structure():
    """MEDICAL_FACTS: 8 facts, 5 personas, >= 20 filler turns."""
    from simulator.scenarios.medical import (
        MEDICAL_FACTS, MEDICAL_PERSONA_POOL, MEDICAL_FILLER_TURNS,
    )
    assert len(MEDICAL_FACTS) == 8
    assert len(MEDICAL_PERSONA_POOL) == 5
    assert len(MEDICAL_FILLER_TURNS) >= 20
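
Both scenario tests assert the same three sizes, so a shared helper is one way to keep them in sync. The sketch below is only a suggestion: check_scenario_shape is a hypothetical name, and the lists are dummy stand-ins for the real scenario constants, which come from feature PRs that may not be merged yet.

```python
def check_scenario_shape(facts, persona_pool, filler_turns,
                         n_facts=8, n_personas=5, min_filler=20):
    """Assert the scenario sizes stated in the test docstrings above."""
    assert len(facts) == n_facts, f"expected {n_facts} facts, got {len(facts)}"
    assert len(persona_pool) == n_personas, (
        f"expected {n_personas} personas, got {len(persona_pool)}"
    )
    assert len(filler_turns) >= min_filler, (
        f"expected >= {min_filler} filler turns, got {len(filler_turns)}"
    )


# Dummy data with the required sizes; real tests would pass the imported constants.
check_scenario_shape(
    facts=[f"fact {i}" for i in range(8)],
    persona_pool=[f"persona {i}" for i in range(5)],
    filler_turns=[f"filler {i}" for i in range(25)],
)
```

Each scenario would still get its own test function calling the helper with its constants, so the total stays at 7 new tests.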

Acceptance criteria

  • All 7 tests present in tests/test_pipeline.py
  • All 7 new tests pass
  • All 24 existing tests still pass (total: 31 tests)
  • No test requires an API key (content-only metrics only)
  • Tests use the _populate() helper — no inline conversation generation

Getting started

pytest tests/ -v   # baseline: 24 tests should pass

# Run only the new tests once you've added them:
pytest tests/test_pipeline.py -v -k "graph or contradiction or customer or medical"

# Full suite:
pytest tests/ -v   # should show 31 tests passing

Metadata

Assignees

No one assigned

Labels

  • [area: memory-backend] Memory backend implementations (memory/)
  • [area: metrics] Evaluation metrics (evaluation/metrics.py)
  • [difficulty: intermediate] Requires familiarity with the codebase
  • [hacktoberfest] Eligible for Hacktoberfest contributions
  • [help wanted] Extra attention needed - community welcome
  • [priority: p2-high] Important, target next milestone
  • [status: open-for-contribution] Not assigned - free to claim
  • [test] Adding or updating tests
