test: Add integration tests for all v0.4 features (GraphMemory, contradiction_score, scenarios) #31

## Background

Every new feature in v0.4 needs integration tests to prevent regressions and serve as living documentation. Tests live in `tests/test_pipeline.py` and follow well-established patterns. This issue covers 7 new tests across three areas: GraphMemory, contradiction_score, and the two new scenarios.

**Note:** These tests depend on the corresponding feature PRs. Add them after those PRs are merged, or include them in the feature PR itself. Coordinate with the authors of Issues #20, #21, #23, and #24.

Key files:
- `tests/test_pipeline.py` — all existing tests (24 total) and the `_populate()` helper
- `simulator/facts.py` — `BENCHMARK_FACTS` used in most tests
- `evaluation/metrics.py` — metric function signatures

## Tests to add

Append all 7 tests to `tests/test_pipeline.py`. Follow the existing `_populate()` helper pattern:

```python
def _populate(memory, facts, turns: int):
    """Helper: generate conversation and feed messages to memory."""
    events = generate_conversation(facts, turns)
    for ev in events:
        ack = "Got it." if ev["is_fact"] else "Sure."
        memory.add_message("user", ev["content"], ev["turn"])
        memory.add_message("assistant", ack, ev["turn"])
    return events
```
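To make the helper's contract concrete, here is a minimal, self-contained sketch of what `_populate()` sends to a memory object. The `generate_conversation` stand-in, the `RecordingMemory` class, and the fact dict keys (`turn`, `text`) are all hypothetical and exist only for illustration. Only the event keys (`turn`, `is_fact`, `content`) and the `add_message(role, content, turn)` call are taken from the helper above.

```python
from typing import List, Tuple


def generate_conversation(facts: List[dict], turns: int) -> List[dict]:
    """Hypothetical stand-in: emit each fact at its turn, filler elsewhere."""
    by_turn = {f["turn"]: f for f in facts}
    events = []
    for t in range(1, turns + 1):
        fact = by_turn.get(t)
        events.append({
            "turn": t,
            "is_fact": fact is not None,
            "content": fact["text"] if fact else f"filler message {t}",
        })
    return events


class RecordingMemory:
    """Hypothetical memory that only records add_message calls."""

    def __init__(self) -> None:
        self.messages: List[Tuple[str, str, int]] = []

    def add_message(self, role: str, content: str, turn: int) -> None:
        self.messages.append((role, content, turn))


def _populate(memory, facts, turns: int):
    """Same body as the helper in tests/test_pipeline.py."""
    events = generate_conversation(facts, turns)
    for ev in events:
        ack = "Got it." if ev["is_fact"] else "Sure."
        memory.add_message("user", ev["content"], ev["turn"])
        memory.add_message("assistant", ack, ev["turn"])
    return events


mem = RecordingMemory()
events = _populate(mem, [{"turn": 2, "text": "I live in Bangalore."}], 3)
```

Each turn produces one user message and one assistant acknowledgement, so a memory under test sees `2 * turns` calls to `add_message`.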

### GraphMemory tests (3)

```python
def test_graph_memory_recall_early():
    """GraphMemory should recall >= 75% of facts at T=15."""
    from memory.graph import GraphMemory
    mem = GraphMemory()
    _populate(mem, BENCHMARK_FACTS, 15)
    hits = sum(
        1 for f in BENCHMARK_FACTS
        if recall_at_t(mem, f, 15)["recalled"]
    )
    assert hits / len(BENCHMARK_FACTS) >= 0.75

def test_graph_memory_no_stale_after_update():
    """After the city update at T=40, the old graph node should have valid=False."""
    from memory.graph import GraphMemory
    mem = GraphMemory()
    _populate(mem, BENCHMARK_FACTS, 50)
    # city updated at T=40: Bangalore -> Mumbai
    stale = [
        d for _, d in mem.graph.nodes(data=True)
        if d.get("subject") == "city" and d.get("value") == "Bangalore"
    ]
    # Guard against a vacuous pass: all() over an empty list is True.
    assert stale, "expected the old city node to remain in the graph"
    assert all(not d.get("valid", True) for d in stale)

def test_graph_benchmark_registration():
    """_make_memory('graph') should not raise ValueError."""
    from evaluation.benchmark import _make_memory
    mem = _make_memory("graph")
    assert mem is not None
    assert mem.name == "graph"
```
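The stale-node test relies on each graph node carrying `subject`, `value`, and `valid` attributes. The sketch below uses plain `(name, data)` pairs as a stand-in for graph node data, so it runs without `GraphMemory`. The node names and the `stale_nodes_invalidated` helper are hypothetical. It shows which node layouts a check of this kind accepts and rejects, including the case where no matching node exists at all.

```python
from typing import Dict, Iterable, Tuple

NodeData = Dict[str, object]


def stale_nodes_invalidated(
    nodes: Iterable[Tuple[str, NodeData]], subject: str, old_value: str
) -> bool:
    """True only if at least one old-value node exists and all are invalid."""
    matches = [
        d for _, d in nodes
        if d.get("subject") == subject and d.get("value") == old_value
    ]
    # bool(matches) matters: all() over an empty list returns True.
    return bool(matches) and all(not d.get("valid", True) for d in matches)


# Old node kept but marked invalid: the expected state after the update.
good = [
    ("city:Bangalore", {"subject": "city", "value": "Bangalore", "valid": False}),
    ("city:Mumbai", {"subject": "city", "value": "Mumbai", "valid": True}),
]
# Old node still marked valid: a stale fact.
still_valid = [
    ("city:Bangalore", {"subject": "city", "value": "Bangalore", "valid": True}),
]
# Old node absent: nothing to check, so the helper reports failure.
missing = [
    ("city:Mumbai", {"subject": "city", "value": "Mumbai", "valid": True}),
]
```

Whether a missing old node should count as a failure depends on how `GraphMemory` handles updates (invalidate in place versus delete), which this issue does not specify.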

### contradiction_score tests (2)

```python
def test_contradiction_score_zero_before_update():
    """At T=10 (before any update), contradiction_score should return 0.0."""
    from evaluation.metrics import contradiction_score
    from memory.naive import NaiveMemory
    mem = NaiveMemory()
    _populate(mem, BENCHMARK_FACTS, 10)
    score = contradiction_score(mem, BENCHMARK_FACTS, 10)
    assert score == 0.0

def test_contradiction_score_after_update():
    """At T=55 (after the city update at T=40), score should be a float in [0, 1]."""
    from evaluation.metrics import contradiction_score
    from memory.naive import NaiveMemory
    mem = NaiveMemory()
    _populate(mem, BENCHMARK_FACTS, 55)
    score = contradiction_score(mem, BENCHMARK_FACTS, 55)
    assert isinstance(score, float)
    assert 0.0 <= score <= 1.0
```
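This issue does not define how `contradiction_score` is computed, so the following is only a toy model to show why the two tests assert what they do. It assumes the score is the fraction of already-applied updates whose old value still appears in the memory's text. The `FactUpdate` type and `toy_contradiction_score` function are hypothetical and are not the signature in `evaluation/metrics.py`.

```python
from typing import List, NamedTuple


class FactUpdate(NamedTuple):
    """Hypothetical record of a fact changing value at a given turn."""
    subject: str
    old_value: str
    new_value: str
    turn: int


def toy_contradiction_score(context: str, updates: List[FactUpdate], t: int) -> float:
    """Assumed definition: share of applied updates whose old value lingers."""
    applied = [u for u in updates if u.turn <= t]
    if not applied:
        # No update has happened yet, so nothing can be contradicted.
        return 0.0
    stale = sum(1 for u in applied if u.old_value in context)
    return stale / len(applied)


updates = [FactUpdate("city", "Bangalore", "Mumbai", 40)]
full_history = "I live in Bangalore. I moved to Mumbai."
only_current = "I moved to Mumbai."
```

Under this assumed definition, any query before T=40 yields exactly `0.0`, while a memory that keeps the full history scores above zero after the update. This is why the second test only bounds the score to `[0, 1]` rather than pinning a value.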

### Scenario structure tests (2)

```python
def test_customer_support_scenario_structure():
    """CUSTOMER_SUPPORT_FACTS: 8 facts, 5 personas, >= 20 filler turns."""
    from simulator.scenarios.customer_support import (
        CUSTOMER_SUPPORT_FACTS,
        CUSTOMER_SUPPORT_PERSONA_POOL,
        CUSTOMER_SUPPORT_FILLER_TURNS,
    )
    assert len(CUSTOMER_SUPPORT_FACTS) == 8
    assert len(CUSTOMER_SUPPORT_PERSONA_POOL) == 5
    assert len(CUSTOMER_SUPPORT_FILLER_TURNS) >= 20

def test_medical_scenario_structure():
    """MEDICAL_FACTS: 8 facts, 5 personas, >= 20 filler turns."""
    from simulator.scenarios.medical import (
        MEDICAL_FACTS, MEDICAL_PERSONA_POOL, MEDICAL_FILLER_TURNS,
    )
    assert len(MEDICAL_FACTS) == 8
    assert len(MEDICAL_PERSONA_POOL) == 5
    assert len(MEDICAL_FILLER_TURNS) >= 20
```

## Acceptance criteria

- [ ] All 7 tests present in `tests/test_pipeline.py`
- [ ] All 7 new tests pass
- [ ] All 24 existing tests still pass (total: 31 tests)
- [ ] No test requires an API key (use content-only metrics)
- [ ] Tests use the `_populate()` helper — no inline conversation generation

## Getting started

```bash
pytest tests/ -v   # baseline: 24 tests should pass

# Run only the new tests once you've added them:
pytest tests/test_pipeline.py -v -k "graph or contradiction or customer or medical"

# Full suite:
pytest tests/ -v   # should show 31 tests passing
```

