Context
Snapshot.lineage today is a single-parent ancestry chain — each
snapshot has a parent_id pointing to the snapshot it derived from
(via evolve()). It's a nice clean structure for "child snapshot is
the parent's curriculum step."
It breaks down once training pulls harder:
- Multiple children from one parent: a curriculum runner that
forks one parent into N variants ("patch sql_injection vs add ssrf
vs harden accounts") wants all three to have the same parent_id —
fine. But it also wants to remember they were generated as a
batch from the same parent state. Today's chain doesn't capture
that batching.
- Multiple parents per child: counterfactual training compares
rollouts across two related-but-distinct snapshots; a derived
snapshot might want to reference both as "I'm derived from these."
A single-parent chain can't represent that.
- Lineage as the training data store: the long-term frame from
the meeting notes is "snapshots are the training data; the lineage
graph IS the curriculum." A pool / DAG fits that frame; a chain
doesn't.
What this issue tracks
Promote Snapshot.lineage from a single-parent chain to a
multi-parent reference graph. Persistent (snapshots store
relationships); queryable (you can ask "what derived from X" and
"what is Y's full provenance").
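As a rough illustration of the two query shapes, here is a minimal sketch of a multi-parent node and a provenance walk. `LineageNodeSketch`, its `snapshot_id` and `parent_ids` fields, and the `ancestors` helper are hypothetical stand-ins, not the existing LineageNode API; the real field names are a matter for the design doc.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageNodeSketch:
    """Hypothetical multi-parent lineage record (field names are assumptions)."""

    snapshot_id: str
    parent_ids: tuple[str, ...] = ()  # empty for a root snapshot


def ancestors(nodes: dict[str, LineageNodeSketch], snapshot_id: str) -> set[str]:
    """Return every snapshot id reachable by following parent links."""
    seen: set[str] = set()
    stack = list(nodes[snapshot_id].parent_ids)
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # shared ancestors in a DAG are visited only once
        seen.add(current)
        node = nodes.get(current)
        if node is not None:  # unknown parents are recorded but not expanded
            stack.extend(node.parent_ids)
    return seen


nodes = {
    "base": LineageNodeSketch("base"),
    "a": LineageNodeSketch("a", ("base",)),
    "b": LineageNodeSketch("b", ("base",)),
    "merged": LineageNodeSketch("merged", ("a", "b")),
}
print(sorted(ancestors(nodes, "merged")))  # ['a', 'b', 'base']
```

A single-parent snapshot is just the case where `parent_ids` has one element, so the same walk covers today's chains.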
Why it matters
The training-loop integration (#198) and the curriculum-driven demo
(#200) will both push on the chain shape. Promoting it before they
land is cheaper than refactoring after.
Where to start
- src/openrange/core/snapshot.py: the LineageNode dataclass and how
it's serialized in Snapshot.as_dict / from_mapping.
- src/openrange/core/store.py: SnapshotStore. If we want
efficient "what derived from X" queries, the store needs to index
by parent (it doesn't today).
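One possible shape for the parent index is a reverse map from parent id to child ids, maintained as snapshots are saved. The sketch below is an in-memory illustration only; `ParentIndex`, `add`, and `children_of` are hypothetical names and do not reflect how SnapshotStore is actually structured.

```python
from collections import defaultdict


class ParentIndex:
    """Hypothetical in-memory index from parent id to child ids."""

    def __init__(self) -> None:
        self._children: dict[str, set[str]] = defaultdict(set)

    def add(self, snapshot_id: str, parent_ids: list[str]) -> None:
        """Register a snapshot under each of its parents."""
        for parent_id in parent_ids:
            self._children[parent_id].add(snapshot_id)

    def children_of(self, snapshot_id: str) -> set[str]:
        """Return direct children; .get avoids creating empty entries."""
        return set(self._children.get(snapshot_id, ()))


index = ParentIndex()
index.add("a", ["base"])
index.add("b", ["base"])
index.add("merged", ["a", "b"])
print(sorted(index.children_of("base")))  # ['a', 'b']
```

With the current JSON-on-disk storage, an index like this would have to be rebuilt on load or persisted alongside the snapshots.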
Design questions
Open a design doc PR that addresses these unresolved questions:
- Multi-parent shape: explicit list of parents per snapshot, or
a sibling references mechanism (snapshot ↔ snapshot relations
outside ancestry)?
- Curriculum metadata: when a snapshot is generated as part of a
batch, do we record the batch identifier? A "curriculum cohort"
concept?
- Backwards compatibility: existing snapshots have single-parent
chains. The new model needs to accept those without a migration.
- Storage: today it's JSON on disk. Multi-parent lineage plus
"children of" / "ancestors of" queries could justify moving to
SQLite or similar.
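To make the backwards-compatibility and cohort questions concrete, here is a sketch of a loader that accepts both the existing single-parent form and a possible multi-parent form without a migration. Only `parent_id` comes from this issue; the `parent_ids` and `cohort_id` keys and the `read_lineage` helper are assumptions for illustration, not the current serialization format.

```python
from typing import Any, Mapping


def read_lineage(data: Mapping[str, Any]) -> dict[str, Any]:
    """Normalize legacy and multi-parent lineage mappings to one shape."""
    parent_ids = data.get("parent_ids")  # hypothetical new key
    if parent_ids is None:
        # Legacy snapshots carry at most one parent under "parent_id".
        legacy = data.get("parent_id")
        parent_ids = [] if legacy is None else [legacy]
    return {
        "parent_ids": list(parent_ids),
        "cohort_id": data.get("cohort_id"),  # hypothetical batch identifier
    }


print(read_lineage({"parent_id": "base"}))
# {'parent_ids': ['base'], 'cohort_id': None}
print(read_lineage({"parent_ids": ["a", "b"], "cohort_id": "batch-1"}))
# {'parent_ids': ['a', 'b'], 'cohort_id': 'batch-1'}
```

Normalizing at load time keeps old files untouched on disk; whether new snapshots should also write the legacy key is left to the design doc.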
Acceptance
- Design doc PR landed.
- Snapshot.lineage supports multiple parents.
- SnapshotStore exposes "children of X" and "ancestors of Y"
queries.
- Existing single-parent snapshots load correctly under the new
model.
Notes
Defer until the training-loop work pushes back on the chain shape.
This issue is filed for visibility, not as a near-term priority.