Data Model

propstore has canonical authored artifacts in knowledge/, plus a compiled SQLite sidecar used as the read/query surface. Git/YAML is the source of truth; the sidecar is a versioned materialization.

The main authored entities are sources, concepts, forms, claims, stances, authored justifications, and contexts. Conditions are fields on claims and relations, not a separate top-level entity.

Sources

Sources live in sources/<slug>.yaml and are first-class provenance records. Claims compile with a stable source_slug foreign-key-style reference to these rows.

id: tag:example.org,2026:halpin-2010
kind: academic_paper
origin:
  type: doi
  value: 10.1016/j.websem.2010.01.001
  retrieved: 2026-04-07
trust:
  prior_base_rate: 0.6
  quality:
    venue: peer_reviewed
artifact_code: sha256:...

Source-branch notes.md and metadata remain canonical git artifacts. They are not compiled into claim reasoning tables.

Concepts

A concept is a named quantity, category, or structural entity. Each concept has its own YAML file in concepts/, and the filename matches canonical_name.

id: concept1
canonical_name: fundamental_frequency
status: accepted
definition: Rate of vocal fold vibration during phonation.
domain: speech
form: frequency
aliases:
  - name: F0
    source: common
  - name: pitch_frequency
    source: Titze_1994
relationships:
  - type: component_of
    target: concept5
parameterization_relationships:
  - formula: "f0 = 1 / T0"
    sympy: "Eq(concept1, 1 / concept2)"
    inputs: [concept2]
    exactness: exact
    source: definition
    bidirectional: true

Status lifecycle: proposed -> accepted -> deprecated (with replaced_by pointer). Concepts are never deleted.
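The lifecycle can be read as a small state machine. The following is a minimal sketch, not propstore's validator: the function name, the dict shape, and the rules that transitions only move forward and that a deprecated concept must carry replaced_by are assumptions drawn from the sentence above.

```python
# Hypothetical sketch of the status lifecycle; not propstore's actual validator.
ALLOWED_TRANSITIONS = {
    "proposed": {"accepted"},
    "accepted": {"deprecated"},
    "deprecated": set(),  # terminal state: concepts are never deleted
}


def check_transition(old_status: str, new_concept: dict) -> None:
    """Raise ValueError if the concept's new status is not reachable from old_status."""
    new_status = new_concept["status"]
    if new_status == old_status:
        return  # no status change, nothing to check
    if new_status not in ALLOWED_TRANSITIONS.get(old_status, set()):
        raise ValueError(f"illegal transition: {old_status} -> {new_status}")
    # Assumption: deprecation must name the replacement concept.
    if new_status == "deprecated" and not new_concept.get("replaced_by"):
        raise ValueError("deprecated concepts need a replaced_by pointer")


check_transition("proposed", {"status": "accepted"})
check_transition("accepted", {"status": "deprecated", "replaced_by": "concept9"})
```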

Kind system: Each concept has a form referencing a form definition file. The form determines the concept's kind:

| Kind | Examples | CEL behavior |
|---|---|---|
| `quantity` | frequency, pressure, duration | Numeric comparisons and arithmetic |
| `category` | voice_quality_type, language | Equality and `in` checks against value sets |
| `boolean` | is_voiced | Boolean logic |
| `structural` | voice_source | Cannot appear in CEL expressions |
| `timepoint` | valid_from, valid_until | Numeric comparisons (epoch seconds); automatic interval ordering constraints; not valid for parameterization/dimensional algebra |

Forms

Form definitions live in forms/<name>.yaml and define dimensional type signatures:

name: frequency
unit_symbol: Hz
dimensionless: false
dimensions:
  T: -1
common_alternatives:
  - unit: kHz
    type: multiplicative
    multiplier: 1000

The compiler uses form definitions for unit validation via dimensional analysis, checking that claim units are compatible with concept dimensions.
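One way to picture the compatibility check is to treat each dimension signature as a map from base dimension to exponent and compare the maps. This is a minimal sketch under that assumption; UNIT_DIMENSIONS and both function names are hypothetical, and the real compiler may resolve unit dimensions differently.

```python
# Hypothetical sketch: dimension signatures as exponent maps, e.g. Hz -> {"T": -1}.
FREQUENCY_DIMENSIONS = {"T": -1}  # the "dimensions" block of the frequency form

# Assumed lookup table for illustration only.
UNIT_DIMENSIONS = {
    "Hz": {"T": -1},
    "kHz": {"T": -1},
    "s": {"T": 1},
}


def same_dimensions(a: dict, b: dict) -> bool:
    """Compare two exponent maps, treating a missing key as exponent 0."""
    return all(a.get(k, 0) == b.get(k, 0) for k in set(a) | set(b))


def unit_is_compatible(unit: str, form_dimensions: dict) -> bool:
    """True if the unit is known and has the same dimensions as the form."""
    if unit not in UNIT_DIMENSIONS:
        return False
    return same_dimensions(UNIT_DIMENSIONS[unit], form_dimensions)


print(unit_is_compatible("kHz", FREQUENCY_DIMENSIONS))
print(unit_is_compatible("s", FREQUENCY_DIMENSIONS))
```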

Unit conversions (common_alternatives)

The common_alternatives array defines how non-SI units convert to the form's SI base unit. Three conversion types:

- Multiplicative: si_value = raw * multiplier. Example: kHz has multiplier 1000, so 5 kHz becomes 5000 Hz.
- Affine: si_value = raw * multiplier + offset. Used for temperature scales: degC uses offset 273.15 to convert to kelvin.
- Logarithmic: si_value = reference * base^(raw / divisor). Used for decibel scales: dB SPL uses base 10, divisor 20, reference 0.00002 Pa.

During sidecar build, all claim values are normalized to SI via these conversions (with pint as fallback for standard unit prefixes). The sidecar stores both raw and SI-normalized values (value_si, lower_bound_si, upper_bound_si).
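The three conversion formulas above can be sketched as a single function. The formulas come from this section; the key names offset, base, divisor, and reference, the unit spellings, and the default multiplier of 1 for affine conversions are assumptions, since only type and multiplier appear in the form example.

```python
# Hypothetical sketch of SI normalization; key names other than
# "type" and "multiplier" are assumed for illustration.
def to_si(raw: float, alt: dict) -> float:
    """Convert a raw value to the form's SI base unit using one common_alternatives entry."""
    kind = alt["type"]
    if kind == "multiplicative":
        return raw * alt["multiplier"]
    if kind == "affine":
        return raw * alt.get("multiplier", 1.0) + alt["offset"]
    if kind == "logarithmic":
        return alt["reference"] * alt["base"] ** (raw / alt["divisor"])
    raise ValueError(f"unknown conversion type: {kind}")


KHZ = {"unit": "kHz", "type": "multiplicative", "multiplier": 1000}
DEG_C = {"unit": "degC", "type": "affine", "offset": 273.15}
DB_SPL = {"unit": "dB_SPL", "type": "logarithmic",
          "base": 10, "divisor": 20, "reference": 0.00002}

print(to_si(5, KHZ))      # 5 kHz in Hz
print(to_si(25, DEG_C))   # 25 degC in kelvin
print(to_si(94, DB_SPL))  # 94 dB SPL in Pa, close to 1 Pa
```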

Domain-specific units (extra_units)

The extra_units field registers domain-specific units not recognized by pint. Each entry has a symbol and optionally dimensions. These units are added to the form's allowed unit set and registered into the dimensional analysis symbol table.

See units-and-forms.md for full details on SI normalization, form algebra, and dimensional analysis.

Claims

Claims are extracted from papers and stored in claims/<paper_name>.yaml. There are nine claim types.

parameter

A numeric value binding for a concept under stated conditions:

- id: claim1
  type: parameter
  concept: concept1
  value: 120.0
  uncertainty: 15.0
  uncertainty_type: sd
  unit: Hz
  conditions:
    - "speaker_sex == 'male'"
  provenance:
    paper: Titze_1994
    page: 42

- id: claim1
  type: parameter
  concept: ad_reading_speed
  value: 180.0
  unit: "words/min"
  conditions:
    - "task == 'audio_description'"
  provenance:
    paper: Li_2026_ADCanvas
    page: 8

equation

A mathematical relationship with variable bindings:

- id: claim10
  type: equation
  expression: "OQ = (T_o) / T_0"
  sympy: "Eq(OQ, T_o / T_0)"
  variables:
    - symbol: OQ
      concept: concept3
    - symbol: T_o
      concept: concept4
    - symbol: T_0
      concept: concept2
  provenance:
    paper: Henrich_2003
    page: 8

measurement

A perceptual or behavioral measurement:

- id: claim20
  type: measurement
  target_concept: concept1
  measure: jnd_relative
  value: 0.003
  unit: ratio
  listener_population: native_english
  provenance:
    paper: Moore_1973
    page: 15

observation

A qualitative claim that resists parameterization:

- id: claim30
  type: observation
  statement: "Breathiness increases with incomplete glottal closure"
  concepts: [concept7, concept8]
  provenance:
    paper: Klatt_1990
    page: 22

model

A parameterized equation system:

- id: claim40
  type: model
  name: "Klatt cascade formant synthesizer"
  equations:
    - "output = cascade(F1, F2, F3, F4, F5)"
  parameters:
    - name: F1
      concept: concept10
  provenance:
    paper: Klatt_1980
    page: 5

algorithm

A procedural computation as a Python function body:

- id: claim50
  type: algorithm
  concept: concept12
  stage: excitation
  body: |
    def glottal_pulse(t, T0, Tp, Tn):
        if t < Tp:
            return 0.5 * (1 - math.cos(math.pi * t / Tp))
        elif t < Tp + Tn:
            return math.cos(math.pi * (t - Tp) / (2 * Tn))
        else:
            return 0.0
  variables:
    - name: t
      concept: concept60
    - name: T0
      concept: concept2
  provenance:
    paper: Klatt_1980
    page: 12

mechanism

A causal or explanatory process linking concepts:

- id: claim60
  type: mechanism
  statement: "Undercutting defeat removes the connection between premise and conclusion without challenging the premise itself"
  concepts: [undercutting_attack, defeasible_reasoning]
  provenance:
    paper: Pollock_1987
    page: 485

comparison

A comparative claim between approaches, methods, or systems:

- id: claim61
  type: comparison
  statement: "Preferred semantics produces more extensions than grounded semantics on frameworks with even-length cycles"
  concepts: [preferred_extension, grounded_extension]
  provenance:
    paper: Dung_1995
    page: 331

limitation

A known boundary, failure case, or applicability constraint:

- id: claim62
  type: limitation
  statement: "Stable extensions are not guaranteed to exist for all argumentation frameworks"
  concepts: [stable_extension, argumentation_framework]
  provenance:
    paper: Dung_1995
    page: 328

Conditions

Claims and relationships can be scoped by conditions — CEL (Common Expression Language) expressions that define when they hold:

conditions:
  - "speaker_sex == 'male'"
  - "task == 'speech'"

The compiler type-checks conditions against the concept registry, and production runtime evaluation uses the same Z3-backed CEL semantics:

- quantity concepts use numeric comparisons
- boolean concepts use boolean logic
- closed categories (extensible: false) use finite enum semantics, so undeclared literals are hard errors
- open categories (extensible: true) use symbolic string semantics, so undeclared literals remain semantically valid and only warn at check time
- unknown concept names are hard errors everywhere
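The category rules above can be illustrated with a small literal check. This is a sketch only: it does not parse CEL or involve Z3, and the registry shape, the declared value sets, and the function name are hypothetical. Only the extensible flag and the error-versus-warning behavior come from the list above.

```python
# Hypothetical registry shape; the value sets are invented for illustration.
REGISTRY = {
    "speaker_sex": {"extensible": False, "values": {"male", "female"}},
    "task": {"extensible": True, "values": {"speech"}},
}


def check_category_literal(registry: dict, concept: str, literal: str) -> list:
    """Return check-time warnings; raise ValueError for hard errors."""
    if concept not in registry:
        raise ValueError(f"unknown concept: {concept}")
    entry = registry[concept]
    if literal in entry["values"]:
        return []
    if entry["extensible"]:
        # Open category: the literal stays valid, but the author is warned.
        return [f"undeclared literal {literal!r} for open category {concept}"]
    # Closed category: finite enum semantics, so this is a hard error.
    raise ValueError(f"undeclared literal {literal!r} for closed category {concept}")


print(check_category_literal(REGISTRY, "speaker_sex", "male"))
print(check_category_literal(REGISTRY, "task", "audio_description"))
```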

Justifications

Justifications are inference rules that connect premise claims to conclusion claims. There are two distinct cases:

- Authored justifications live in justifications/<source>.yaml and compile into the sidecar justification table.
- Runtime-derived justifications such as reported:claim_id and supports:a->b are built from the active claim graph when argumentation code needs them. They are not persisted in the sidecar.

Data model

Each justification is a CanonicalJustification with these fields:

| Field | Type | Default | Description |
|---|---|---|---|
| `justification_id` | `str` | - | Unique identifier (e.g., `reported:claim1` or `supports:claim2->claim3`) |
| `conclusion_claim_id` | `str` | - | The claim this justification concludes |
| `premise_claim_ids` | `tuple[str, ...]` | `()` | Claims that serve as premises |
| `rule_kind` | `str` | `"reported_claim"` | Type of inference rule |
| `rule_strength` | `str` | `"defeasible"` | Whether the rule is strict or defeasible |
| `provenance` | `ProvenanceRecord \| None` | `None` | Source attribution |
| `attributes` | `tuple[tuple[str, Any], ...]` | `()` | Additional metadata as sorted key-value pairs |
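The field table maps naturally onto an immutable record. The sketch below assumes a frozen dataclass, which the source does not confirm. The ProvenanceRecord fields and the two helper constructors are hypothetical, and reading supports:a->b as "a is the premise, b is the conclusion" is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    # Placeholder: the real record's fields are not described in this section.
    paper: str
    page: Optional[int] = None


@dataclass(frozen=True)
class CanonicalJustification:
    # Field names, types, and defaults follow the table above.
    justification_id: str
    conclusion_claim_id: str
    premise_claim_ids: tuple[str, ...] = ()
    rule_kind: str = "reported_claim"
    rule_strength: str = "defeasible"
    provenance: Optional[ProvenanceRecord] = None
    attributes: tuple[tuple[str, Any], ...] = ()


def reported(claim_id: str) -> CanonicalJustification:
    """Hypothetical helper: the premise-free justification every claim gets."""
    return CanonicalJustification(f"reported:{claim_id}", claim_id)


def supports(premise: str, conclusion: str) -> CanonicalJustification:
    """Hypothetical helper: assumes 'supports:a->b' means a supports b."""
    return CanonicalJustification(
        f"supports:{premise}->{conclusion}",
        conclusion,
        premise_claim_ids=(premise,),
        rule_kind="supports",
    )


print(reported("claim1"))
print(supports("claim2", "claim3"))
```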

rule_kind

Three values:

- reported_claim: Every claim automatically gets a reported_claim justification. This represents the claim's direct assertion from its source paper, with no premises.
- supports: A premise claim provides corroborating evidence for the conclusion claim.
- explains: A premise claim provides a mechanistic explanation for the conclusion claim.

rule_strength

Two values, corresponding to ASPIC+ rule types (Modgil & Prakken 2018, Def 2):

- strict: The inference is logically unattackable. Strict rules have no name and cannot be undercut.
- defeasible: The inference is tentative and can be undercut. Defeasible rules are named by their justification_id, which enables targeted undercutting (Def 8c).

ASPIC+ mapping

Justifications translate directly to ASPIC+ rules via the bridge in aspic_bridge.py. reported_claim justifications become knowledge base premises (not rules). Justifications with premises become strict or defeasible rules depending on rule_strength. See structured-argumentation.md for the full translation pipeline (T1–T7).
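The split described in this paragraph can be sketched as a partition over justification records. This is not the code in aspic_bridge.py: the function name, the plain-dict representation, and the output shape are assumptions made for illustration.

```python
# Hypothetical sketch of the split described above; not the code in aspic_bridge.py.
def to_aspic(justifications: list) -> dict:
    """Partition justification dicts into KB premises, strict rules, and named defeasible rules."""
    premises = []
    strict_rules = []
    defeasible_rules = {}
    for j in justifications:
        body = tuple(j.get("premise_claim_ids", ()))
        head = j["conclusion_claim_id"]
        if j.get("rule_kind", "reported_claim") == "reported_claim":
            premises.append(head)  # a knowledge base premise, not a rule
        elif j.get("rule_strength", "defeasible") == "strict":
            strict_rules.append((body, head))  # strict rules carry no name
        else:
            # Defeasible rules are named by their justification_id.
            defeasible_rules[j["justification_id"]] = (body, head)
    return {
        "premises": premises,
        "strict_rules": strict_rules,
        "defeasible_rules": defeasible_rules,
    }


example = [
    {"justification_id": "reported:claim1", "conclusion_claim_id": "claim1"},
    {"justification_id": "supports:claim1->claim2", "conclusion_claim_id": "claim2",
     "premise_claim_ids": ("claim1",), "rule_kind": "supports"},
]
print(to_aspic(example))
```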

Targeted undercutting

An undercuts stance can include a target_justification_id field to attack a specific defeasible rule rather than all rules concluding a given claim. When multiple defeasible rules support the same conclusion, omitting target_justification_id raises an ambiguity error. This implements Pollock's (1987) undercutting defeat: the attacker targets the inference rule itself, not the conclusion.
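The resolution rule above can be sketched as follows. The error class, the function name, and the dict shapes are hypothetical, and failing when no defeasible rule concludes the target is an assumption. Only the ambiguity error and the target_justification_id field come from the text.

```python
class AmbiguousUndercutError(ValueError):
    """Hypothetical error type: the undercut could apply to several rules."""


def resolve_undercut_target(stance: dict, justifications: list) -> str:
    """Return the justification_id of the defeasible rule an undercuts stance attacks."""
    candidates = [
        j["justification_id"]
        for j in justifications
        if j["conclusion_claim_id"] == stance["target"]
        and j["rule_strength"] == "defeasible"
        and j["rule_kind"] != "reported_claim"  # these are premises, not rules
    ]
    wanted = stance.get("target_justification_id")
    if wanted is not None:
        if wanted not in candidates:
            raise ValueError(f"{wanted} is not a defeasible rule for {stance['target']}")
        return wanted
    if not candidates:
        # Assumption: an undercut with nothing to attack is an error.
        raise ValueError(f"no defeasible rule concludes {stance['target']}")
    if len(candidates) > 1:
        raise AmbiguousUndercutError(
            f"{len(candidates)} rules conclude {stance['target']}; "
            "set target_justification_id"
        )
    return candidates[0]


RULES = [
    {"justification_id": "supports:a->c", "conclusion_claim_id": "c",
     "rule_kind": "supports", "rule_strength": "defeasible"},
    {"justification_id": "explains:b->c", "conclusion_claim_id": "c",
     "rule_kind": "explains", "rule_strength": "defeasible"},
]
stance = {"type": "undercuts", "target": "c",
          "target_justification_id": "explains:b->c"}
print(resolve_undercut_target(stance, RULES))
```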

Authoring

Authored justifications are optional and look like:

justifications:
  - id: just1
    conclusion: claim_observation
    premises: [claim_parameter]
    rule_kind: supports
    rule_strength: defeasible

The sidecar stores authored justifications exactly as written, so source-authored inference structure remains queryable. Support and explanation edges still participate in argumentation, but their CanonicalJustification records are synthesized at runtime from the active graph.

Stances

Claims can express epistemic relations to other claims:

stances:
  - type: rebuts
    target: claim15
    strength: strong
    note: "Contradicting conclusion with larger sample size"
  - type: supersedes
    target: claim42

Six stance types (ASPIC+ taxonomy, active voice — the claim holding the stance acts on the target):

| Type | Category | Weight | Meaning |
|---|---|---|---|
| `rebuts` | Attack | -1.0 | Directly contradicts the target's conclusion |
| `undercuts` | Attack | -1.0 | Attacks the inference method or reasoning |
| `undermines` | Attack | -0.5 | Weakens a premise or evidence quality |
| `supports` | Support | +1.0 | Provides corroborating evidence |
| `explains` | Support | +0.5 | Provides a mechanistic explanation |
| `supersedes` | Preference | --- | Replaces the target entirely (short-circuits resolution) |

Based on ASPIC+ (Modgil & Prakken 2014) and Pollock's rebutting vs undercutting distinction (Prakken & Horty 2012).

Stances feed into the argumentation framework: attacks become defeat candidates filtered through preference ordering, and supports contribute to claim strength.
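As a rough illustration of how the stance table might be consumed, the sketch below encodes the six types as a lookup and groups a claim's stances by category. The names STANCE_TYPES and group_stances are hypothetical, and the sketch deliberately stops short of computing claim strength or defeat, which this section does not specify.

```python
# Category and weight per stance type, copied from the table above.
# supersedes has no weight in the table, represented here as None.
STANCE_TYPES = {
    "rebuts": ("attack", -1.0),
    "undercuts": ("attack", -1.0),
    "undermines": ("attack", -0.5),
    "supports": ("support", 1.0),
    "explains": ("support", 0.5),
    "supersedes": ("preference", None),
}


def group_stances(stances: list) -> dict:
    """Group stance dicts by category; raise ValueError on an unknown type."""
    grouped = {"attack": [], "support": [], "preference": []}
    for stance in stances:
        if stance["type"] not in STANCE_TYPES:
            raise ValueError(f"unknown stance type: {stance['type']}")
        category, _weight = STANCE_TYPES[stance["type"]]
        grouped[category].append(stance)
    return grouped


stances = [
    {"type": "rebuts", "target": "claim15", "strength": "strong"},
    {"type": "supersedes", "target": "claim42"},
]
print(group_stances(stances))
```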

Contexts

Contexts represent research traditions, theoretical frameworks, or experimental paradigms that scope groups of claims. One YAML file per context in contexts/:

id: ctx_abstract_argumentation
name: ctx_abstract_argumentation
description: Dung's abstract argumentation framework tradition — arguments as abstract
  entities with attack relations, multiple acceptability semantics
structure:
  assumptions:
    - "domain == 'argumentation'"
  parameters:
    tradition: abstract
  perspective: dung
lifting_rules:
  - id: lift_dung_to_argumentation
    source:
      id: ctx_abstract_argumentation
    target:
      id: ctx_argumentation
    mode: monotonic

Claims reference their context via context: {id: ...}. The compiler validates that all context references resolve to registered contexts. Contexts are structured logical terms with authored assumptions, parameters, and perspective metadata. Visibility inheritance is not a production concept; cross-context visibility is granted only by explicit lifting rules.
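The "explicit lifting rules only" policy can be sketched as a lookup over the lifting_rules entries. Two assumptions are made loudly here: that a rule makes claims from its source context visible in its target context, and that lifting is not applied transitively. Neither is stated in this section, and the function name is hypothetical.

```python
# Hypothetical sketch: which contexts' claims are visible from a given context.
def visible_contexts(context_id: str, lifting_rules: list) -> set:
    """Return the context itself plus every source that lifts directly into it."""
    visible = {context_id}
    for rule in lifting_rules:
        # Assumption: direction is source -> target, one hop only.
        if rule["target"]["id"] == context_id:
            visible.add(rule["source"]["id"])
    return visible


LIFTING_RULES = [
    {
        "id": "lift_dung_to_argumentation",
        "source": {"id": "ctx_abstract_argumentation"},
        "target": {"id": "ctx_argumentation"},
        "mode": "monotonic",
    },
]

print(visible_contexts("ctx_argumentation", LIFTING_RULES))
print(visible_contexts("ctx_abstract_argumentation", LIFTING_RULES))
```

With no matching rule, a context sees only itself, which mirrors the statement that there is no implicit visibility inheritance.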

Schema

The data model is defined in LinkML at schema/concept_registry.linkml.yaml and schema/claim.linkml.yaml. JSON Schema is generated from these for validation. Run schema/generate.py to regenerate.
