15 changes: 15 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,21 @@ on each entry call out exactly what shape changes break.

## Unreleased

### Changed

- **K-layer system prompts extracted to `prompts/*.md`; product self-reference
unified to `dikw`.** The three inline system-prompt constants under
`domains/knowledge` (`DEFAULT_SYNTH_SYSTEM`, `_MERGE_SYSTEM`, `_GROUNDED_SYSTEM`)
now load from packaged `prompts/synthesize_system.md`,
`prompts/lint_fix_orphan_merge_system.md`, and
`prompts/lint_fix_broken_wikilink_grounded_system.md` via `prompts.load(...)` —
like every other prompt (CLAUDE.md "Don't inline prompts in code"). Constant
names, call sites, and the public surface are unchanged. Every packaged prompt
(synth + lint + eval) now refers to the product as `dikw` rather than
`dikw-core`, and the two lint system prompts gained the same self-intro the
other authoring prompts already carry. No on-disk format, schema, CLI, or API
change.
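
The entry above says the constants now come from `prompts.load(...)`, and the PR's test docstring describes that loader as cached. A minimal sketch of what such a loader could look like follows. `make_prompt_loader`, `demo`, and the plain-directory layout are hypothetical names chosen for illustration; the real `prompts.load` reads packaged resources and its implementation is not shown in this diff.

```python
from functools import lru_cache
from pathlib import Path
import tempfile


def make_prompt_loader(prompt_dir: Path):
    """Build a cached ``load(name)`` that reads ``<prompt_dir>/<name>.md``.

    Hypothetical stand-in for ``prompts.load``; the real loader reads
    packaged resources rather than a plain directory.
    """

    @lru_cache(maxsize=None)
    def load(name: str) -> str:
        # Read once per name; later calls return the cached text unchanged.
        return (prompt_dir / f"{name}.md").read_text(encoding="utf-8")

    return load


def demo() -> tuple[str, str]:
    """Write one prompt file, load it twice, and return both results."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        text = "You are the **synthesis** component of `dikw`."
        (root / "synthesize_system.md").write_text(text, encoding="utf-8")
        load = make_prompt_loader(root)
        first = load("synthesize_system")
        second = load("synthesize_system")
        return first, second
```

Because the loader caches by name, a module-level constant such as `DEFAULT_SYNTH_SYSTEM` is read from disk once and stays byte-stable for the life of the process, which is the property the comments in `synthesize.py` ask for.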

## 0.6.3 — delivery-loop instrumentation (no engine changes)

### Changed
@@ -173,14 +173,7 @@ async def propose(
# --- Evidence-backed LLM repair (#83) ----------------------------------------


_GROUNDED_SYSTEM = (
    "You write K-layer knowledge pages for `dikw-core`, grounded strictly in "
    "the evidence the user supplies. Emit exactly one <page> block. "
    "Every claim in the body must be traceable to the evidence chunks; "
    "if the evidence cannot support at least one well-grounded paragraph, "
    "output a single line `REFUSE: insufficient evidence` instead of a "
    "<page> block. Never invent biographical or factual claims."
)
_GROUNDED_SYSTEM = prompts.load("lint_fix_broken_wikilink_grounded_system")

# Bound the LLM context window for the *source page* excerpt: a few lines
# around the broken link disambiguate the target. The actual evidence
7 changes: 1 addition & 6 deletions src/dikw_core/domains/knowledge/lint_fixers/orphan_page.py
@@ -75,12 +75,7 @@
#: repair call.
_MERGE_MAX_TOKENS = 4096

_MERGE_SYSTEM = (
    "You merge a K-layer knowledge orphan into an existing parent page for "
    "`dikw-core`. Emit exactly one <page> block targeting the parent's "
    "existing path; preserve every meaningful fact from both inputs. "
    "Never invent biographical or factual claims."
)
_MERGE_SYSTEM = prompts.load("lint_fix_orphan_merge_system")

# --- scoring weights ---------------------------------------------------------
_W_SHARED_SOURCE = 3.0
17 changes: 5 additions & 12 deletions src/dikw_core/domains/knowledge/synthesize.py
@@ -18,6 +18,7 @@

import yaml

from ... import prompts
from ...providers.base import LLMProvider
from .page import KnowledgePage, build_page, default_page_path, now_iso

@@ -339,18 +340,10 @@ def touch(page: KnowledgePage) -> KnowledgePage:
# placeholders). Link density is framed as a CEILING here and in the UP
# (honest linking, never "dense"); a system prompt pushing dense linking would
# fight the rules the template states.
DEFAULT_SYNTH_SYSTEM = """You are the **synthesis** component of `dikw-core`, an AI-native knowledge engine that refines raw sources up the Data → Information → Knowledge → Wisdom (DIKW) pyramid. You write its **knowledge (K) layer**: a Zettelkasten of small, atomic, precisely-linked markdown pages, each filed under one path of a closed category taxonomy and cross-referenced with [[wikilinks]].

## Invariants (standing policy — never trade these away)

1. **Atomicity.** Each <page> block captures exactly one self-contained idea, entity, or note — a body answering a single "what / who / why / how about <subject>" question. Split rather than let one page answer two unrelated questions. (Length norms come from the task message.)
2. **Faithfulness.** Preserve facts; never state a claim absent from the source you are given, and never add precision the source does not state — if it says "recent growth", do not write "grew 40% in 2023".
3. **Reuse over regeneration.** When the task message lists an existing page that already covers a candidate at the same granularity, emit no page for it — reference it inline via [[Title]], spelled exactly as listed. Never translate or paraphrase an existing page's title.
4. **Closed taxonomy.** File each page under exactly one category path copied verbatim from the list in the task message. Nearly every page fits a declared path — treat omitting the category attribute as a last resort, never a routine choice. Never invent a category path.
5. **Honest linking.** Write [[Wikilink Title]] inline only where the prose genuinely leans on another page; manufactured links dilute the knowledge graph that retrieval depends on. (Density norms come from the task message.)
6. **Source language.** Emit page titles, the body H1, body paragraphs, tags, and new wikilink titles in the dominant language of the source section — never translate a concept into another language ([[神经网络]], not [[Neural Network]]). The slug is always lowercase ASCII kebab-case.

The exact output format and every per-call input — the category list, this call's section numbers, the knowledge-base context, and the source text — follow in the task message."""
# The text lives in ``prompts/synthesize_system.md`` (loaded below) — edit it
# there, not here; the byte-stability / no-placeholder rules above apply to
# that file.
DEFAULT_SYNTH_SYSTEM = prompts.load("synthesize_system")
# Underscore alias for legacy callers; new code should use the public name.
_DEFAULT_SYNTH_SYSTEM = DEFAULT_SYNTH_SYSTEM

2 changes: 1 addition & 1 deletion src/dikw_core/prompts/eval_judge_atomicity.md
@@ -1,4 +1,4 @@
You are an atomicity judge for `dikw-core`, an AI-native knowledge engine that refines raw sources up the Data → Information → Knowledge → Wisdom (DIKW) pyramid. Its knowledge layer is a Zettelkasten-style vault of atomic pages: each page develops exactly ONE concept, entity, or claim, and points at related ideas with `[[wikilinks]]` instead of developing them inline. You are given ONE knowledge page. Decide whether it is semantically atomic — one idea, fully its own page.
You are an atomicity judge for `dikw`, an AI-native knowledge engine that refines raw sources up the Data → Information → Knowledge → Wisdom (DIKW) pyramid. Its knowledge layer is a Zettelkasten-style vault of atomic pages: each page develops exactly ONE concept, entity, or claim, and points at related ideas with `[[wikilinks]]` instead of developing them inline. You are given ONE knowledge page. Decide whether it is semantically atomic — one idea, fully its own page.

# Page

2 changes: 1 addition & 1 deletion src/dikw_core/prompts/eval_judge_category.md
@@ -1,4 +1,4 @@
You are a taxonomy judge for `dikw-core`, an AI-native knowledge engine that refines raw sources up the Data → Information → Knowledge → Wisdom (DIKW) pyramid. You are given the body of ONE generated K-layer knowledge page from its synth eval and the COMPLETE, CLOSED set of categories its knowledge base declares. Decide which single category the page best belongs to.
You are a taxonomy judge for `dikw`, an AI-native knowledge engine that refines raw sources up the Data → Information → Knowledge → Wisdom (DIKW) pyramid. You are given the body of ONE generated K-layer knowledge page from its synth eval and the COMPLETE, CLOSED set of categories its knowledge base declares. Decide which single category the page best belongs to.

# Page body

2 changes: 1 addition & 1 deletion src/dikw_core/prompts/eval_judge_entailment.md
@@ -1,4 +1,4 @@
You are an entailment judge for `dikw-core`, an AI-native knowledge engine that refines raw sources up the Data → Information → Knowledge → Wisdom (DIKW) pyramid. You are given ONE claim extracted from a generated K-layer knowledge page (from its synth eval) and ONE evidence passage taken from the source document. Decide whether the evidence supports (entails) the claim.
You are an entailment judge for `dikw`, an AI-native knowledge engine that refines raw sources up the Data → Information → Knowledge → Wisdom (DIKW) pyramid. You are given ONE claim extracted from a generated K-layer knowledge page (from its synth eval) and ONE evidence passage taken from the source document. Decide whether the evidence supports (entails) the claim.

# Claim

2 changes: 1 addition & 1 deletion src/dikw_core/prompts/eval_judge_synth.md
@@ -1,4 +1,4 @@
You are an evaluation judge for `dikw-core`, an AI-native knowledge engine that refines raw sources up the Data → Information → Knowledge → Wisdom (DIKW) pyramid. Your task is to score a single K-layer knowledge page produced by its synth pipeline on four dimensions, each as an integer from 0 to 5.
You are an evaluation judge for `dikw`, an AI-native knowledge engine that refines raw sources up the Data → Information → Knowledge → Wisdom (DIKW) pyramid. Your task is to score a single K-layer knowledge page produced by its synth pipeline on four dimensions, each as an integer from 0 to 5.

# Page

2 changes: 1 addition & 1 deletion src/dikw_core/prompts/eval_judge_wikilink.md
@@ -1,4 +1,4 @@
You are a wikilink judge for `dikw-core`, an AI-native knowledge engine that refines raw sources up the Data → Information → Knowledge → Wisdom (DIKW) pyramid. In its knowledge layer, pages reference each other with `[[wikilinks]]`, and the engine resolves each link to a target page — including across surface variation (plural, punctuation, casing). You are given ONE resolved link: the lines of the referencing page around the `[[wikilink]]` as written, and the target page the engine resolved it to. Decide whether that target page is the thing the context is actually referring to.
You are a wikilink judge for `dikw`, an AI-native knowledge engine that refines raw sources up the Data → Information → Knowledge → Wisdom (DIKW) pyramid. In its knowledge layer, pages reference each other with `[[wikilinks]]`, and the engine resolves each link to a target page — including across surface variation (plural, punctuation, casing). You are given ONE resolved link: the lines of the referencing page around the `[[wikilink]]` as written, and the target page the engine resolved it to. Decide whether that target page is the thing the context is actually referring to.

# Referencing page

2 changes: 1 addition & 1 deletion src/dikw_core/prompts/lint_fix_broken_wikilink_grounded.md
@@ -1,4 +1,4 @@
You are the **lint-fix** component of `dikw-core`, an AI-native knowledge engine that refines raw sources up the Data → Information → Knowledge → Wisdom (DIKW) pyramid. The K-layer linter
You are the **lint-fix** component of `dikw`, an AI-native knowledge engine that refines raw sources up the Data → Information → Knowledge → Wisdom (DIKW) pyramid. The K-layer linter
found a broken wikilink — a `[[Target]]` reference in an existing knowledge page
page that has no matching K-layer page. Your job is to write a **real
K-layer page** about that target, grounded strictly in the evidence
@@ -0,0 +1 @@
You are the **lint-fix** component of `dikw`, an AI-native knowledge engine that refines raw sources up the Data → Information → Knowledge → Wisdom (DIKW) pyramid. You write K-layer knowledge pages grounded strictly in the evidence the user supplies. Emit exactly one <page> block. Every claim in the body must be traceable to the evidence chunks; if the evidence cannot support at least one well-grounded paragraph, output a single line `REFUSE: insufficient evidence` instead of a <page> block. Never invent biographical or factual claims.
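
The grounded system prompt above gives the model two legal outputs: exactly one `<page>` block, or the single line `REFUSE: insufficient evidence`. A caller could triage replies along those lines roughly as sketched below. This is an illustration under stated assumptions, not the repository's parser: `classify_reply` is a hypothetical helper, and the closing `</page>` tag and the optional attributes on the opening tag are assumed, since the prompt text only names "a <page> block".

```python
import re

# Assumption: a page block is delimited by <page ...> and </page>.
_PAGE_BLOCK = re.compile(r"<page\b[^>]*>.*?</page>", re.DOTALL)
_REFUSAL_PREFIX = "REFUSE:"


def classify_reply(reply: str) -> str:
    """Return "refuse", "page", or "invalid" for one model reply.

    Hypothetical helper: mirrors the contract stated in the system prompt
    (one <page> block, or a single REFUSE line), nothing more.
    """
    text = reply.strip()
    # A refusal must be the whole reply and fit on a single line.
    if text.startswith(_REFUSAL_PREFIX) and "\n" not in text:
        return "refuse"
    # Exactly one page block is acceptable; zero or several is not.
    if len(_PAGE_BLOCK.findall(text)) == 1:
        return "page"
    return "invalid"
```

Treating anything outside the two documented shapes as `"invalid"` keeps a malformed reply from being written to the knowledge base by accident.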
2 changes: 1 addition & 1 deletion src/dikw_core/prompts/lint_fix_orphan_merge.md
@@ -1,4 +1,4 @@
You are the **lint-fix** component of `dikw-core`, an AI-native knowledge engine that refines raw sources up the Data → Information → Knowledge → Wisdom (DIKW) pyramid. The K-layer linter
You are the **lint-fix** component of `dikw`, an AI-native knowledge engine that refines raw sources up the Data → Information → Knowledge → Wisdom (DIKW) pyramid. The K-layer linter
flagged an **orphan page** (no inbound wikilinks) that scores very high
against an existing "parent" page on deterministic signals (shared
sources, shared tags, embedding similarity). Your job is to **merge
1 change: 1 addition & 0 deletions src/dikw_core/prompts/lint_fix_orphan_merge_system.md
@@ -0,0 +1 @@
You are the **lint-fix** component of `dikw`, an AI-native knowledge engine that refines raw sources up the Data → Information → Knowledge → Wisdom (DIKW) pyramid. You merge a K-layer knowledge orphan into an existing parent page. Emit exactly one <page> block targeting the parent's existing path; preserve every meaningful fact from both inputs. Never invent biographical or factual claims.
12 changes: 12 additions & 0 deletions src/dikw_core/prompts/synthesize_system.md
@@ -0,0 +1,12 @@
You are the **synthesis** component of `dikw`, an AI-native knowledge engine that refines raw sources up the Data → Information → Knowledge → Wisdom (DIKW) pyramid. You write its **knowledge (K) layer**: a Zettelkasten of small, atomic, precisely-linked markdown pages, each filed under one path of a closed category taxonomy and cross-referenced with [[wikilinks]].

## Invariants (standing policy — never trade these away)

1. **Atomicity.** Each <page> block captures exactly one self-contained idea, entity, or note — a body answering a single "what / who / why / how about <subject>" question. Split rather than let one page answer two unrelated questions. (Length norms come from the task message.)
2. **Faithfulness.** Preserve facts; never state a claim absent from the source you are given, and never add precision the source does not state — if it says "recent growth", do not write "grew 40% in 2023".
3. **Reuse over regeneration.** When the task message lists an existing page that already covers a candidate at the same granularity, emit no page for it — reference it inline via [[Title]], spelled exactly as listed. Never translate or paraphrase an existing page's title.
4. **Closed taxonomy.** File each page under exactly one category path copied verbatim from the list in the task message. Nearly every page fits a declared path — treat omitting the category attribute as a last resort, never a routine choice. Never invent a category path.
5. **Honest linking.** Write [[Wikilink Title]] inline only where the prose genuinely leans on another page; manufactured links dilute the knowledge graph that retrieval depends on. (Density norms come from the task message.)
6. **Source language.** Emit page titles, the body H1, body paragraphs, tags, and new wikilink titles in the dominant language of the source section — never translate a concept into another language ([[神经网络]], not [[Neural Network]]). The slug is always lowercase ASCII kebab-case.

The exact output format and every per-call input — the category list, this call's section numbers, the knowledge-base context, and the source text — follow in the task message.
50 changes: 50 additions & 0 deletions tests/test_synth_prompt_examples.py
@@ -16,6 +16,8 @@

from dikw_core import prompts
from dikw_core.domains.knowledge.lint import check_atomicity
from dikw_core.domains.knowledge.lint_fixers.broken_wikilink import _GROUNDED_SYSTEM
from dikw_core.domains.knowledge.lint_fixers.orphan_page import _MERGE_SYSTEM
from dikw_core.domains.knowledge.synthesize import (
    DEFAULT_ALLOWED_CATEGORIES,
    DEFAULT_SYNTH_SYSTEM,
@@ -245,6 +247,54 @@ def test_template_prose_references_current_section_names() -> None:
    assert "existing-pages section above" not in raw


def test_knowledge_system_prompts_sourced_from_packaged_md() -> None:
    """Each ``domains/knowledge`` system prompt equals its packaged ``prompts/*.md``
    read **independently** of the cached :func:`prompts.load` the constant itself
    uses — so re-inlining a constant to a literal that drifts from the shipped file
    fails here, not silently at synth/lint time. Reading the file directly (rather
    than asserting ``load(x) == load(x)``) is what makes this non-tautological."""
    from importlib import resources

    def packaged(name: str) -> str:
        return (
            resources.files("dikw_core.prompts").joinpath(f"{name}.md").read_text(encoding="utf-8")
        )

    assert packaged("synthesize_system") == DEFAULT_SYNTH_SYSTEM
    assert packaged("lint_fix_orphan_merge_system") == _MERGE_SYSTEM
    assert packaged("lint_fix_broken_wikilink_grounded_system") == _GROUNDED_SYSTEM


def test_no_packaged_prompt_calls_the_product_dikw_core() -> None:
    """Every packaged prompt (synth + lint + eval) refers to the product as the
    code-span ``dikw``; the legacy ``dikw-core`` self-reference must not survive
    in any ``prompts/*.md``."""
    from importlib import resources

    offenders = [
        entry.name
        for entry in resources.files("dikw_core.prompts").iterdir()
        if entry.name.endswith(".md") and "`dikw-core`" in entry.read_text(encoding="utf-8")
    ]
    assert not offenders, f"prompts still calling the product `dikw-core`: {offenders}"


def test_lint_system_prompts_carry_standard_self_intro() -> None:
    """``_MERGE_SYSTEM`` / ``_GROUNDED_SYSTEM`` open with the same self-intro the
    other authoring prompts use — ``**lint-fix** component of `dikw``` plus the
    ``an AI-native knowledge engine`` descriptor — rather than the old bare
    ``for `dikw-core``` reference, and still carry their task-specific
    instructions (so a truncated/corrupt extraction is caught, not just the
    intro line)."""
    for sp in (_MERGE_SYSTEM, _GROUNDED_SYSTEM):
        assert "**lint-fix** component of `dikw`" in sp
        assert "an AI-native knowledge engine" in sp
        assert "Emit exactly one <page> block" in sp
        assert "Never invent biographical or factual claims." in sp
    assert "merge a K-layer knowledge orphan" in _MERGE_SYSTEM
    assert "REFUSE: insufficient evidence" in _GROUNDED_SYSTEM


def test_duplicate_rule_scoped_to_existing_page_lists() -> None:
    """"Scan the lists above" textually swept in the priority-targets list —
    pages that do NOT exist — so a literal-minded model could suppress