feat(nkb): JSON-LD export for LLM ingestion #14
|
This PR needs an issue link. Add |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8a54bb451f
ee183f2 to fcb43f9
`scripts/nkb-export.mjs --jsonld` walks `workflow-patterns/` and writes a `dist/jsonld/<slug>.jsonld` per pattern. Each document conforms to schema.org/TechArticle with a stable `@id` (`urn:nkb:pattern:<slug>`), the pattern's H1 as `name`/`headline`, the first non-metadata paragraph as `description`, and frontmatter (`tags`, `submitter`, `submitted_at`) projected onto `keywords`, `author`, and `dateCreated` when present.

`nkb export --check` returns non-zero when any pattern lacks a corresponding `.jsonld`, giving CI a one-shot drift detector between the human corpus and its machine-readable export.

Designed to plug into the unified `scripts/nkb.mjs` dispatcher established in round-1 PR #3 (`feat(nkb): local full-text search CLI`); runs standalone until that dispatcher re-lands.

Bats suite `tests/jsonld.bats` (6/6 green) asserts:

1. one jsonld per pattern;
2. each parses as JSON and declares `@context: https://schema.org` with `@type: TechArticle`;
3. every `workflow-patterns/*.md` has its matching slug;
4. freshly added patterns appear on the next run with frontmatter projected correctly;
5. `--check` exits non-zero on missing jsonld;
6. bare invocation prints usage.
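The projection described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in `scripts/nkb-export.mjs`: the function name `toTechArticle` is hypothetical, the markdown is assumed to arrive with any frontmatter block already stripped and parsed into a plain object, and "first non-metadata paragraph" is approximated as the first block that is not a heading, since the exact rule is not stated here. The `title` override follows the PR description's mapping.

```javascript
// Hypothetical helper (not the PR's actual code): project one pattern onto a
// schema.org TechArticle. `markdown` is the body without its frontmatter block;
// `frontmatter` is an already-parsed object, or {} when the pattern has none.
function toTechArticle(slug, markdown, frontmatter = {}) {
  // The pattern's H1 supplies name/headline unless frontmatter carries a title.
  const h1Line = markdown.split(/\r?\n/).find((line) => /^#\s+/.test(line));
  const h1 = h1Line ? h1Line.replace(/^#\s+/, "").trim() : "";
  const title = frontmatter.title || h1 || slug;

  // Assumption: the description is the first blank-line-delimited block that
  // is not a heading, with internal whitespace collapsed.
  const paragraphs = markdown
    .split(/\r?\n\s*\r?\n/)
    .map((block) => block.trim())
    .filter((block) => block !== "" && !block.startsWith("#"));
  const description = paragraphs.length > 0 ? paragraphs[0].replace(/\s+/g, " ") : "";

  const doc = {
    "@context": "https://schema.org",
    "@type": "TechArticle",
    "@id": `urn:nkb:pattern:${slug}`,
    identifier: slug,
    name: title,
    headline: title,
    description,
    keywords: Array.isArray(frontmatter.tags) ? frontmatter.tags : [],
  };
  // Optional fields are emitted only when the frontmatter provides them.
  if (frontmatter.submitter) {
    doc.author = { "@type": "Person", name: frontmatter.submitter };
  }
  if (frontmatter.submitted_at) {
    doc.dateCreated = frontmatter.submitted_at;
  }
  return doc;
}
```

Keeping the projection a pure function of `(slug, markdown, frontmatter)` would let the bats assertions about `@id`, `name`, and projected frontmatter be checked without touching `dist/jsonld/`.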
76fd9b8 to 371dc59
## Summary

Adds `scripts/nkb-graph.mjs`, which emits a Mermaid graph of pattern→dependency edges parsed from a `## Depends-on` section in each `workflow-patterns/*.md` doc. The `--check` flag exits non-zero when any pattern references a slug with no matching `.md` file in the scanned directory.

This is the **structural** half of the citation-graph convention started in round-1 PR #5 (`feat(nkb): citation lint + Sources backfill on pattern docs`): PR #5 made every doc cite its external sources via `## Sources`; this PR makes every doc declare its internal cross-pattern dependencies via `## Depends-on`, and proves they all resolve. Both lints share the same fail-closed posture so CI can wire either as a required check.

The schema is minimal: `## Depends-on` is followed by a bullet list of slugs, one per line, where each slug is the basename of a sibling `.md` in the same directory. A pattern with no dependencies either omits the section entirely or leaves it empty. The existing `voice-agent-elevenlabs-patterns.md` doc parses cleanly under this schema with zero edges, so the new convention is strictly additive; no backfill is required on `main`.

### Fixtures

`fixtures/graph/{ok,broken}/` follows the same convention as the dedupe fixture set (#15) and the stats fixture (#17):

- `ok/`: three-pattern chain (`webhook-dedup-key → http-retry-idempotency → error-monitoring-fanout`). All edges resolve.
- `broken/`: same chain plus a dangling edge to `nonexistent-upstream`. This is the negative-path fixture for `--check`.
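As a rough illustration of the schema above, the sketch below parses a `## Depends-on` bullet list and reports edges that do not resolve. It is an assumption-laden sketch rather than the code in `scripts/nkb-graph.mjs`: the names `parseDependsOn` and `findBrokenLinks` are hypothetical, file reading is left out (patterns are passed in as a slug-to-markdown object), and the accepted bullet syntax (`-` or `*`, optional backticks around the slug) is a guess.

```javascript
// Hypothetical parser: collect the slugs bulleted under "## Depends-on".
function parseDependsOn(markdown) {
  const deps = [];
  let inSection = false;
  for (const line of markdown.split(/\r?\n/)) {
    if (/^##\s+/.test(line)) {
      // Every level-2 heading either opens the section or closes it.
      inSection = /^##\s+Depends-on\s*$/i.test(line);
      continue;
    }
    if (!inSection) continue;
    // Assumed bullet shape: "- slug" or "* `slug`", one slug per line.
    const bullet = line.match(/^\s*[-*]\s+`?([A-Za-z0-9._-]+)`?\s*$/);
    if (bullet) deps.push(bullet[1]);
  }
  return deps;
}

// `patterns` maps slug -> markdown text (assumed already read from disk).
// Returns every edge whose target slug has no entry in `patterns`.
function findBrokenLinks(patterns) {
  const known = new Set(Object.keys(patterns));
  const broken = [];
  for (const slug of [...known].sort()) {
    for (const dep of parseDependsOn(patterns[slug])) {
      if (!known.has(dep)) broken.push({ from: slug, to: dep });
    }
  }
  return broken;
}
```

A `--check` mode along these lines would exit non-zero whenever `findBrokenLinks` returns a non-empty array, which is the fail-closed behaviour the summary describes.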
### Proof

Mermaid render of the ok fixture:

```
$ node scripts/nkb-graph.mjs --dir fixtures/graph/ok
\`\`\`mermaid
graph LR
error_monitoring_fanout["error-monitoring-fanout"]
http_retry_idempotency["http-retry-idempotency"]
webhook_dedup_key["webhook-dedup-key"]
http_retry_idempotency --> error_monitoring_fanout
webhook_dedup_key --> http_retry_idempotency
\`\`\`
```

Fail-closed broken-link check:

```
$ node scripts/nkb-graph.mjs --dir fixtures/graph/broken --check
nkb-graph check: 1 broken link(s):
- webhook-dedup-key → nonexistent-upstream (no such pattern)
$ echo $?
1
```

Real `workflow-patterns/` directory on `main` (additive, doesn't regress):

```
$ node scripts/nkb-graph.mjs --check
nkb-graph check: ok — 1 pattern(s), 0 edge(s), 0 broken
```

`tests/graph.bats` (10 cases, all green locally):

```
$ bats tests/graph.bats
1..10
ok 1 graph: mermaid output lists every pattern in the ok fixture as a node
ok 2 graph: edges in mermaid mirror Depends-on declarations
ok 3 graph --check: ok fixture passes with exit 0
ok 4 graph --check: broken fixture fails with exit 1 and names the missing slug
ok 5 graph --check --format json: ok fixture reports ok:true, broken:[]
ok 6 graph --check --format json: broken fixture reports the from/to edge
ok 7 graph: --format json (no --check) dumps every pattern with its deps list
ok 8 graph: pattern with no Depends-on section produces zero edges from it
ok 9 graph: missing --dir target exits non-zero with stderr message
ok 10 graph: mermaid output marks missing dep edges with dotted-arrow annotation
```

The broken-link check (test 4) is the central-promise test: it asserts both the non-zero exit and that the offending edge (`webhook-dedup-key → nonexistent-upstream`) is named verbatim in the failure output.
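For readers who want to see how output shaped like the Mermaid render in the proof could be produced, here is a small sketch. The function name `toMermaid` is hypothetical, and two details are inferred from the sample output rather than confirmed by the source: node ids replace every character outside `[A-Za-z0-9_]` with `_`, and both nodes and edges are emitted in alphabetical order. The dotted-arrow annotation for missing dependencies (test 10) is not covered, and line indentation is not reproduced because the flattened text does not preserve it.

```javascript
// Hypothetical renderer: `graph` maps each pattern slug to an array of the
// slugs it depends on. Returns Mermaid "graph LR" source as a string.
function toMermaid(graph) {
  // Assumed id rule: Mermaid-safe ids via underscore substitution.
  const nodeId = (slug) => slug.replace(/[^A-Za-z0-9_]/g, "_");
  const slugs = Object.keys(graph).sort();
  const lines = ["graph LR"];
  // One labelled node per pattern, in alphabetical order.
  for (const slug of slugs) {
    lines.push(`${nodeId(slug)}["${slug}"]`);
  }
  // One solid edge per declared dependency: pattern --> dependency.
  for (const slug of slugs) {
    for (const dep of [...graph[slug]].sort()) {
      lines.push(`${nodeId(slug)} --> ${nodeId(dep)}`);
    }
  }
  return lines.join("\n");
}

// Usage with the three-pattern chain from the ok fixture.
console.log(
  toMermaid({
    "webhook-dedup-key": ["http-retry-idempotency"],
    "http-retry-idempotency": ["error-monitoring-fanout"],
    "error-monitoring-fanout": [],
  })
);
```

Sorting before emitting is what would make the text stable across runs, so a bats test can compare it line by line.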
### Round-1 / round-2 dependencies

Sits next to the other `scripts/nkb-*.mjs` round-2 deliverables, namely submit (#13), JSON-LD export (#14), dedupe (#15), sandbox runner (#16), and adoption stats (#17), and will slot under the unified `scripts/nkb.mjs` dispatcher established by round-1 #3 once that lands on `main`. Until then it is invokable standalone as `node scripts/nkb-graph.mjs`.

Citing round-1 #5 specifically because that PR establishes the precedent of a fail-closed CI lint over `workflow-patterns/*.md`; this PR extends that pattern along the internal-link axis.

## Test plan

- [x] `bats tests/graph.bats`: all 10 outcome tests pass
- [x] `node scripts/nkb-graph.mjs --check` against real `workflow-patterns/` exits 0
- [x] `node scripts/nkb-graph.mjs --dir fixtures/graph/broken --check` exits 1 and names the bad edge
- [ ] Reviewer confirms `## Depends-on` schema is acceptable to bolt onto existing pattern docs
- [ ] Reviewer decides whether to wire `--check` into `.github/workflows/lint.yml` alongside the round-1 #5 citation lint

Co-authored-by: Cody Arnold <cody@wranngle.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary

Adds `scripts/nkb-stats.mjs`, a small CLI that reads a JSONL telemetry stream (default `fixtures/telemetry-sample.jsonl`) and prints the top-N knowledge-base patterns by view count. Default top-10, supports `--input`, `--top`, and `--format text|json`. Output is deterministic: ranked by count descending with an alphabetical tie-break on the pattern slug, so test assertions can pin row order even when two patterns share a count. Malformed JSON lines and records without a `pattern` field are skipped into a `skipped` meta counter rather than aborting the run.

Round-2 features in this PR queue collectively give the knowledge base a write path (#13 submit), a machine-readable export (#14 jsonld), a dedupe gate (#15), a try-before-you-merge sandbox (#16), and now a read-out of which patterns are actually getting used.

### Proof

Running the CLI against the bundled fixture:

```
$ node scripts/nkb-stats.mjs
# nkb adoption stats — top 10 of 12 patterns
# events tallied: 58
1. 10 slack-retry-storm
2. 9 http-retry-idempotency
3. 8 webhook-dedup-key
4. 7 voice-agent-elevenlabs-patterns
5. 6 stripe-idempotency-key
6. 5 airtable-rate-limit-backoff
7. 4 queue-backpressure-fanout
8. 3 error-monitoring-fanout
9. 2 google-sheets-batched-append
10. 2 shopify-orders-webhook
```

`tests/stats.bats` (8 cases, all green locally):

```
$ bats tests/stats.bats
1..8
ok 1 stats: top-10 ordering matches highest-count patterns from telemetry fixture
ok 2 stats: rank 1 is the pattern with the most events (slack-retry-storm @ 10)
ok 3 stats: exactly 10 ranked rows when --top 10 with 12 distinct patterns
ok 4 stats: ties at count=2 break alphabetically (google-sheets... before shopify-...)
ok 5 stats: header reports total events tallied (58 in fixture)
ok 6 stats --format json: emits parseable JSON with descending counts
ok 7 stats: malformed JSON lines and records lacking pattern are skipped, not fatal
ok 8 stats: missing input file exits non-zero with message on stderr
```

The ordering test (test 1) is the central-promise check: rank 1 is the highest-count pattern, rank 10 is the lowest of the top-10, and ties are resolved deterministically. The JSON-format test (test 6) verifies that the rows' counts never increase from one rank to the next, so downstream consumers can trust the contract regardless of renderer.

### Round-1 dependency

Designed to slot into the unified `scripts/nkb.mjs` dispatcher established in round-1 PR #3 (`feat(nkb): local full-text search CLI over knowledge base`). Until #3 re-lands on `main`, this script is invokable standalone via `node scripts/nkb-stats.mjs`; once the dispatcher merges, it becomes the `nkb stats` subcommand sitting next to `nkb search`, `nkb submit` (#13), `nkb export` (#14), `nkb dedupe` (#15), and `nkb run` (#16).

## Test plan

- [x] `bats tests/stats.bats`: all 8 outcome tests pass
- [x] `node scripts/nkb-stats.mjs` prints the top-10 ordering shown above
- [x] `node scripts/nkb-stats.mjs --format json --top 3` emits valid JSON
- [x] `node scripts/nkb-stats.mjs --input /no/such/file` exits non-zero
- [ ] Reviewer spot-checks tie-break ordering matches alphabetical on equal counts
- [ ] Reviewer confirms the fixture is in `fixtures/` (consistent with #13 / #15)

---------

Co-authored-by: Cody Arnold <cody@wranngle.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
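The ranking contract in the stats commit message (count descending, alphabetical tie-break, malformed records skipped rather than fatal) can be sketched as below. This is not the code in `scripts/nkb-stats.mjs`: the name `rankPatterns` and the shape of the returned object are hypothetical, the telemetry is passed in as a string instead of being read from `--input`, and treating blank lines as ignorable rather than skipped is an assumption.

```javascript
// Hypothetical tally: `jsonl` is the raw text of a JSONL telemetry stream in
// which each usable record carries a string `pattern` field.
function rankPatterns(jsonl, top = 10) {
  const counts = new Map();
  let skipped = 0;
  for (const line of jsonl.split(/\r?\n/)) {
    if (line.trim() === "") continue; // assumption: blank lines are not "skipped"
    let record;
    try {
      record = JSON.parse(line);
    } catch {
      skipped += 1; // malformed JSON is counted, not fatal
      continue;
    }
    if (!record || typeof record.pattern !== "string" || record.pattern === "") {
      skipped += 1; // valid JSON but no usable `pattern` field
      continue;
    }
    counts.set(record.pattern, (counts.get(record.pattern) || 0) + 1);
  }
  const rows = [...counts.entries()]
    .map(([pattern, count]) => ({ pattern, count }))
    // Count descending, then slug ascending so equal counts order stably.
    .sort((a, b) => b.count - a.count || (a.pattern < b.pattern ? -1 : a.pattern > b.pattern ? 1 : 0));
  const events = rows.reduce((sum, row) => sum + row.count, 0);
  return { rows: rows.slice(0, top), patterns: counts.size, events, skipped };
}
```

Because the comparator is total (no two distinct slugs compare equal), the row order does not depend on the order in which events appear in the stream.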
## Summary

Round-2 feature #2 for `n8n_knowledge_base`: every pattern in `workflow-patterns/` now has a machine-readable schema.org/TechArticle projection at `dist/jsonld/<slug>.jsonld`, so downstream LLM agents can ingest the corpus through structured data instead of markdown scraping.

`scripts/nkb-export.mjs --jsonld` is the entrypoint; it is designed to plug into the unified `scripts/nkb.mjs` dispatcher once round-1 #3 re-lands, and runs standalone in the meantime.
Subcommands:

- `nkb export --jsonld`: write or overwrite the jsonld for every pattern.
- `nkb export --jsonld --clean`: wipe `dist/jsonld/` first.
- `nkb export --check`: exit non-zero when any pattern lacks a jsonld (drift detector for CI).
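The `--check` drift detector amounts to a set difference between pattern slugs and exported slugs. The sketch below shows that idea under stated assumptions: `findMissingJsonld` and `checkExitCode` are hypothetical names, the inputs are plain arrays of file names (as a directory listing of `workflow-patterns/` and `dist/jsonld/` would give), and the message text is illustrative rather than the exporter's real output.

```javascript
// Hypothetical drift check: which pattern slugs have no matching .jsonld?
// Both arguments are arrays of bare file names, e.g. from a directory listing.
function findMissingJsonld(patternFiles, jsonldFiles) {
  const exported = new Set(
    jsonldFiles
      .filter((name) => name.endsWith(".jsonld"))
      .map((name) => name.slice(0, -".jsonld".length))
  );
  return patternFiles
    .filter((name) => name.endsWith(".md"))
    .map((name) => name.slice(0, -".md".length))
    .filter((slug) => !exported.has(slug))
    .sort();
}

// Maps the result onto a process exit code: 0 when in sync, 1 on drift.
// The log line format is an illustration only.
function checkExitCode(missing, log = console.error) {
  for (const slug of missing) log(`missing jsonld for pattern: ${slug}`);
  return missing.length === 0 ? 0 : 1;
}
```

In a CI job, the returned code would be passed to `process.exit`, so a pattern added without re-running the export fails the build.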
Frontmatter (when present on a pattern, e.g. submissions from #13) projects to:

| Source | JSON-LD property |
| --- | --- |
| `title` | `name` + `headline` |
| `tags` | `keywords` |
| `submitter` | `author` (Person) |
| `submitted_at` | `dateCreated` |
| pattern H1 | `name` fallback when `title` absent |
| first non-metadata paragraph | `description` |

### Proof

`tests/jsonld.bats` (6/6 green) on a fresh `feat/r2-nkb-jsonld` branch from `main`.

Sample emitted document:

```
{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "@id": "urn:nkb:pattern:voice-agent-elevenlabs-patterns",
  "identifier": "voice-agent-elevenlabs-patterns",
  "name": "Voice Agent & ElevenLabs Workflow Patterns",
  "headline": "Voice Agent & ElevenLabs Workflow Patterns",
  "description": "ElevenLabs has no native n8n node. All integrations use HTTP Request nodes to the ElevenLabs API. Voice agent workflows follow consistent architectural patterns across the community.",
  "inLanguage": "en",
  "keywords": [],
  "isPartOf": {
    "@type": "Dataset",
    "name": "wranngle/n8n_knowledge_base",
    "url": "https://github.com/wranngle/n8n_knowledge_base"
  },
  "sourceFile": "workflow-patterns/voice-agent-elevenlabs-patterns.md",
  "wordCount": 1200,
  "proficiencyLevel": "Expert"
}
```

### Round-1 dependency
This PR depends on the dispatcher conventions established in round-1 #3 (`feat(nkb): local full-text search CLI over workflow-patterns + technical-research`). That commit established `scripts/nkb.mjs` as the unified CLI dispatcher under which subcommands (`search`, `submit`, `export`, ...) plug in. The dispatcher was reverted by an unrelated "unified update" commit; this script is callable standalone (`node scripts/nkb-export.mjs --jsonld`) until #3's dispatcher re-lands, at which point wiring up `nkb export --jsonld` is a one-line addition.

### Scope discipline
Touches exactly the feature's planned surface:

- `scripts/nkb-export.mjs` (the exporter; corresponds to the plan's `src/export/jsonld.ts` + `src/cli/export.ts`, written in the `scripts/*.mjs` convention that the repo actually uses).
- `tests/jsonld.bats` (the bats outcome suite; exactly the plan's `tests/jsonld.bats`).

No README edits, no gitignore edits, no unrelated reformatting.
## Test plan

- `bats tests/jsonld.bats` green (6/6).
- `node scripts/nkb-export.mjs --jsonld` writes valid JSON to `dist/jsonld/`.
- `--check` mode returns non-zero when a pattern lacks a jsonld.
- Frontmatter fields (what `nkb submit` writes per #13) project `tags`/`submitter`/`submitted_at` into the document correctly.

🤖 Generated with Claude Code