owieschon/sku-resolver

SKU Resolution Engine

K4-12SBA and K5-12SBA are one keystroke apart and physically different parts. A counter rep won't catch the swap: it reads fine on the quote, it survives review, and it gets caught on the loading dock or by the customer, after the wrong part has already shipped. The lesson stuck: when a confident wrong answer has a price someone else pays, the model that's usually right is the wrong tool for the binding decision.

So here the binding decision is deterministic. A rep types or says "five inch chrome curved stack, twenty-four long" and the engine answers K5-24SBC. When it can't be sure, it returns an honest pending or unresolvable instead of a plausible fake. A language model is wired in, but only as a proposer at the edges: it can narrow a retrieved candidate list, but it never authors the part number of record.

The guarantee, and how it's proven

The engine cannot invent a part number. Every resolved answer points at a real catalog row, and that's re-proven over the whole catalog on every commit rather than asserted. scripts/roundtrip_audit.py derives the catalog size at runtime and enforces three hard gates:

  • Identity — 100%, no exceptions. Every catalog SKU must translate() back to exactly itself. Resolution is never allowed to rewrite one real part into another.
  • No new silent rewrites. Whenever construct(extract(sku)) rebuilds a SKU, the result must be identical to the original. The pinned-exception baseline is empty, so any new mismatch fails the build, because a silent rewrite is the dangerous case.
  • Round-trip floor — ≥95%. The fraction of the catalog that survives decode→re-encode unchanged; a drop signals a grammar or extractor regression.
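The three gates above can be sketched on a toy scale. This is a minimal illustration, not the contents of scripts/roundtrip_audit.py: the one-pattern grammar, the verbatim-only translate(), and the audit() report shape are all assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical one-pattern toy grammar: K<diameter>-<length>SB<finish>.
# The real grammar has 312 patterns; this one exists only for illustration.
PATTERN = re.compile(r"^K(?P<diameter>\d+)-(?P<length>\d+)SB(?P<finish>[A-Z])$")


def extract(sku: str) -> Optional[dict]:
    """Decode a SKU into fields, or None if the toy grammar does not cover it."""
    m = PATTERN.match(sku)
    if m is None:
        return None
    return {"diameter": int(m["diameter"]), "length": int(m["length"]), "finish": m["finish"]}


def construct(spec: dict) -> str:
    """Re-encode fields into a SKU (the grammar run in reverse)."""
    return f"K{spec['diameter']}-{spec['length']}SB{spec['finish']}"


def translate(text: str, catalog: frozenset) -> Optional[str]:
    """Verbatim path only (assumed): resolve a string only if it is a catalog row."""
    return text if text in catalog else None


def audit(catalog: frozenset, floor: float = 0.95) -> dict:
    total = len(catalog)
    # Gate 1: every catalog SKU must translate back to exactly itself.
    identity_hits = sum(1 for sku in catalog if translate(sku, catalog) == sku)
    unchanged = 0
    silent_rewrites = []
    for sku in sorted(catalog):
        spec = extract(sku)
        if spec is None:
            continue  # outside the grammar: only lowers the round-trip fraction
        rebuilt = construct(spec)
        if rebuilt == sku:
            unchanged += 1
        else:
            silent_rewrites.append((sku, rebuilt))  # Gate 2: must stay empty
    round_trip = unchanged / total  # Gate 3: must stay at or above the floor
    passed = identity_hits == total and not silent_rewrites and round_trip >= floor
    return {"identity": identity_hits == total, "silent_rewrites": silent_rewrites,
            "round_trip": round_trip, "passed": passed}


# 20 SKUs the toy grammar covers, plus one it does not (20/21 is about 95.2%).
clean = frozenset(
    [f"K{d}-{n}SB{f}" for d in (4, 5) for n in (12, 24) for f in "ABCDE"] + ["X9-ODD"]
)
print(audit(clean)["passed"])            # True

# A zero-padded SKU decodes fine but rebuilds as a different string.
dirty = clean | {"K05-24SBC"}
print(audit(dirty)["silent_rewrites"])   # [('K05-24SBC', 'K5-24SBC')]
```

The zero-padded entry shows why the second gate is separate from the third: a SKU the grammar cannot decode only lowers the round-trip fraction, while one that decodes and rebuilds as something else is a rewrite and fails the audit outright.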

On the current catalog that's identity 9,487/9,487, 0 silent rewrites, and 96.96% round-trip, in about one second. A companion audit (scripts/noise_resilience_audit.py) perturbs real SKUs with typo / OCR / partial-input damage (1,200 noisy inputs) and confirms 0 inventions: bad input degrades to pending and is never resolved into the wrong part. Both run in CI beside ruff, mypy, and the suite.
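To make "0 inventions" concrete, here is a hedged sketch of what a noise audit can look like. It is not scripts/noise_resilience_audit.py: the toy grammar, the eight-SKU catalog, the substitution alphabet, and both resolver functions are hypothetical. It contrasts a resolver that rebuilds anything that parses with one that only returns real catalog rows.

```python
import re
from typing import Callable, Iterator, Optional

PATTERN = re.compile(r"^K(\d+)-(\d+)SB([A-Z])$")  # hypothetical toy grammar
CATALOG = frozenset(f"K{d}-{n}SB{f}" for d in (4, 5) for n in (12, 24) for f in "AC")


def naive_resolve(text: str) -> Optional[str]:
    """Unsafe: rebuilds a SKU from anything that parses, with no catalog check."""
    m = PATTERN.match(text)
    return f"K{int(m[1])}-{int(m[2])}SB{m[3]}" if m else None


def guarded_resolve(text: str) -> Optional[str]:
    """Safe: a rebuilt SKU is returned only if it is a real catalog row."""
    sku = naive_resolve(text)
    return sku if sku in CATALOG else None  # None stands in for "pending"


def typos(sku: str) -> Iterator[str]:
    """Yield every single-character substitution from a small fixed alphabet."""
    for i, original in enumerate(sku):
        for ch in "0123456789ABCK-":
            if ch != original:
                yield sku[:i] + ch + sku[i + 1:]


def count_inventions(resolve: Callable[[str], Optional[str]]) -> int:
    """Count noisy inputs that resolve to a SKU that is not in the catalog."""
    return sum(
        1
        for sku in sorted(CATALOG)
        for noisy in typos(sku)
        if (out := resolve(noisy)) is not None and out not in CATALOG
    )


print(count_inventions(guarded_resolve))      # 0
print(count_inventions(naive_resolve) > 0)    # True
```

This toy counts only invented SKUs. A typo that lands on a different real part, as in the K4/K5 case, needs its own check and is not modeled here.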

The two files worth opening first: src/sku_translator/translator.py (the deterministic core — no LLM, no network in the resolution path) and scripts/roundtrip_audit.py (the audit above).

How it resolves

text ─▶ normalize ─▶ extract ─▶ ┌ verbatim · parser · construct ┐
"5in chrome 24 SB"    (spec)    │ fuzzy · memory · disambiguate  │ ─▶ result
                                └────────────────────────────────┘   (sku · state
                                                                       · source
                                                                       · confidence)

A hand-authored grammar of 312 compiled patterns decodes a canonical SKU into structured fields — family, diameter, length, finish, angle, legs — and runs in reverse to rebuild it. That invertibility is what makes the round-trip audit possible: the grammar is its own oracle. translate() walks the paths in priority order (verbatim catalog hit, then parser, construct, fuzzy, memory, disambiguate) and bails to unresolvable rather than guess. The optional LLM chooser (src/resolution/chooser.py) is bind-guarded to the retrieved candidate set and defaults to NoChooser — propose-only — so a hallucinated or empty pick is rejected and never-invent holds through the model.
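The priority walk and the bind guard can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the code in translator.py or chooser.py: the Result fields, the state labels, the two toy paths, and the chooser call signature are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

CATALOG = frozenset({"K5-24SBC", "K4-12SBA", "K5-12SBA"})  # toy stand-in catalog


@dataclass(frozen=True)
class Result:
    sku: Optional[str]
    state: str              # assumed labels: "resolved" or "unresolvable"
    source: Optional[str]   # name of the path that produced the answer


def verbatim(text: str) -> Optional[str]:
    """Exact catalog hit."""
    return text if text in CATALOG else None


def parser(text: str) -> Optional[str]:
    """Hypothetical stand-in for the grammar: a fixed phrase table."""
    return {"5in chrome curved 24": "K5-24SBC"}.get(text.lower().strip())


# Paths are tried in priority order; the first catalog-backed answer wins.
PATHS: Sequence[Tuple[str, Callable[[str], Optional[str]]]] = (
    ("verbatim", verbatim),
    ("parser", parser),
)


def walk(text: str) -> Result:
    for name, path in PATHS:
        sku = path(text)
        if sku is not None and sku in CATALOG:
            return Result(sku, "resolved", name)
    return Result(None, "unresolvable", None)  # bail rather than guess


def guarded_pick(chooser, text: str, candidates: Sequence[str]) -> Optional[str]:
    """Accept a chooser's pick only if it is one of the retrieved candidates."""
    pick = chooser(text, candidates)
    return pick if pick in candidates else None


def no_chooser(text, candidates):
    return None          # empty pick: the default in this sketch


def hallucinating(text, candidates):
    return "K9-99ZZZ"    # a pick that is not in the candidate set


candidates = ["K4-12SBA", "K5-12SBA"]
print(walk("5in chrome curved 24"))                    # resolved via the parser path
print(walk("six foot widget").state)                   # unresolvable
print(guarded_pick(hallucinating, "12 SB", candidates))  # None
```

Because the guard checks membership in the retrieved candidates, a model-backed chooser can only select among real rows; anything else collapses to None and the caller falls back to a non-resolved state.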

Around that core sit a pure ship-date / fulfillment engine, an ERP-onboarding harness that induces an unknown tenant's grammar from its own catalog strings, and a chat/voice gateway where pricing stays gated behind account verification. Adversarial and tenant-isolation tests cover the seams.

Run it

pip install -e ".[dev]"
pytest                              # 68 test files, 603 tests (593 pass, 10 credential-gated skips)
python scripts/roundtrip_audit.py   # the never-invent audit, ~1s

from sku_translator import translate, FixtureCatalogIndex, InMemoryStore

catalog = FixtureCatalogIndex("data/catalog.csv", tenant_id="demo")
r = translate("5 inch chrome curved 24 long SB", catalog=catalog, memory=InMemoryStore())
print(r.sku, r.source, r.confidence)   # → K5-24SBC construct high

Status, honestly

This is the public, sanitized mirror of a real system. The catalog (data/catalog.csv, 9,487 resolvable SKUs) is synthetic — generated by enumerating the public grammar (scripts/generate_catalog.py) and keeping only SKUs that round-trip, so it carries no real company's part numbers, descriptions, prices, or customers. The grammar itself was originally hardened against a private NDA catalog that isn't included.

The deterministic core, the audits, the ship-date engine, and the resolution seams are production-grade and exercised by CI. The provider integrations (LLM, speech-to-text, Twilio) sit behind interfaces: their parsing/decision logic is tested with scripted implementations, but the live I/O only runs with real credentials and is skipped in CI — that's the 10 skips above. docs/MATURITY.md is the per-capability map (PROD / GATED / STUB), written so a reviewer can trust the rest of the repo.

Going deeper

License: Apache-2.0 — see LICENSE and NOTICE.

About

Deterministic SKU-resolution engine for an industrial parts catalog: a 312-pattern grammar decodes/reconstructs part numbers with no LLM in the resolution path, returns an honest pending/unresolvable instead of guessing, and re-proves the never-invent guarantee over the whole catalog every commit. The LLM only proposes.
