Synthcore

Inductive program synthesis with library learning, for JS/TS. No LLM, no GPU.

Give it input→output examples and synthcore returns a verified function (or {ok:false}). It never hallucinates: it only returns a program that passes all your examples. Zero marginal cost, deterministic, auditable, offline.

import { synthesize } from "synthcore";

const r = await synthesize([
  { input: '{"email":"a@b.com"}', output: "a@b.com" },
  { input: '{"email":"c@d.com"}', output: "c@d.com" },
  { input: '{"email":"e@f.com"}', output: "e@f.com" },
]);

if (r.ok) {
  console.log(r.recipe); // "get(parseJSON(arg0),\"email\")"  ← real recipe, verified
  console.log(r.code);   // a standalone `solve` function that passes the 3 examples
}

r is { ok:true, code, recipe, size, entry, ast } or { ok:false, reason }. code is a standalone JS function ready to run; recipe is its readable form; ast is the recipe as a tree (consumed by learn()); reason ∈ "bad_input" | "not_found" | "unverified".
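
A minimal sketch of how a caller might branch on that result. The `SynthResult` type and the `summarize` helper are hypothetical names written for this illustration; the field types (`size` as a number, `entry` as a string) are assumptions, and nothing here is imported from the package.

```typescript
// Hypothetical local mirror of the result shape described above (not the package's own types).
type SynthResult =
  | { ok: true; code: string; recipe: string; size: number; entry: string; ast: unknown }
  | { ok: false; reason: "bad_input" | "not_found" | "unverified" };

// Checking `ok` narrows the union, so `recipe` and `reason` are only
// reachable on the branch where they exist.
function summarize(r: SynthResult): string {
  return r.ok ? `found ${r.recipe} (size ${r.size})` : `no program: ${r.reason}`;
}
```

Because the failure case carries only `reason`, a caller cannot read `code` by accident without checking `ok` first.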

What it solves / what it doesn't

Enumerative search over a bounded DSL: it shines at small, composable transformations over numbers, strings, arrays and JSON. It is not general-purpose code generation.

✅ Solves today (verified in demo/tests)	❌ Out of scope today
Scalar arithmetic — `double` → `add(arg0,arg0)`	Large programs / application logic (it won't write an app)
Strings — `reverse` → `rev(arg0)`, `UPPERCASE` → `upper(arg0)`	Filtering / dedup: the `filter`/`unique` combinators are missing
JSON parsing — `JSON field` → `get(parseJSON(arg0),"email")`	Arbitrary recursion (e.g. Fibonacci) / deep structures
List reduction — `sum` → `sum(arg0)`, `average` → `idiv(sum(arg0),len(arg0))`	Algorithms with deep recursion or backtracking
Lists — `sort` → `sort(arg0)`, `max` → `max(arg0)`, `map`/`zipWith`	Transformations depending on external state or I/O
Dates/numbers/regex via the opt-in `std` bundle — `year` → `year(arg0)`, `"$1,234.50"` → `num(arg0)`	Large multi-step programs (enumerative budget)

For the right column, the path is to seed a primitive (extraPrims, e.g. proposed by an LLM) or raise the search budget — it's not a wall, it's a frontier that moves.

Use cases

Synthcore shines wherever you have examples of a small, repeatable transform and need it guaranteed correct, cheap and offline — not a plausible guess you still have to test:

Data cleaning / ETL at scale. Normalize a messy column ("$1,234.50" → 1234.5) once, then run it over millions of rows at zero per-call cost, offline (no data leaves your machine). Transpile to Python for your pipeline.
Kill the regex you were about to google. Extract a field by example instead of by pattern ("ERR-404: not found" → "404"). You describe the what, not the how.
Schema mapping / API glue. Turn "their payload → your payload" examples into a verified extractor; commit it, run it in CI, no external API in the loop.
Recover an exact formula from data. Verified symbolic regression (½mv², pH) for unit conversions, pricing rules and discrete laws — exact, not approximated.
A verified tool for your agents. Via MCP, an LLM can call Synthcore for a guaranteed transform instead of hallucinating one.

How it works

Typed bottom-up search over a DSL of combinators (n-ary application, map, fold, zipWith, constants), with observational-equivalence pruning: candidates that produce the same output vector on the examples are deduplicated, which is what keeps the exponential search tractable.
Deterministic verification in a hardened subprocess (no env, memory cap, temporary cwd, hard kill on timeout). Only code that passes all examples is returned.
Library learning (wake/sleep, DreamCoder/LILO style): mines recurring sub-programs, generalizes them into abstractions ranked by compression (MDL) and reuses them → solves deeper with the same budget.
Optional LLM hybrid: when pure synthesis fails, an LLM can propose one new primitive (extraPrims); the engine verifies it and, if it unlocks the solution, recomposes it for free forever. The LLM introduces the rare, expensive bit; the engine recomposes deterministically and at $0.
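
The first step above can be illustrated with a toy enumerator. This is a sketch under stated assumptions, not Synthcore's engine: the two operators (`add`, `mul`), the leaves (`arg0`, `1`), and the names `Candidate` and `enumerate` are all chosen for this illustration, and there is no typing or subprocess verification here.

```typescript
// Toy bottom-up enumerator over numbers. Operators and leaves are illustrative only.
type Candidate = { recipe: string; outputs: number[] };

function enumerate(inputs: number[], target: number[], maxRounds: number): string | null {
  const key = (v: number[]): string => v.join(",");
  const goal = key(target);

  // Round 0: the leaves. Each candidate carries its output on every example input.
  let pool: Candidate[] = [
    { recipe: "arg0", outputs: inputs.slice() },
    { recipe: "1", outputs: inputs.map(() => 1) },
  ];
  const seen = new Set<string>(pool.map((c) => key(c.outputs)));
  const leafHit = pool.find((c) => key(c.outputs) === goal);
  if (leafHit) return leafHit.recipe;

  for (let round = 0; round < maxRounds; round++) {
    const next: Candidate[] = [];
    for (const a of pool) {
      for (const b of pool) {
        const grown: Candidate[] = [
          { recipe: `add(${a.recipe},${b.recipe})`, outputs: a.outputs.map((x, i) => x + b.outputs[i]) },
          { recipe: `mul(${a.recipe},${b.recipe})`, outputs: a.outputs.map((x, i) => x * b.outputs[i]) },
        ];
        for (const c of grown) {
          const k = key(c.outputs);
          // Observational equivalence: skip anything whose output vector was already produced.
          if (seen.has(k)) continue;
          seen.add(k);
          if (k === goal) return c.recipe;
          next.push(c);
        }
      }
    }
    // Everything kept so far becomes a building block for the next round.
    pool = pool.concat(next);
  }
  return null;
}
```

With inputs `[1, 2, 3]` and target `[2, 4, 6]`, `enumerate` returns `"add(arg0,arg0)"` in the first round. The `seen` set is what stops `mul(arg0,1)`, `mul(1,arg0)` and similar duplicates of `arg0` from ever entering the pool.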

API

synthesize(examples, opts?) → Promise<{ ok:true, code, recipe, size, entry, ast } | { ok:false, reason }>

examples: a list of { input, output } | { in, out } | { args:[...], expect }. An array input is one list argument (not varargs). At least 3–5 examples recommended.
opts: { entry?, tools?, rounds?, maxEvals?, extraPrims?, std?, verify? }.
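
To make the three example shapes concrete, here is a sketch of how they could be mapped onto one form. The `Example` and `Normalized` types and the `normalize` function are hypothetical; the engine's actual input handling is not shown in this README.

```typescript
// Hypothetical normalizer: maps the three accepted example shapes onto one form.
type Example =
  | { input: unknown; output: unknown }
  | { in: unknown; out: unknown }
  | { args: unknown[]; expect: unknown };

type Normalized = { args: unknown[]; expect: unknown };

function normalize(ex: Example): Normalized {
  // Explicit multi-argument form: passed through unchanged.
  if ("args" in ex) return { args: ex.args, expect: ex.expect };
  // Short single-argument form.
  if ("in" in ex) return { args: [ex.in], expect: ex.out };
  // A single `input` is always one argument, even when it is an array.
  return { args: [ex.input], expect: ex.output };
}
```

Under this reading, `{ input: [1, 2, 3], output: 6 }` is a one-argument task whose argument is a list, while `{ args: [2, 3], expect: 9 }` is a two-argument task.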

Rich types (opt-in): dates, numbers in text, regex

The standard primitives are not in the base DSL — they enlarge the search and raise the overfitting risk — so they are enabled with { std: true } (or by passing extraPrims: stdPrims):

await synthesize([{ input: "2024-03-15", output: 2024 }, /* … */], { std: true });  // → year(arg0)
await synthesize([{ input: "$1,234.50", output: 1234.5 }, /* … */], { std: true });  // → num(arg0)

stdPrims = datePrims (year/month/day/weekday, in UTC) + numberPrims (num/digits) + regexPrims (regexExtract/regexMatch).
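
For intuition, the two primitives used in the examples above might behave roughly like the stand-ins below. These are assumptions written for illustration, not the bundle's source; the real `year` and `num` may accept more formats or reject inputs these accept.

```typescript
// Illustrative stand-ins for two of the named primitives (not the bundle's implementation).

// year: calendar year of an ISO date string, read in UTC.
const year = (s: string): number => new Date(s).getUTCFullYear();

// num: first number found in a text, ignoring currency symbols and thousands separators.
const num = (s: string): number => {
  const m = s.replace(/,/g, "").match(/-?\d+(\.\d+)?/);
  return m ? Number(m[0]) : NaN;
};
```

Reading dates in UTC matters for determinism: a date-only ISO string such as `"2024-01-01"` is parsed as midnight UTC, so a local-time getter could report a different year depending on the machine's timezone.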

Science domain bundles (opt-in): physics & chemistry laws

Physical and chemical laws ship as injectable primitives (physicsPrims, chemPrims, or both via sciencePrims). Give it data and it recovers the exact law — verified symbolic regression, like PySR/Eureqa but with an exact guarantee instead of a minimized error:

import { synthesize, physicsPrims, chemPrims } from "synthcore";

await synthesize([{ args: [2, 3], expect: 9 }, { args: [4, 5], expect: 50 }, /* … */],
  { extraPrims: physicsPrims });                      // → kinetic(arg0,arg1)   (½·m·v²)

await synthesize([{ input: 0.01, output: 2 }, { input: 0.001, output: 3 }, /* … */],
  { extraPrims: chemPrims });                          // → pH(arg0)             (-log₁₀[H⁺])

Honest frontier: for noisy experimental data use PySR (it minimizes error); for exact transformations (discrete laws, unit conversions, identities) Synthcore wins (exact verification). Note the engine only composes arity-1/2 primitives in its rounds, so arity-3 laws (e.g. idealGasP = nRT/V) only synthesize when the task itself has that arity.
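
The difference between exact verification and error minimization can be sketched in a few lines. The `kinetic` definition below is assumed from the ½·m·v² annotation above, and `Row` and `passesExactly` are hypothetical names for this illustration.

```typescript
// Assumed definition of the kinetic primitive: E = 0.5 * m * v^2.
const kinetic = (m: number, v: number): number => 0.5 * m * v * v;

type Row = { args: [number, number]; expect: number };

// Exact verification: every example must match with ===, with no tolerance.
function passesExactly(f: (m: number, v: number) => number, rows: Row[]): boolean {
  return rows.every((r) => f(r.args[0], r.args[1]) === r.expect);
}

// The two examples from the snippet above.
const clean: Row[] = [
  { args: [2, 3], expect: 9 },
  { args: [4, 5], expect: 50 },
];

// The same measurements with made-up noise added, for contrast.
const noisy: Row[] = [
  { args: [2, 3], expect: 9.02 },
  { args: [4, 5], expect: 49.9 },
];
```

`passesExactly(kinetic, clean)` is `true` and `passesExactly(kinetic, noisy)` is `false`: a single perturbed measurement is enough to reject the law, which is why noisy data belongs with an error-minimizing tool.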

Multi-language output: transpile to Python

The engine synthesizes a language-independent AST (Recipe); synthesize lowers it to JS, and emitPython lowers the same verified AST to standalone Python — no extra search, just an output pass:

import { synthesize, emitPython } from "synthcore";

const r = await synthesize([{ input: [1, 2, 3], output: 6 }, { input: [10, 20], output: 30 }]);
if (r.ok) emitPython(r.ast);   // → "def solve(*a):\n    return sum(a[0])"

Pass the same tools/extraPrims/std you gave synthesize. Only the base DSL and the bundles we maintain (std, domain) transpile; a learned abstraction or an arbitrary injected extraPrim (raw JS) throws a clear error instead of emitting incorrect code.
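
A template-based output pass of this kind can be sketched as follows. The `Expr` tree shape, the `PY` template table and `emitPythonSketch` are hypothetical; the real Recipe type and `emitPython` internals are not shown in this README.

```typescript
// Hypothetical AST shape, for illustration only.
type Expr =
  | { kind: "arg"; index: number }
  | { kind: "const"; value: number | string }
  | { kind: "call"; fn: string; args: Expr[] };

// One Python template per known primitive. Anything missing from this table is rejected.
const PY: Record<string, (args: string[]) => string> = {
  sum: ([xs]) => `sum(${xs})`,
  len: ([xs]) => `len(${xs})`,
};

function emitExpr(n: Expr): string {
  switch (n.kind) {
    case "arg":
      return `a[${n.index}]`;
    case "const":
      return JSON.stringify(n.value);
    case "call": {
      const template = PY[n.fn];
      // Refuse to guess: an unknown primitive is an error, not a best-effort translation.
      if (!template) throw new Error(`no Python template for primitive "${n.fn}"`);
      return template(n.args.map(emitExpr));
    }
  }
}

function emitPythonSketch(ast: Expr): string {
  return `def solve(*a):\n    return ${emitExpr(ast)}`;
}
```

For the tree `sum(arg0)` this produces the same string as the example above, and for a primitive without a template it throws, mirroring the "throws instead of emitting incorrect code" behavior described here.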

Learn a library that grows with use

learn() mines reusable abstractions from solutions you already found (DreamCoder/LILO, MDL ranking). Pass them back as tools and the engine solves deeper with the same budget. serializeLibrary / loadLibrary persist them across sessions (plain JSON):

import { synthesize, learn, serializeLibrary, loadLibrary } from "synthcore";

const r1 = await synthesize(examplesA);
const r2 = await synthesize(examplesB);
const lib = await learn([r1, r2]);                  // mine verified abstractions
const r3 = await synthesize(examplesC, { tools: lib.tools }); // reuse what was learned

const json = serializeLibrary(lib.tools);           // persist
const tools = loadLibrary(json);                     // restore in another session
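
To give a feel for what "mining recurring sub-programs ranked by compression" means, here is a deliberately crude sketch. It only counts exact repeated subtrees and scores them by nodes saved; it does not generalize them into abstractions with holes, and the `Tree` shape, `rankSubtrees` and the scoring formula are assumptions for illustration rather than the learn() algorithm.

```typescript
// Hypothetical tree shape for a recipe; leaves are strings such as "arg0".
type Tree = string | { fn: string; args: Tree[] };

const show = (t: Tree): string =>
  typeof t === "string" ? t : `${t.fn}(${t.args.map(show).join(",")})`;

const size = (t: Tree): number =>
  typeof t === "string" ? 1 : 1 + t.args.reduce((n: number, a: Tree) => n + size(a), 0);

// Count every non-leaf subtree across all solutions, then rank by the nodes saved
// if each repeat were replaced by one named call: (count - 1) * (size - 1).
function rankSubtrees(solutions: Tree[]): { recipe: string; saved: number }[] {
  const stats = new Map<string, { count: number; size: number }>();
  const visit = (t: Tree): void => {
    if (typeof t === "string") return;
    const k = show(t);
    const s = stats.get(k) ?? { count: 0, size: size(t) };
    s.count += 1;
    stats.set(k, s);
    t.args.forEach(visit);
  };
  solutions.forEach(visit);
  return Array.from(stats.entries())
    .map(([recipe, s]) => ({ recipe, saved: (s.count - 1) * (s.size - 1) }))
    .filter((r) => r.saved > 0)
    .sort((a, b) => b.saved - a.saved);
}
```

Given two solutions that both contain `idiv(sum(arg0),len(arg0))`, that subtree ranks first because replacing it saves more nodes than replacing `sum(arg0)` or `len(arg0)` alone.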

LLM-seeder hybrid

When pure synthesis fails, an LLM proposes one primitive (extraPrims); the engine verifies it and recomposes it for free. Full copy-paste example: examples/hybrid-llm-seeding.ts.

Advanced surface (composition, independent verification): solveBySynthesis, learnAbstractions, grade (verifier), buildOps, configureSearch, and the types Tool / Recipe / Op / Abstraction.

Use it from an agent (MCP)

synthcore-mcp is a Model Context Protocol server that exposes synthesize as a tool, so agents (Claude Desktop, Cursor, …) can get a deterministic, verified, $0 data transform instead of hallucinating one. The LLM reasons; Synthcore guarantees.

cd mcp && npm install && npm start   # stdio server with a single `synthesize` tool

Then point your client at it and call synthesize with your examples (supports the opt-in bundles and language: "python"). Full wiring for Claude Desktop / Cursor in mcp/README.md.

Limitations (read them — honest selling avoids the hype that burns)

It is not general code generation. Enumerative search only reaches small programs in a bounded DSL; it does not write an application nor reason over natural-language specs.
It complements an LLM, it doesn't replace it. The model is the hybrid (LLM proposes rare primitives; Synthcore recomposes them for free and verified), not a head-on competitor.
With few examples it can overfit. It returns some program that passes what you gave it; with 1–2 examples that may be a coincidence. Real case: for "email domain" with 3 examples the engine finds max(split(capitalize(arg0),"@")), which passes by accident (the domain wins the lexicographic order), not because it "understands" emails. Give at least 3–5 examples, including edge cases, so the solution truly generalizes.
Bounded DSL. Dates, regex, numbers-in-text and the science bundles (physics/chemistry) exist but are opt-in ({ std: true } or extraPrims), because each primitive enlarges the search. The filter/unique combinators are still missing and there is no recursion, so filtering, dedup or Fibonacci return not_found. This is the current frontier of the DSL (it moves as primitives are seeded), not a hidden failure mode.
Transpilation is template-based, not a full compiler. emitPython covers the base DSL and the bundles we maintain. Learned abstractions and arbitrary injected primitives (raw JS) are not transpiled — it throws rather than emit code it can't guarantee.
Niche, not mass. The audience is developers/tooling, not end consumers.
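
The overfitting case above can be reproduced by hand. The sketch below assumes plausible semantics for the three primitives (`capitalize` uppercases the first character, `max` picks the lexicographically largest string); those definitions and the sample addresses are assumptions for illustration, not the engine's code or the original examples.

```typescript
// Assumed semantics for the three primitives in the accidental recipe.
const capitalize = (s: string): string => s.charAt(0).toUpperCase() + s.slice(1);
const split = (s: string, sep: string): string[] => s.split(sep);
const max = (xs: string[]): string => xs.reduce((a, b) => (b > a ? b : a));

// The recipe from the "email domain" case: max(split(capitalize(arg0),"@")).
const accidental = (email: string): string => max(split(capitalize(email), "@"));

// What was actually meant: everything after the "@".
const domain = (email: string): string => email.slice(email.indexOf("@") + 1);
```

For `"alice@example.com"` both functions return `"example.com"`, because the capitalized local part `"Alice"` sorts before the lowercase domain. For `"zed@Acme.com"` the recipe returns `"Zed"` while the intended answer is `"Acme.com"`. One edge case like this in the examples is enough to rule the accidental recipe out.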

Positioning

vs Microsoft PROSE / FlashFill (mature PBE): Synthcore adds library learning + deep composition, and is embeddable, open TS (PROSE is C#/closed and string/table-centric).
vs DreamCoder / LILO (academic, Python + GPU): Synthcore is a TS project that runs on a laptop without a GPU.
vs LLM codegen (Copilot/Cursor): deterministic, verified, $0, offline, auditable — the other half of the stack.

Usage

npm install synthcore   # in your project

Repo development:

npm install        # devDeps only (typescript, tsx, @types/node)
npm run demo       # solves 8/8 example tasks, verified, $0
npm run bench      # honest benchmark: 13/15 tasks, ~150 ms per solved task, $0
npm test           # contract suite (node:test)
npm run typecheck  # tsc --noEmit
npm run build      # compiles the library to dist/ (JS + types) for publishing

Requires Node ≥ 22. Pure ESM, no runtime dependencies.

License

MIT.
