Advanced edge inference for Node.js
A llama.cpp control surface in TypeScript with atomic inference-state forking, real-time rolling perplexity/entropy/surprisal, and multi-sequence parallel exploration primitives.
```sh
npm install @lloyal-labs/lloyal.node
```

Prebuilt binaries cover 13 platform/acceleration combinations:
| Platform | Arch | Acceleration |
|---|---|---|
| macOS | arm64 | Metal |
| macOS | x64 | CPU |
| Linux | x64 | CPU / CUDA / Vulkan |
| Linux | arm64 | CPU / CUDA / Vulkan |
| Windows | x64 | CPU / CUDA / Vulkan |
| Windows | arm64 | CPU / Vulkan |
GPU selection happens at runtime, not install time. See distribution.md for details.
Working examples demonstrate each capability:
| Example | What It Demonstrates |
|---|---|
| best-of-n/ | Branch API parallel generation, PPL selection, fork/produce/commit |
| speculative/ | Branch API fork/prune, draft/verify/accept/reject, bonus token sampling |
| entropy/ | Entropy Decision Tree — modelEntropy() mid-generation as control signal |
| grammar/ | Pull loop with generators, JSON schema constraints, KV + grammar branching |
| streaming/ | Infinite context via BlinkKV, clearAndReseed, perplexity tracking |
| chat/ | Interactive streaming chat |
| embed/ | Text embeddings extraction |
```sh
node examples/best-of-n/best-of-n.mjs
node examples/speculative/speculative.mjs
node examples/entropy/entropy.mjs
node examples/grammar/grammar.mjs
```

Each example has a README explaining the pattern in depth.
Branch is the primary API for parallel generation. Each branch owns a KV cache sequence, sampler chain, logits snapshot, and perplexity tracker. Fork a branch to explore alternatives, compare by perplexity, prune losers.
```js
import { createContext, Branch } from '@lloyal-labs/lloyal.node';

const ctx = await createContext({ modelPath: './model.gguf', nSeqMax: 8 });
const tokens = await ctx.tokenize('Once upon a time');
await ctx.decode(tokens, 0, 0);

// Create root branch, capture logits from prefill
const root = Branch.create(ctx, 0, tokens.length, { temperature: 0.8 });
root.captureLogits();

// Fork N candidates — each gets copied KV, logits, sampler, perplexity
const candidates = [1, 2, 3, 4, 5].map((seqId, i) => {
  const branch = root.fork(seqId);
  branch.reseedSampler(1000 + i); // Unique PRNG per branch
  return branch;
});

// Generate in parallel (interleaved round-robin)
for (let t = 0; t < 50; t++) {
  for (const branch of candidates) {
    const { token, isStop } = branch.produce(); // Sample (no KV write)
    if (isStop) continue;
    branch.commit(token); // Accept + decode + capture
  }
}

// Select best by perplexity, prune losers
const best = candidates.reduce((a, b) => a.perplexity < b.perplexity ? a : b);
for (const c of candidates) { if (c !== best) c.prune(); }
```

What fork() clones: KV cache sequence, logits snapshot, sampler chain (penalties + PRNG), perplexity tracker. Under unified KV (the default), forking is a metadata-only operation — no KV tensor buffers are copied.
Use cases: Best-of-N sampling, speculative decoding, MCTS/LATS tree search, beam search.
See examples/best-of-n/ and examples/speculative/ for complete patterns.
For fine-grained control without the Branch wrapper, raw KV and state operations are available; a tag-copy sketch follows the table:
| Approach | Method | Use Case |
|---|---|---|
| Tag copy | kvSeqCopy(src, dst) | Parallel branches with different seqIds |
| Snapshot/restore | kvCacheSave() / kvCacheLoad() | Sequential exploration, return to checkpoint |
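A minimal tag-copy sketch, assuming a context created with enough sequence slots (nSeqMax); the prompt and continuations are illustrative:

```js
import { createContext } from '@lloyal-labs/lloyal.node';

const ctx = await createContext({ modelPath: './model.gguf', nSeqMax: 4 });

// Prefill the shared prompt on sequence 0
const prompt = await ctx.tokenize('Once upon a time');
await ctx.decode(prompt, 0, 0);

// Tag-copy the cached prefix into sequences 1 and 2 (metadata only, O(1))
ctx.kvSeqCopy(0, 1);
ctx.kvSeqCopy(0, 2);

// Each sequence now extends independently from the same prefix
await ctx.decode(await ctx.tokenize(' there was a dragon'), prompt.length, 1);
await ctx.decode(await ctx.tokenize(' there was a princess'), prompt.length, 2);
```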
examples/grammar/ uses snapshot/restore — save state, explore branches sequentially, restore between each:
```js
const snapshot = await ctx.kvCacheSave(0); // Save checkpoint
// ... explore branch ...
await ctx.kvCacheLoad(0, snapshot); // Return to checkpoint
```

Model uncertainty mid-generation enables dynamic behavior:
```js
const entropy = ctx.modelEntropy('bits');
if (entropy > 4.0) {
  // High uncertainty — model is guessing
  // Trigger retrieval, reduce temperature, or branch
}
```

See examples/entropy/ for entropy-triggered sampling strategies.
For branching mid-generation, generators provide natural backpressure:
```js
function* tokenGenerator(ctx, grammarHandle) {
  while (true) {
    const logits = ctx.getLogits();
    ctx.applySampler(grammarHandle, logits);
    const token = ctx.sample({ temperature: 0.7 });
    if (ctx.isStopToken(token)) return;
    ctx.acceptSamplerToken(grammarHandle, token);
    yield { token, text: ctx.tokenToText(token) };
  }
}

// Consumer controls pace — stop at branch point
// (grammarHandle comes from createSampler)
const gen = tokenGenerator(ctx, grammarHandle);
let accumulated = '';
for (const { token, text } of gen) {
  accumulated += text;
  if (accumulated.includes('"city"')) break; // Pause here, branch
}
```

See examples/grammar/ for the full pull loop pattern.
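Branching at that pause point means forking both halves of the state: the KV cache and the grammar parser. A minimal sketch, reusing tokenGenerator and grammarHandle from the pull loop above; which continuations get explored is up to the caller:

```js
// Checkpoint both halves of the state before exploring
const kvSnapshot = await ctx.kvCacheSave(0);
const savedGrammar = ctx.cloneSampler(grammarHandle); // grammar parser state

// ... explore branch A with grammarHandle, decoding onto sequence 0 ...

// Rewind the KV cache and explore branch B from the saved grammar state
await ctx.kvCacheLoad(0, kvSnapshot);
for (const { text } of tokenGenerator(ctx, savedGrammar)) {
  // ... consume branch B ...
}

ctx.freeSamplerHandle(savedGrammar); // Release the cloned handle when done
```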
```ts
const ctx = await createContext({
  modelPath: string,    // Path to .gguf file (required)
  nCtx?: number,        // Context size (default: 2048)
  nThreads?: number,    // CPU threads (default: 4)
  embeddings?: boolean, // Enable embedding mode (default: false)
  poolingType?: number, // 0=NONE, 1=MEAN, 2=CLS, 3=LAST
  nSeqMax?: number,     // Max parallel sequences (default: 1)
});
```

Tokenization, decoding, and sampling:

| Method | Returns | Description |
|---|---|---|
| tokenize(text) | Promise<number[]> | Text → token IDs |
| detokenize(tokens) | Promise<string> | Token IDs → text |
| tokenToText(token) | string | Single token → text (streaming) |
| decode(tokens, pos, seqId?) | Promise<void> | Forward pass, updates KV cache |
| sample(params?) | number | Sample next token |
| isStopToken(token) | boolean | Check for EOS token |
| getLogits() | Float32Array | Raw logits (zero-copy view) |
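A minimal token-by-token generation loop built from these primitives; the prompt, temperature, and token budget are illustrative:

```js
import { createContext } from '@lloyal-labs/lloyal.node';

const ctx = await createContext({ modelPath: './model.gguf' });

const prompt = await ctx.tokenize('The capital of France is');
await ctx.decode(prompt, 0, 0);

let pos = prompt.length;
let out = '';
for (let i = 0; i < 32; i++) {
  const token = ctx.sample({ temperature: 0.7 }); // Sample from the latest logits
  if (ctx.isStopToken(token)) break;
  out += ctx.tokenToText(token);
  await ctx.decode([token], pos++, 0); // Feed the token back to advance the KV cache
}

console.log(out);
ctx.dispose();
```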
KV cache operations:

| Method | Returns | Description |
|---|---|---|
| kvCacheSize(seqId?) | number | Tokens in cache |
| kvCacheClear() | Promise<void> | Clear all sequences |
| kvCacheRemove(seqId, start, end) | Promise<void> | Remove token range |
| kvCacheSave(seqId?) | Promise<Buffer> | Snapshot state |
| kvCacheLoad(seqId, state) | Promise<void> | Restore state |
| kvSeqCopy(src, dst) | void | Copy sequence (tag copy, O(1)) |
| kvSeqKeep(seqId) | void | Keep only one sequence |
| clearAndReseed(sinks, tail) | Promise<void> | BlinkKV pattern |
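A sketch of the BlinkKV pattern, under the assumption that sinks and tail are token arrays to re-decode after clearing (see examples/streaming/ for the authoritative usage); the helper name, thresholds, and history bookkeeping are illustrative:

```js
// Assumption: clearAndReseed clears the cache, then re-decodes the supplied
// sink tokens plus recent tail tokens, keeping context growth bounded.
const N_SINK = 4;    // first tokens kept as attention sinks
const N_TAIL = 1024; // most recent tokens kept verbatim

async function maybeCompact(ctx, history) {
  if (ctx.kvCacheSize(0) <= 1500) return history;
  const sinks = history.slice(0, N_SINK);
  const tail = history.slice(-N_TAIL);
  await ctx.clearAndReseed(sinks, tail);
  return [...sinks, ...tail]; // caller's token history after the reseed
}
```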
Grammar-constrained sampling:

| Method | Returns | Description |
|---|---|---|
| jsonSchemaToGrammar(schema) | string | Schema → GBNF |
| createSampler(grammarStr) | number | Create grammar handle |
| cloneSampler(handle) | number | Clone grammar state |
| applySampler(handle, logits) | void | Apply constraints to logits |
| acceptSamplerToken(handle, id) | void | Advance parser state |
| freeSamplerHandle(handle) | void | Release grammar handle |
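End to end, the grammar flow is schema → GBNF → handle → constrained decode. A minimal sketch, assuming jsonSchemaToGrammar accepts a schema object (it may expect a JSON string), with an illustrative prompt and schema:

```js
const schema = {
  type: 'object',
  properties: { city: { type: 'string' }, population: { type: 'number' } },
  required: ['city', 'population'],
};

const gbnf = ctx.jsonSchemaToGrammar(schema); // Schema → GBNF grammar string
const grammarHandle = ctx.createSampler(gbnf);

const prompt = await ctx.tokenize('Respond with JSON describing Paris: ');
await ctx.decode(prompt, 0, 0);

let pos = prompt.length;
let json = '';
while (true) {
  ctx.applySampler(grammarHandle, ctx.getLogits()); // Mask logits to grammar-legal tokens
  const token = ctx.sample({ temperature: 0.7 });
  if (ctx.isStopToken(token)) break;
  ctx.acceptSamplerToken(grammarHandle, token);     // Advance the parser state
  json += ctx.tokenToText(token);
  await ctx.decode([token], pos++, 0);
}

ctx.freeSamplerHandle(grammarHandle);
console.log(JSON.parse(json));
```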
Entropy and perplexity:

| Method | Returns | Description |
|---|---|---|
| modelEntropy(base?, logits?) | number | Distribution entropy (bits/nats) |
| modelSurprisal(token, base?, logits?) | number | Token surprisal (supports captured logits) |
| createPerplexityTracker() | TrackerHandle | Create tracker (forkable) |
| clonePerplexityTracker(handle) | TrackerHandle | Clone tracker state |
| addSurprisal(handle, value) | void | Add to tracker |
| getPerplexity(handle) | number | Get current PPL |
| freePerplexityTracker(handle) | void | Release tracker |
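A minimal sketch of manual perplexity tracking inside a generation loop (the Branch API does this bookkeeping automatically); ctx is a context created as above, and the prompt and limits are illustrative:

```js
const prompt = await ctx.tokenize('The capital of France is');
await ctx.decode(prompt, 0, 0);

const tracker = ctx.createPerplexityTracker();
let pos = prompt.length;
for (let i = 0; i < 64; i++) {
  const token = ctx.sample({ temperature: 0.7 });
  if (ctx.isStopToken(token)) break;

  // Surprisal of the sampled token under the current distribution, in bits
  ctx.addSurprisal(tracker, ctx.modelSurprisal(token, 'bits'));
  await ctx.decode([token], pos++, 0);
}

console.log('rolling perplexity:', ctx.getPerplexity(tracker));
ctx.freePerplexityTracker(tracker);
```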
Embeddings:

| Method | Returns | Description |
|---|---|---|
| encode(tokens) | Promise<void> | Forward pass for embeddings |
| getEmbeddings(normalize?) | Float32Array | Extract embedding vector |
| getEmbeddingDimension() | number | Vector dimension |
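A minimal embedding extraction sketch, assuming an embedding-capable GGUF model; the model path, pooling choice, and the meaning of the normalize flag are illustrative assumptions:

```js
import { createContext } from '@lloyal-labs/lloyal.node';

const embedCtx = await createContext({
  modelPath: './embedding-model.gguf',
  embeddings: true,
  poolingType: 1, // MEAN pooling
});

const tokens = await embedCtx.tokenize('llamas are curious animals');
await embedCtx.encode(tokens);

const vector = embedCtx.getEmbeddings(true); // normalize (assumed L2)
console.log(embedCtx.getEmbeddingDimension(), vector.length);

embedCtx.dispose();
```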
Lifecycle:

| Method | Description |
|---|---|
| dispose() | Free native resources (required) |
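Since dispose() is required, a try/finally keeps it from being skipped on errors; a minimal pattern:

```js
const ctx = await createContext({ modelPath: './model.gguf' });
try {
  const tokens = await ctx.tokenize('hello');
  await ctx.decode(tokens, 0, 0);
  // ... generate ...
} finally {
  ctx.dispose(); // Always release native resources, even if generation throws
}
```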
| Package | Runtime | Description |
|---|---|---|
| liblloyal | C++ | Header-only inference kernel |
| lloyal.node | Node.js | This package |
| nitro-llama | React Native | Mobile bindings via Nitro Modules |
| tsampler | TypeScript | Reference sampler implementation |
See CONTRIBUTING.md for development setup and release process.
Apache 2.0 — See LICENSE for details.