Run Hugging Face models locally with the Vercel AI SDK — no API keys, no cloud, no cost.
This package wraps Transformers.js (ONNX Runtime) in a fully compatible `LanguageModelV3` provider, giving you `generateText`, `streamText`, and tool calling — all running on your own machine.
## Installation

```bash
npm install transformers-ai-provider @huggingface/transformers ai
```
## Quick Start

```ts
import { createTransformersModel } from 'transformers-ai-provider';
import { generateText } from 'ai';

const model = createTransformersModel({
  modelId: 'onnx-community/gemma-3-1b-it-ONNX',
});

const { text } = await generateText({
  model,
  prompt: 'Explain quantum computing in one sentence.',
});

console.log(text);
```

## Streaming

```ts
import { createTransformersModel } from 'transformers-ai-provider';
import { streamText } from 'ai';

const model = createTransformersModel({
  modelId: 'onnx-community/gemma-3-1b-it-ONNX',
});

const result = streamText({
  model,
  prompt: 'Write a haiku about TypeScript.',
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
```

## Tool Calling

Works with the standard Vercel AI SDK `tool()` helper. The provider injects tool definitions into the system prompt and parses structured JSON tool calls from the model output.
```ts
import { createTransformersModel } from 'transformers-ai-provider';
import { streamText, tool, stepCountIs } from 'ai';
import { z } from 'zod';

const model = createTransformersModel({
  modelId: 'onnx-community/gemma-3-1b-it-ONNX',
});

const result = streamText({
  model,
  system: 'You are a helpful assistant.',
  prompt: "What's the weather in Amsterdam?",
  tools: {
    weather: tool({
      description: 'Get the current weather for a location',
      inputSchema: z.object({
        location: z.string(),
        unit: z.enum(['celsius', 'fahrenheit']).default('celsius'),
      }),
      execute: async ({ location, unit }) => ({
        location,
        temperature: unit === 'celsius' ? 18 : 64,
        condition: 'partly cloudy',
      }),
    }),
  },
  stopWhen: stepCountIs(3),
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
```

## Configuration

```ts
createTransformersModel({
  // Required — any ONNX model from Hugging Face Hub
  modelId: 'onnx-community/gemma-3-1b-it-ONNX',

  // Quantization level (default: 'q4')
  dtype: 'q4',

  // Device: 'cpu', 'webgpu', etc.
  device: 'cpu',

  // Extra options passed to AutoModelForCausalLM.from_pretrained
  modelOptions: {},

  // Extra options passed to AutoTokenizer.from_pretrained
  tokenizerOptions: {},

  // Custom template for injecting tool definitions into the system prompt
  toolPromptTemplate: (toolDefs) => `\n\nTools:\n${toolDefs}`,
});
```

| Option | Type | Default | Description |
|---|---|---|---|
| `modelId` | `string` | — | HF model ID (e.g. `onnx-community/gemma-3-1b-it-ONNX`) |
| `dtype` | `string` | `'q4'` | Quantization dtype (`'q4'`, `'q8'`, `'fp16'`, `'fp32'`) |
| `device` | `string` | auto | Runtime device (`'cpu'`, `'webgpu'`) |
| `modelOptions` | `object` | `{}` | Passed to `AutoModelForCausalLM.from_pretrained` |
| `tokenizerOptions` | `object` | `{}` | Passed to `AutoTokenizer.from_pretrained` |
| `toolPromptTemplate` | `function` | built-in | Custom tool prompt injection template |
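To illustrate the `toolPromptTemplate` hook, the sketch below builds a custom template. It assumes — as the default template shown above suggests — that the provider hands the function a single pre-serialized string of tool definitions; the framing text is illustrative, not the package's built-in wording.

```ts
// Hypothetical custom template. The instruction mirrors the JSON shape the
// provider expects for tool calls; the surrounding wording is an assumption.
const toolPromptTemplate = (toolDefs: string): string =>
  [
    '',
    'You have access to the following tools.',
    'To call one, reply with {"name": "...", "arguments": {...}} and nothing else.',
    '',
    toolDefs,
  ].join('\n');

console.log(toolPromptTemplate('weather: Get the current weather for a location'));
```

The resulting function would be passed as the `toolPromptTemplate` option to `createTransformersModel`.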
## Supported Models

Any ONNX-format causal language model on Hugging Face Hub works. Some tested models:
| Model | ID | Size |
|---|---|---|
| Gemma 3 1B Instruct | `onnx-community/gemma-3-1b-it-ONNX` | ~700 MB (q4) |
| Phi-3 Mini | `onnx-community/Phi-3-mini-4k-instruct-onnx-web` | ~2.3 GB (q4) |
| SmolLM 135M | `HuggingFaceTB/SmolLM-135M-Instruct` | ~270 MB |
## How It Works

- **Model loading** — Uses `AutoTokenizer` and `AutoModelForCausalLM` from Transformers.js with lazy initialization (loaded once, reused across calls).
- **Prompt conversion** — Translates the Vercel AI SDK's `LanguageModelV3Prompt` (system/user/assistant/tool messages) into Hugging Face chat format, applying the model's chat template via `tokenizer.apply_chat_template`.
- **Tool calling** — When tools are provided, their JSON Schema definitions are injected into the system prompt. The model is instructed to respond with `{"name": "...", "arguments": {...}}` for tool calls. The provider parses these from the output and emits proper `tool-call` / `tool-input-*` stream events.
- **Streaming** — Uses `TextStreamer` from Transformers.js to push token-by-token deltas into a `ReadableStream<LanguageModelV3StreamPart>`, compatible with `streamText()`.
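The tool-call parsing step can be pictured with a simplified sketch. This is not the provider's actual implementation — just an illustration of extracting a `{"name": ..., "arguments": ...}` object from raw model output and falling back to plain text when none is found.

```ts
// Simplified sketch (hypothetical, not the package's internal parser).
interface ParsedToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

function parseToolCall(output: string): ParsedToolCall | null {
  // Take the outermost {...} span and try to interpret it as a tool call.
  const start = output.indexOf('{');
  const end = output.lastIndexOf('}');
  if (start === -1 || end <= start) return null;
  try {
    const parsed = JSON.parse(output.slice(start, end + 1));
    if (
      typeof parsed.name === 'string' &&
      typeof parsed.arguments === 'object' &&
      parsed.arguments !== null
    ) {
      return { name: parsed.name, arguments: parsed.arguments };
    }
  } catch {
    // Not valid JSON — treat the output as ordinary text.
  }
  return null;
}
```

A real parser also has to handle streaming (partial JSON) and surrounding prose, which is what the `tool-input-*` delta events are for.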
## Runtime Support

Works anywhere Transformers.js runs:
- Node.js ≥ 18
- Bun
- Deno
- Edge runtimes (with ONNX Runtime Web)
- Browsers (with WebGPU for acceleration)
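When targeting both browsers and server runtimes, the `device` option can be feature-detected. A minimal sketch — the `pickDevice` helper is not part of this package, and it assumes `navigator.gpu` (the standard WebGPU entry point) as the detection signal:

```ts
// Hypothetical helper: choose 'webgpu' when the runtime exposes the WebGPU
// API via navigator.gpu, otherwise fall back to 'cpu'. In Node.js this
// returns 'cpu', since Node's global navigator (where present) has no
// `gpu` property.
function pickDevice(): 'webgpu' | 'cpu' {
  return typeof navigator !== 'undefined' && 'gpu' in navigator
    ? 'webgpu'
    : 'cpu';
}

console.log(pickDevice());
```

The result would then be passed as the `device` option to `createTransformersModel`.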
## License

MIT