hackdonalds/ai-sdk-transformers

transformers-ai-provider

Run Hugging Face models locally with the Vercel AI SDK — no API keys, no cloud, no cost.

This package wraps Transformers.js (ONNX Runtime) in a fully compatible LanguageModelV3 provider, giving you generateText, streamText, and tool calling, all running on your own machine.

npm install transformers-ai-provider @huggingface/transformers ai

Quick Start

import { createTransformersModel } from 'transformers-ai-provider';
import { generateText } from 'ai';

const model = createTransformersModel({
  modelId: 'onnx-community/gemma-3-1b-it-ONNX',
});

const { text } = await generateText({
  model,
  prompt: 'Explain quantum computing in one sentence.',
});

console.log(text);

Streaming

import { createTransformersModel } from 'transformers-ai-provider';
import { streamText } from 'ai';

const model = createTransformersModel({
  modelId: 'onnx-community/gemma-3-1b-it-ONNX',
});

const result = streamText({
  model,
  prompt: 'Write a haiku about TypeScript.',
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}

Tool Calling

Works with the standard Vercel AI SDK tool() helper. The provider injects tool definitions into the system prompt and parses structured JSON tool calls from the model output.

import { createTransformersModel } from 'transformers-ai-provider';
import { streamText, tool, stepCountIs } from 'ai';
import { z } from 'zod';

const model = createTransformersModel({
  modelId: 'onnx-community/gemma-3-1b-it-ONNX',
});

const result = streamText({
  model,
  system: 'You are a helpful assistant.',
  prompt: "What's the weather in Amsterdam?",
  tools: {
    weather: tool({
      description: 'Get the current weather for a location',
      inputSchema: z.object({
        location: z.string(),
        unit: z.enum(['celsius', 'fahrenheit']).default('celsius'),
      }),
      execute: async ({ location, unit }) => ({
        location,
        temperature: unit === 'celsius' ? 18 : 64,
        condition: 'partly cloudy',
      }),
    }),
  },
  stopWhen: stepCountIs(3),
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}

Configuration

createTransformersModel({
  // Required — any ONNX model from Hugging Face Hub
  modelId: 'onnx-community/gemma-3-1b-it-ONNX',

  // Quantization level (default: 'q4')
  dtype: 'q4',

  // Device: 'cpu', 'webgpu', etc.
  device: 'cpu',

  // Extra options passed to AutoModelForCausalLM.from_pretrained
  modelOptions: {},

  // Extra options passed to AutoTokenizer.from_pretrained
  tokenizerOptions: {},

  // Custom template for injecting tool definitions into the system prompt
  toolPromptTemplate: (toolDefs) => `\n\nTools:\n${toolDefs}`,
});
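As a worked example of the toolPromptTemplate option: per the default shown above, the template receives the serialized tool definitions as a single string and returns the text appended to the system prompt. A custom template might wrap that string with explicit calling instructions (a minimal sketch; the exact instruction wording is an assumption, not the package's built-in template):

```typescript
// Custom toolPromptTemplate (sketch). `toolDefs` arrives as an
// already-serialized string of tool definitions; this template wraps it
// with explicit instructions on how the model should emit a tool call.
const toolPromptTemplate = (toolDefs: string): string =>
  [
    '',
    'You have access to the following tools:',
    toolDefs,
    'To call a tool, reply with exactly one JSON object of the form:',
    '{"name": "<tool name>", "arguments": { ... }}',
  ].join('\n');
```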

Options

Option | Type | Default | Description
modelId | string | (required) | HF model ID (e.g. onnx-community/gemma-3-1b-it-ONNX)
dtype | string | 'q4' | Quantization dtype ('q4', 'q8', 'fp16', 'fp32')
device | string | auto | Runtime device ('cpu', 'webgpu')
modelOptions | object | {} | Passed to AutoModelForCausalLM.from_pretrained
tokenizerOptions | object | {} | Passed to AutoTokenizer.from_pretrained
toolPromptTemplate | function | built-in | Custom tool prompt injection template

Supported Models

Any ONNX-format causal language model on Hugging Face Hub works. Some tested models:

Model | Model ID | Size
Gemma 3 1B Instruct | onnx-community/gemma-3-1b-it-ONNX | ~700 MB (q4)
Phi-3 Mini | onnx-community/Phi-3-mini-4k-instruct-onnx-web | ~2.3 GB (q4)
SmolLM 135M | HuggingFaceTB/SmolLM-135M-Instruct | ~270 MB

How It Works

  1. Model loading — Uses AutoTokenizer and AutoModelForCausalLM from Transformers.js with lazy initialization (loaded once, reused across calls).

  2. Prompt conversion — Translates Vercel AI SDK's LanguageModelV3Prompt (system/user/assistant/tool messages) into HuggingFace chat format, applying the model's chat template via tokenizer.apply_chat_template.

  3. Tool calling — When tools are provided, their JSON Schema definitions are injected into the system prompt. The model is instructed to respond with {"name": "...", "arguments": {...}} for tool calls. The provider parses these from the output and emits proper tool-call / tool-input-* stream events.

  4. Streaming — Uses TextStreamer from Transformers.js to push token-by-token deltas into a ReadableStream<LanguageModelV3StreamPart>, compatible with streamText().
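The tool-call parsing in step 3 can be sketched as follows (illustrative only; this is not the provider's actual implementation, which also has to handle partial JSON arriving mid-stream and emit the corresponding tool-input-* events):

```typescript
// Sketch: extract a {"name": ..., "arguments": {...}} tool call from
// completed model output, tolerating surrounding prose.
interface ParsedToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

function parseToolCall(output: string): ParsedToolCall | null {
  // Find the first '{' and try progressively shorter substrings until one
  // parses as JSON with the expected shape.
  const start = output.indexOf('{');
  if (start === -1) return null;
  for (let end = output.length; end > start; end--) {
    try {
      const candidate = JSON.parse(output.slice(start, end));
      if (
        candidate &&
        typeof candidate.name === 'string' &&
        typeof candidate.arguments === 'object' &&
        candidate.arguments !== null
      ) {
        return candidate as ParsedToolCall;
      }
    } catch {
      // Not valid JSON at this cut; keep shrinking the window.
    }
  }
  return null;
}
```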

Runtimes

Works anywhere Transformers.js runs:

  • Node.js ≥ 18
  • Bun
  • Deno
  • Edge runtimes (with ONNX Runtime Web)
  • Browsers (with WebGPU for acceleration)
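When one codebase targets several of these runtimes, the device option can be chosen at startup. A small illustrative helper (not part of this package) that falls back to CPU when WebGPU is unavailable, detected via navigator.gpu:

```typescript
// Pick a Transformers.js device string for the current runtime.
// Illustrative helper, not part of transformers-ai-provider.
function detectDevice(): 'webgpu' | 'cpu' {
  // Browsers (and recent Node/Deno) expose navigator; WebGPU support
  // additionally exposes navigator.gpu.
  const nav = (globalThis as { navigator?: { gpu?: unknown } }).navigator;
  return nav?.gpu ? 'webgpu' : 'cpu';
}
```

The result can then be passed as the device option to createTransformersModel.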

License

MIT
