ai-tokenizer

Token counting and management for LLMs. Count tokens, truncate text, split text into chunks, and manage token budgets. Supports OpenAI, Anthropic, and Llama models.

Quick Start

npx ai-tokenizer count "Hello, world!"

Features

Accurate token counting - Uses tiktoken for precise counts
Smart truncation - Truncate from end, start, or middle
Chunking - Split text by tokens, sentences, or paragraphs
Budget management - Allocate tokens across system/context/response
Statistics - Analyze compression ratios and token usage
Multi-model - GPT-4, Claude, Gemini, Llama, Mistral support

Installation

# Use directly with npx (no install needed)
npx ai-tokenizer count "your text here"

# Or install globally
npm install -g ai-tokenizer

# Or add to your project
npm install ai-tokenizer

CLI Usage

Count Tokens

# Count tokens in text
npx ai-tokenizer count "Hello, how are you?"

# Count tokens in a file
npx ai-tokenizer count --file ./document.txt

# Specify model
npx ai-tokenizer count "Hello" --model gpt-3.5-turbo

# Count message tokens (JSON format)
npx ai-tokenizer count --messages '[{"role":"user","content":"Hi"}]'

Truncate Text

# Truncate to 100 tokens
npx ai-tokenizer truncate "long text..." --tokens 100

# Truncate from start
npx ai-tokenizer truncate "long text..." --tokens 100 --strategy start

# Truncate from middle
npx ai-tokenizer truncate "long text..." --tokens 100 --strategy middle

# Custom ellipsis
npx ai-tokenizer truncate "long text..." --tokens 100 --ellipsis " [...]"

Chunk Text

# Split into 1000-token chunks
npx ai-tokenizer chunk --file ./large-doc.txt --tokens 1000

# With overlap
npx ai-tokenizer chunk --file ./doc.txt --tokens 1000 --overlap 100

# Save chunks to files
npx ai-tokenizer chunk --file ./doc.txt --tokens 1000 --output ./chunks/part

Analyze Text

# Get token statistics
npx ai-tokenizer analyze "Your text here"

# Analyze a file
npx ai-tokenizer analyze --file ./document.txt

Compare Models

# Compare token counts across models
npx ai-tokenizer compare "Your text" --models gpt-4,claude-3-sonnet,llama-2-70b

List Models

# Show all supported models and context windows
npx ai-tokenizer models

Programmatic Usage

import {
  countTokens,
  countMessageTokens,
  truncateToTokens,
  chunkText,
  analyzeText,
  BudgetManager,
  getContextWindow,
} from 'ai-tokenizer';

// Count tokens
const tokens = countTokens("Hello, world!", "gpt-4");
console.log(tokens); // 4

// Count message tokens
const messageTokens = countMessageTokens([
  { role: "system", content: "You are helpful." },
  { role: "user", content: "Hi!" },
], "gpt-4");

// Truncate text
const truncated = truncateToTokens("very long text...", {
  maxTokens: 100,
  strategy: "end",
  ellipsis: "...",
});

// Chunk text (longDocument is a string you provide)
const chunks = chunkText(longDocument, {
  maxTokens: 1000,
  overlap: 100,
});

// Budget management
const budget = new BudgetManager(8000, "gpt-4");
budget.addSystemPrompt("You are a helpful assistant.");
budget.addContext(relevantDocs); // relevantDocs: context you provide, defined elsewhere
console.log(budget.getRemainingContext()); // tokens left for more context
console.log(budget.getMaxResponseTokens()); // tokens reserved for response

// Analyze text
const stats = analyzeText("Your text here");
console.log(stats.totalTokens);
console.log(stats.compressionRatio);

// Get context window
const contextWindow = getContextWindow("gpt-4-turbo"); // 128000

Supported Models

Model	Context Window
gpt-4-turbo	128,000
gpt-4	8,192
gpt-4-32k	32,768
gpt-3.5-turbo	16,385
claude-3-opus	200,000
claude-3-sonnet	200,000
claude-3-haiku	200,000
gemini-1.5-pro	1,000,000
mistral-large	32,000
llama-2-70b	4,096

API Reference

`countTokens(text, model?)`

Count tokens in text string.

`countMessageTokens(messages, model?)`

Count tokens in an array of chat messages. The total includes message formatting overhead, not just the message text.
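
To picture what "overhead" means here, the sketch below adds a fixed cost per message plus a fixed cost for priming the reply. It is not this package's implementation: `countChatTokens`, `wordCount`, and both overhead constants are hypothetical, and a whitespace word count stands in for a real tokenizer.

```typescript
interface ChatMessage {
  role: string;
  content: string;
}

// Stand-in tokenizer (assumption): one token per whitespace-separated word.
function wordCount(text: string): number {
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}

// Hypothetical helper: content tokens plus fixed formatting overhead.
function countChatTokens(
  messages: ChatMessage[],
  count: (s: string) => number = wordCount,
  perMessageOverhead = 3, // illustrative value, not taken from this package
  replyPriming = 3, // illustrative value, not taken from this package
): number {
  let total = replyPriming;
  for (const message of messages) {
    total += perMessageOverhead + count(message.role) + count(message.content);
  }
  return total;
}
```

The point of the sketch is that the total grows with the number of messages even when the messages themselves are short, which is why counting only the message text tends to underestimate.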

`truncateToTokens(text, options)`

Truncate text to fit within token limit.
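
The three strategies can be sketched on a plain word list. This is an illustration under stated assumptions, not the package's code: `truncateWords` is a hypothetical name, whitespace-separated words stand in for tokens, "start" is assumed to mean the beginning is cut, and the ellipsis is not counted against the limit.

```typescript
type Strategy = "end" | "start" | "middle";

// Hypothetical helper: truncates by words, which stand in for real tokens.
function truncateWords(
  text: string,
  maxTokens: number,
  strategy: Strategy = "end",
  ellipsis = "...",
): string {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  if (words.length <= maxTokens) return text; // already within the limit

  if (strategy === "end") {
    // Keep the beginning, cut the end.
    return words.slice(0, maxTokens).join(" ") + ellipsis;
  }
  if (strategy === "start") {
    // Keep the end, cut the beginning.
    return ellipsis + words.slice(words.length - maxTokens).join(" ");
  }
  // "middle": keep both ends and cut what lies between them.
  const head = Math.ceil(maxTokens / 2);
  const tail = maxTokens - head;
  const tailWords = tail > 0 ? words.slice(words.length - tail) : [];
  return words.slice(0, head).join(" ") + ellipsis + tailWords.join(" ");
}
```

Cutting the middle is often useful for logs or long documents where both the opening and the most recent content matter.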

`chunkText(text, options)`

Split text into chunks of specified token size.
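
A minimal sketch of fixed-size chunking with overlap, assuming a sliding window that advances by `maxTokens - overlap` each step. `chunkWords` is a hypothetical name and words stand in for tokens; the package's actual behavior may differ.

```typescript
// Hypothetical helper: sliding-window chunking over words (stand-ins for tokens).
function chunkWords(text: string, maxTokens: number, overlap = 0): string[] {
  if (maxTokens <= 0) throw new Error("maxTokens must be positive");
  if (overlap < 0 || overlap >= maxTokens) {
    throw new Error("overlap must be at least 0 and less than maxTokens");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const step = maxTokens - overlap; // how far the window advances each time
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + maxTokens).join(" "));
    if (start + maxTokens >= words.length) break; // this chunk reached the end
  }
  return chunks;
}
```

With overlap, the last few tokens of one chunk repeat at the start of the next, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.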

`chunkBySentence(text, maxTokens, model?)`

Split text by sentences, respecting token limit.

`chunkByParagraph(text, maxTokens, model?)`

Split text by paragraphs, respecting token limit.
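
Sentence- and paragraph-based chunking can both be pictured as greedy packing: split the text into units, then fill each chunk until the next unit would exceed the limit. The sketch below assumes that approach. `packUnits`, `splitSentences`, `splitParagraphs`, and `wordCount` are hypothetical helpers, and the sentence splitter is deliberately naive.

```typescript
// Stand-in tokenizer (assumption): one token per whitespace-separated word.
function wordCount(text: string): number {
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}

// Naive sentence splitter: breaks after runs of ., ! or ?.
function splitSentences(text: string): string[] {
  const matches: string[] = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Paragraphs are assumed to be separated by blank lines.
function splitParagraphs(text: string): string[] {
  return text.split(/\n\s*\n/).map((p) => p.trim()).filter((p) => p.length > 0);
}

// Greedily pack whole units into chunks of at most maxTokens.
function packUnits(
  units: string[],
  maxTokens: number,
  count: (s: string) => number = wordCount,
  separator = " ",
): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  let used = 0;
  for (const unit of units) {
    const n = count(unit);
    if (current.length > 0 && used + n > maxTokens) {
      chunks.push(current.join(separator));
      current = [];
      used = 0;
    }
    current.push(unit); // a unit larger than maxTokens becomes its own chunk
    used += n;
  }
  if (current.length > 0) chunks.push(current.join(separator));
  return chunks;
}
```

In this sketch a single unit that is larger than the limit is kept whole rather than split, so a chunk can exceed `maxTokens` in that one case.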

`analyzeText(text, model?)`

Get token statistics for text.

`BudgetManager`

Class for managing token budgets across system/context/response.
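
The arithmetic behind a token budget can be sketched as follows: remaining context equals the total budget minus a response reserve, the system prompt, and any context already added. `SimpleBudget` and its constructor arguments are hypothetical and do not mirror `BudgetManager`'s real signature; a word count stands in for a tokenizer.

```typescript
// Stand-in tokenizer (assumption): one token per whitespace-separated word.
function wordCount(text: string): number {
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}

// Hypothetical class illustrating budget arithmetic only.
class SimpleBudget {
  private totalTokens: number;
  private responseReserve: number; // tokens held back for the model's reply
  private systemTokens = 0;
  private contextTokens = 0;

  constructor(totalTokens: number, responseReserve: number) {
    this.totalTokens = totalTokens;
    this.responseReserve = responseReserve;
  }

  addSystemPrompt(text: string): void {
    this.systemTokens += wordCount(text);
  }

  // Returns false and adds nothing when the text would overflow the budget.
  addContext(text: string): boolean {
    const n = wordCount(text);
    if (n > this.remainingContext()) return false;
    this.contextTokens += n;
    return true;
  }

  remainingContext(): number {
    const used = this.responseReserve + this.systemTokens + this.contextTokens;
    return Math.max(0, this.totalTokens - used);
  }
}
```

Reserving response tokens up front is one reasonable design choice: it keeps context from crowding out the reply, at the cost of leaving some capacity unused when replies are short.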

`getContextWindow(model)`

Get context window size for a model.

`fitsInContext(text, model, reserveTokens?)`

Check whether text fits in a model's context window, optionally leaving `reserveTokens` free.
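
The check reduces to a comparison against the model's context window. The sketch below assumes the text has already been counted and uses three window sizes from the Supported Models table; `fitsWithReserve` and `CONTEXT_WINDOWS` are hypothetical names, not part of the package.

```typescript
// Window sizes copied from the Supported Models table above.
const CONTEXT_WINDOWS: Record<string, number> = {
  "gpt-4-turbo": 128000,
  "gpt-4": 8192,
  "llama-2-70b": 4096,
};

// Hypothetical helper: does a counted text plus a reserve fit in the window?
function fitsWithReserve(tokenCount: number, model: string, reserveTokens = 0): boolean {
  const limit = CONTEXT_WINDOWS[model];
  if (limit === undefined) throw new Error(`Unknown model: ${model}`);
  return tokenCount + reserveTokens <= limit;
}
```

A reserve is typically set to the number of tokens you expect the response to need, so the prompt and the reply together stay within the window.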

Part of the LXGIC Dev Toolkit

One of 110+ free developer tools from LXGIC Studios. No paywalls, no sign-ups.

Find more:

GitHub: https://github.com/lxgicstudios
Twitter: https://x.com/lxgicstudios
Website: https://lxgicstudios.com
npm: https://www.npmjs.com/~lxgicstudios

License

MIT. Free forever.
