TokenOps

The operational intelligence layer for AI systems.

Observe. Optimize. Govern. Improve.

TokenOps is an open-source platform that sits between your clients, agents, and workflows and frontier LLM providers (OpenAI, Anthropic, Google Gemini). It turns opaque AI consumption into measurable, optimizable operational infrastructure: real-time token optimization, inference observability, agent usage analytics, prompt intelligence, forecasting, governance, and AI efficiency coaching.

Status: early development. The architecture is being assembled in public.

Why TokenOps

Token usage is becoming the dominant operational cost of modern software, yet today's tooling solves only isolated pieces of the problem:

Existing tools	What they miss
Prompt optimizers	No observability, no governance
Provider billing dashboards	No real-time optimization or attribution
Tracing/observability	No optimization, no coaching
Agent frameworks	No operational analytics or governance

TokenOps combines optimization, observability, forecasting, coaching, and governance into one integrated operational platform — the TokenOps discipline (think DevOps, FinOps, MLOps, AIOps for inference).

Architecture (high level)

Clients / SDKs / CLIs / Extensions
            |
            v
   Local TokenOps Proxy
            |
            v
   Optimization Engine
            |
            v
  Routing & Analysis Layer
            |
            v
       LLM Providers
            |
            v
   Telemetry Pipeline
            |
            v
 Observability + Coaching

Repository layout

tokenops/
  cmd/
    tokenops/         # CLI binary (cobra)
    tokenopsd/        # daemon binary (proxy + analytics)
  internal/
    proxy/            # reverse proxy + provider routing
    optimizer/        # optimization engine
    observability/    # analytics, spend, workflow trace
    coaching/         # replay, waste detection, recommendations
    forecasting/      # forecasting & budget alerts
    storage/          # SQLite + ClickHouse adapters
    events/           # event bus
    redaction/        # secret detection + redaction
    cli/              # CLI command implementations
    config/           # configuration loading
    version/          # build metadata
  pkg/
    eventschema/      # public event schemas
  web/                # Vue 3 dashboard
  docs/               # docs site sources
  deployments/        # docker-compose, helm, etc.
  scripts/            # dev scripts
  .github/workflows/  # CI

Getting started

The code base is in bootstrap state. The commands below describe the target developer experience and are being implemented incrementally.

# Build everything
make build

# Run the daemon locally (proxy + analytics)
./bin/tokenopsd start

# Point a client SDK at the local proxy
export OPENAI_BASE_URL=http://localhost:7878/v1
export ANTHROPIC_BASE_URL=http://localhost:7878
export GEMINI_BASE_URL=http://localhost:7878

# Replay an expensive session
./bin/tokenops replay <session-id>

# Inspect spend, forecast, burn rate
./bin/tokenops spend

5-minute operator golden path

This path is optimized for one goal: prove TokenOps can produce measurable value in minutes, not days.

Step 1: Start the daemon

./bin/tokenopsd start

Expected: the process stays running and prints a startup log with a listen address (default 127.0.0.1:7878).

Step 2: Point one SDK request at the proxy

export OPENAI_BASE_URL=http://127.0.0.1:7878/v1

Run one existing request from your app/CLI against the same model you already use in production.

Expected: request succeeds with no code changes other than base URL override.

Step 3: Validate attribution and spend visibility

./bin/tokenops spend

Expected: output includes non-zero usage and spend for the recent request.

Step 4: Replay and inspect optimization headroom

./bin/tokenops replay <session-id>

Expected: side-by-side analysis shows optimization opportunities and projected token/spend deltas for that session.

Step 5: Capture your wedge KPI baseline

Track one primary KPI before broad rollout:

Token efficiency uplift (%) = (baseline_tokens - optimized_tokens) / baseline_tokens * 100

Suggested target for initial rollout: 10-20% token reduction on high-volume workflows while preserving quality gates.

Why this KPI: it directly ties optimization behavior to cost control and gives an objective pass/fail metric for expansion decisions.

Contributing

See CONTRIBUTING.md and CODE_OF_CONDUCT.md.

Plans and tasks are tracked in .roady/ (see the roady spec-driven planning tool).

License

Apache License 2.0. See LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 47 Commits
.github		.github
.roady		.roady
cmd		cmd
deployments/clickhouse		deployments/clickhouse
docs		docs
integrations		integrations
internal		internal
pkg/eventschema		pkg/eventschema
scripts		scripts
security		security
web		web
.coverctl.yaml		.coverctl.yaml
.editorconfig		.editorconfig
.gitignore		.gitignore
.golangci.yml		.golangci.yml
.goreleaser.yaml		.goreleaser.yaml
.nox.yaml		.nox.yaml
.pre-commit-config.yaml		.pre-commit-config.yaml
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
go.mod		go.mod
go.sum		go.sum

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TokenOps

Why TokenOps

Architecture (high level)

Repository layout

Getting started

5-minute operator golden path

Step 1: Start the daemon

Step 2: Point one SDK request at the proxy

Step 3: Validate attribution and spend visibility

Step 4: Replay and inspect optimization headroom

Step 5: Capture your wedge KPI baseline

Contributing

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

TokenOps

Why TokenOps

Architecture (high level)

Repository layout

Getting started

5-minute operator golden path

Step 1: Start the daemon

Step 2: Point one SDK request at the proxy

Step 3: Validate attribution and spend visibility

Step 4: Replay and inspect optimization headroom

Step 5: Capture your wedge KPI baseline

Contributing

License

About

Topics

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages