GitHub - carbon-os/tensor-database

A modern AI database built for applications that need fast relational queries, full-text search, semantic vector search, and native reasoning over raw text — all in one engine, with no external dependencies.

Tensor Database is one of the first databases to treat reasoning as a first-class query primitive, alongside filtering, aggregation, and semantic search.

What It Is

Tensor Database is a purpose-built data engine that treats vectors, text, relational data, and reasoning as first-class citizens. Instead of bolting AI capabilities onto a legacy SQL engine, Tensor Database was designed from the ground up around a single pipeline model that handles all four query types natively and concurrently.

Queries are written in Tensor Query Language (.tql), a strict pipeline language, validated at compile time, that replaces SQL with a top-to-bottom execution model. No magic columns, no implicit coercions, no runtime surprises.

What Makes It Different

One query language, three storage engines. A single .tql pipeline can filter on a scalar field, match a keyword, and rank by semantic similarity in the same query. The storage layer routes each part to one of three purpose-built modules (.sst for relational data, .rbm for full-text, and .vec with .hnsw for vectors) and merges the ranked results using Reciprocal Rank Fusion (RRF).
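A minimal sketch of Reciprocal Rank Fusion as it is usually defined: a document's fused score is the sum of 1/(k + rank) over every ranked list it appears in, with k = 60 as a common default. The README does not state the engine's k, its tie-breaking, or how the scalar filter interacts with fusion; this sketch assumes the filter has already removed non-matching rows, and the document IDs and input rankings are invented.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several best-first rankings of document IDs into one ranking.

    A document at 1-based rank r in a list contributes 1 / (k + r) to its
    fused score; a list that does not contain the document contributes nothing.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical per-module results for one query.
keyword_hits = ["p3", "p1", "p7"]  # e.g. BM25 order from the full-text index
vector_hits = ["p1", "p9", "p3"]   # e.g. nearest neighbors from the vector index

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Here p1 comes out on top even though p3 leads the keyword list, because p1 ranks high in both lists. Rewarding agreement across retrievers, without needing their raw scores to be comparable, is the usual reason to fuse by rank.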

Reasoning is built in. Store raw text in any text field and query it in natural language with prompt(). Internally, the engine runs a nano+aggregator inference pipeline: it tokenizes the text, splits it into segments, fans the segments out across parallel worker models, and synthesizes their outputs into fully typed .tql fields that flow through the rest of the pipeline like any other variable.
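The README names the stages but not the models, the tokenizer, or the segment size. The sketch below shows only the fan-out/fan-in shape of such a pipeline, with trivial keyword-matching functions standing in for the worker and aggregator models and a whitespace split standing in for the tokenizer. `run_prompt`, `nano_worker`, `aggregator`, and the segment size are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_segments(tokens, max_tokens):
    """Split a token list into consecutive segments of at most max_tokens tokens."""
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]

def nano_worker(question, segment):
    """Stand-in for a small worker model: reports which question terms occur in its segment."""
    terms = set(question.lower().split())
    return sorted({tok.lower() for tok in segment if tok.lower() in terms})

def aggregator(question, partials):
    """Stand-in for the aggregator model: merges worker outputs into one typed record."""
    found = sorted({term for part in partials for term in part})
    return {"question": question, "matched_terms": found, "segments_read": len(partials)}

def run_prompt(question, raw_text, max_tokens=4, workers=4):
    tokens = raw_text.split()  # whitespace split stands in for a real tokenizer
    segments = split_into_segments(tokens, max_tokens)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Fan out: segments are processed independently; map() keeps their order.
        partials = list(pool.map(lambda seg: nano_worker(question, seg), segments))
    # Fan in: one aggregation step synthesizes the partial answers.
    return aggregator(question, partials)

result = run_prompt("gift ideas", "A candle makes a lovely gift for friends who like cozy ideas")
print(result)
```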

Embeddings are built in. Tensor Database ships with a native embedding engine. Call embed() directly in any .tql pipeline: no external model server, no configuration, no separate process. The engine loads and runs open-source embedding models inline.

Compile-time correctness. Types are defined once in .tql package files and enforced at every layer — schema, query, and mutation. If a pipeline is wrong, it fails before touching data, not against live rows.

Unix-native. Local access uses Unix Domain Socket IPC with OS-level file permission auth. Remote access uses QUIC with API key bearer tokens bound to path-level IAM roles. The same socket that serves queries also exposes the embedding and reasoning engines directly to agents and external processes.
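The README does not describe how API keys, roles, and paths are stored or matched. A minimal sketch of path-level authorization follows, assuming that keys are kept only as SHA-256 digests, that each key is bound to one role, and that a role grants actions on path prefixes matching whole path segments (so `store/` covers `store/products` but not `storefront/items`). The role names, the key, and the helper functions are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical role table: each role grants actions on path prefixes.
ROLES = {
    "store-reader": {"read": ["store/"]},
    "orders-writer": {"read": ["store/orders"], "write": ["store/orders"]},
}

# Hypothetical key store: only digests of API keys are kept, each bound to one role.
API_KEYS = {
    hashlib.sha256(b"example-key-123").hexdigest(): "store-reader",
}

def role_for_token(bearer_token):
    digest = hashlib.sha256(bearer_token.encode()).hexdigest()
    for stored_digest, role in API_KEYS.items():
        # Constant-time comparison avoids leaking how many characters matched.
        if hmac.compare_digest(stored_digest, digest):
            return role
    return None

def path_matches(prefix, path):
    """True if path equals prefix or lies beneath it, comparing whole segments."""
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")

def is_allowed(bearer_token, action, path):
    role = role_for_token(bearer_token)
    if role is None:
        return False
    return any(path_matches(p, path) for p in ROLES[role].get(action, []))

print(is_allowed("example-key-123", "read", "store/products"))   # True
print(is_allowed("example-key-123", "write", "store/products"))  # False
print(is_allowed("wrong-key", "read", "store/products"))         # False
```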

Query Language at a Glance

```tql
import "shared/commerce"

from "store/products" as p: commerce.Product
where p.stock > 0 and p.price < 50.00
order by p.embedding <-> embed("gift ideas under fifty dollars")
limit 10
```
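The first query reads top to bottom: filter, then rank, then truncate. The Python sketch below mirrors that order over a few invented in-memory rows. It assumes `<->` is Euclidean (L2) distance, as it is in pgvector, since the README does not define the operator, and it uses a hand-written 3-dimensional vector in place of `embed(...)`. It also sorts every surviving row exactly, whereas the engine's `.hnsw` index performs approximate nearest-neighbor search.

```python
import math

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Invented rows standing in for "store/products".
products = [
    {"name": "Candle", "stock": 12, "price": 18.00, "embedding": [0.9, 0.1, 0.0]},
    {"name": "Desk", "stock": 3, "price": 240.00, "embedding": [0.8, 0.2, 0.1]},
    {"name": "Mug", "stock": 0, "price": 9.50, "embedding": [0.95, 0.05, 0.0]},
    {"name": "Puzzle", "stock": 7, "price": 24.99, "embedding": [0.4, 0.6, 0.2]},
]
query_vec = [1.0, 0.0, 0.0]  # stands in for embed("gift ideas under fifty dollars")

def run_query(rows, query_vec, limit=10):
    # where p.stock > 0 and p.price < 50.00
    matches = [r for r in rows if r["stock"] > 0 and r["price"] < 50.00]
    # order by p.embedding <-> query_vec (smaller distance = more similar, so ascending)
    matches.sort(key=lambda r: l2_distance(r["embedding"], query_vec))
    # limit 10
    return matches[:limit]

print([r["name"] for r in run_query(products, query_vec)])  # ['Candle', 'Puzzle']
```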

```tql
import "shared/commerce"

from "store/orders" as o: commerce.Order
join "store/customers" as c: commerce.Customer on o.customer_id == c.id

where o.status == "completed"
group by c.id, c.name

let total_spent = sum(o.total)
let order_count = count(*)

having total_spent > 1000.00

select {
    customer:      c.name,
    email:         c.email,
    total_spent,
    order_count,
    average_order: total_spent / order_count
}
order by total_spent desc
limit 25
```
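One way to read the aggregation pipeline is as a hypothetical Python equivalent over in-memory rows, with each step commented with the .tql line it mirrors. The customers, orders, and amounts are invented. Selecting `c.email` alongside the group key works here because each `c.id` has exactly one email; the README does not say how .tql treats non-grouped columns in `select`.

```python
from collections import defaultdict

# Invented sample data standing in for "store/customers" and "store/orders".
customers = {
    1: {"id": 1, "name": "Ada", "email": "ada@example.com"},
    2: {"id": 2, "name": "Lin", "email": "lin@example.com"},
}
orders = [
    {"customer_id": 1, "status": "completed", "total": 800.00},
    {"customer_id": 1, "status": "completed", "total": 450.00},
    {"customer_id": 1, "status": "cancelled", "total": 999.00},
    {"customer_id": 2, "status": "completed", "total": 300.00},
]

def top_customers(orders, customers, limit=25):
    groups = defaultdict(list)
    for o in orders:
        c = customers.get(o["customer_id"])          # join ... on o.customer_id == c.id
        if c is None or o["status"] != "completed":  # where o.status == "completed"
            continue
        groups[(c["id"], c["name"])].append((o, c))  # group by c.id, c.name
    rows = []
    for (_, name), members in groups.items():
        total_spent = sum(o["total"] for o, _ in members)  # let total_spent = sum(o.total)
        order_count = len(members)                          # let order_count = count(*)
        if not total_spent > 1000.00:                       # having total_spent > 1000.00
            continue
        rows.append({                                       # select { ... }
            "customer": name,
            "email": members[0][1]["email"],
            "total_spent": total_spent,
            "order_count": order_count,
            "average_order": total_spent / order_count,
        })
    rows.sort(key=lambda r: r["total_spent"], reverse=True)  # order by total_spent desc
    return rows[:limit]                                       # limit 25

print(top_customers(orders, customers))
```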

```tql
import "shared/commerce"

from "store/products" as p: commerce.Product
where p.stock > 0
prompt("gift ideas under fifty dollars", tensor.REASONING) from p.raw_data as result: commerce.ProductResult

select {
    name:   result.name,
    price:  result.price,
    reason: result.reason
}
order by result.price asc
limit 10
```

Embedding Engine

embed() is a first-class primitive in .tql. Its optional second argument is a use-case constant that abstracts away the underlying model, so your pipelines stay stable as default models are upgraded.

```tql
embed("query")                      // no constant given: same as tensor.SEARCH
embed("query", tensor.SEARCH)       // semantic retrieval, RAG
embed("query", tensor.CLASSIFY)     // clustering, tagging, categorization
embed("query", tensor.CODE)         // code search, symbol lookup
```

| Constant | Default model | Dimensions |
|---|---|---|
| `tensor.SEARCH` | `nomic-embed-text-v1.5` | 768 |
| `tensor.CLASSIFY` | `all-MiniLM-L6-v2` | 384 |
| `tensor.CODE` | `jina-embeddings-v2-base-code` | 768 |
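The table implies a level of indirection: pipelines name a use case, and the engine decides which model serves it. A minimal Python sketch of that mapping follows, using the defaults from the table; `Use`, `EmbeddingProfile`, `resolve`, and `check_column` are hypothetical names. The dimension check is an assumption about what compile-time validation could catch (for example, pairing a 768-dimensional `tensor.SEARCH` embedding with a 384-dimensional column), not documented behavior.

```python
from dataclasses import dataclass
from enum import Enum

@dataclass(frozen=True)
class EmbeddingProfile:
    model: str
    dimensions: int

class Use(Enum):
    SEARCH = "SEARCH"
    CLASSIFY = "CLASSIFY"
    CODE = "CODE"

# Defaults from the table; swapping one here would not change any pipeline text.
DEFAULTS = {
    Use.SEARCH: EmbeddingProfile("nomic-embed-text-v1.5", 768),
    Use.CLASSIFY: EmbeddingProfile("all-MiniLM-L6-v2", 384),
    Use.CODE: EmbeddingProfile("jina-embeddings-v2-base-code", 768),
}

def resolve(use=Use.SEARCH):
    """Map a use-case constant to its current model; SEARCH is the default."""
    return DEFAULTS[use]

def check_column(column_dims, use=Use.SEARCH):
    """Hypothetical check: reject a vector column whose width differs from the model's output."""
    profile = resolve(use)
    if column_dims != profile.dimensions:
        raise TypeError(
            f"{use.name} embeddings have {profile.dimensions} dimensions, "
            f"but the column holds {column_dims}"
        )
    return profile

print(resolve().model)                       # nomic-embed-text-v1.5
print(check_column(768, Use.CODE).model)     # jina-embeddings-v2-base-code
```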

Reasoning Engine

prompt() is a first-class pipeline stage in .tql. It runs a nano+aggregator inference pipeline over any raw text field and returns a fully typed result bound to your declared output schema.

```tql
prompt("prompt", tensor.REASONING)   // full reading comprehension, multi-fact synthesis
```

The mode constant is validated at compile time. The underlying models are set in the daemon configuration, so pipelines never reference model names directly and need no changes when defaults are upgraded.
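The README says prompt() returns results bound to a declared schema, such as `commerce.ProductResult` in the example query, but not how that binding is enforced. A minimal sketch, assuming the aggregator emits JSON: the hypothetical `bind` function parses it and accepts it only if the field names match exactly and every value already has the declared type, in keeping with .tql's no-implicit-coercion rule. `ProductResult` here mirrors only the three fields the example reads; the real package type may define more.

```python
import json
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class ProductResult:
    # Hypothetical mirror of the fields the example reads: result.name, result.price, result.reason.
    name: str
    price: float
    reason: str

def bind(schema, raw_json):
    """Bind JSON model output to a dataclass schema, rejecting anything that does not fit exactly."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise TypeError("model output must be a JSON object")
    expected = {f.name: f.type for f in fields(schema)}
    if set(data) != set(expected):
        raise ValueError(f"expected fields {sorted(expected)}, got {sorted(data)}")
    for name, typ in expected.items():
        # Strict check with no coercion: the string "18.00" is not accepted as a float.
        if not isinstance(data[name], typ):
            raise TypeError(f"{name}: expected {typ.__name__}, got {type(data[name]).__name__}")
    return schema(**data)

result = bind(ProductResult, '{"name": "Candle", "price": 18.0, "reason": "Affordable and easy to wrap"}')
print(result.price)  # 18.0
```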

Storage Architecture

| Module | Format | Purpose |
|---|---|---|
| `.wal` | Write-ahead log | ACID durability, crash recovery |
| `.sst` | Sorted string table (LSM tree) | Relational scalar data |
| `.rbm` | Roaring bitmap index | Full-text BM25 keyword search |
| `.vec` | Raw binary float arrays | Vector storage, zero-copy mmap reads |
| `.hnsw` | Hierarchical navigable small world (HNSW) graph | Approximate nearest-neighbor search |
| `.tok` | Pre-tokenized int32 arrays | Fast segment loading for reasoning |
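The `.vec` row describes raw float arrays read through mmap without copying. The README does not document the file layout, so this sketch assumes a headerless file of native-endian float32 values, one fixed-width vector after another; `DIM`, `write_vec_file`, and `dot_with_row` are hypothetical. The mapped bytes are reinterpreted as floats with `memoryview.cast`, and each view is released before the map is closed, which Python requires.

```python
import mmap
import os
import struct
import tempfile

DIM = 4  # hypothetical: every vector in this file has 4 float32 components

def write_vec_file(path, vectors):
    """Write vectors back to back as native-endian float32 values, with no header."""
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(struct.pack(f"={DIM}f", *vec))

def dot_with_row(path, row, query):
    """Dot product of `query` with stored vector `row`, read through a memory map."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Reinterpret the mapped bytes as float32 without copying them.
        with memoryview(mm) as raw, raw.cast("f") as floats:
            with floats[row * DIM:(row + 1) * DIM] as vec:
                return sum(a * b for a, b in zip(vec, query))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "products.vec")
    write_vec_file(path, [[1.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]])
    score = dot_with_row(path, 1, [1.0, 1.0, 0.0, 0.0])
    print(score)  # 1.0
```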

Getting Started

```sh
# First-time setup
tensor db start --setup

# Run a migration
tensor db run migrations/001_setup.tql

# Interactive shell
tensor db shell

# Check engine status
tensor db status
```

Documentation

| File | Description |
|---|---|
| `specs/query.md` | Query pipeline, filtering, projection, semantic search |
| `specs/mutations.md` | Insert, update, delete, upsert, returning |
| `specs/joins.md` | Inner joins, left joins, multi-path queries |
| `specs/aggregations.md` | Group by, aggregate functions, having |
| `specs/packages.md` | Type definitions, schema, migrations |
| `specs/types.md` | Full type reference |
| `specs/ai_backend.md` | Embedding and reasoning engine, tensor.* constants, socket access |
| `specs/reasoning.md` | prompt() pipeline stage, nano+aggregator architecture, capacity planning |
| `specs/storage.md` | Physical storage modules and execution model |
| `specs/iam.md` | Authentication, API keys, path-based access control |
| `specs/cli.md` | CLI reference, REPL, snapshots, observability |
