Atome LM

A reference implementation of a routed-ternary tiny language model with a bit-exact Python ↔ C99 inference engine, sized for microcontroller-class RAM budgets.

60K-default-parameter LM combining three known ideas into one open kit: ternary weights (after BitNet b1.58), a hybrid SSM + sparse-attention + local-conv block routed per token (after Hymba and MossNet), and a byte tokenizer at super-tiny scale (after Guertler 2024). The contribution is integration, not architecture: a complete train → ternary export → base-3 packing → C99 inference path, with bit-exact Python ↔ C parity enforced by tests.

Quick links:

📄 Architecture writeup: PAPER.md
🔬 Honest results, including the 944 K reversal: HONEST_RESULTS.md
🌐 Live in-browser demo (no install): atomelm.com/demo.html
🏠 Project home: atomelm.com

Get the kit: training code, C engine, benchmarks, paper, and trained weights — all in this repository, released under the Apache 2.0 License. Train your own checkpoint with scripts/train_demo.py in ~30 min on a CPU, or run the bundled 944 K checkpoint immediately.

MCU status: QEMU ARM (Cortex-M3, MPS2-AN385) parity passes to FP32 epsilon. Silicon bring-up is not done in this repository. For real device deployment we sell integration — silicon bring-up, the Atome Secure Boot Pack (signed .atome blobs, dev/prod flags, per-platform secure-boot, attestation), per-platform hardening — at atomelm.com.

Weights are included in checkpoints/:

atome_944k.bin (271 KB) — the packed C-engine blob (ATOME01 format), loadable directly by the inference engine.
atome_1m_v1.pt (3.7 MB) — the PyTorch source checkpoint that produced it; use this to fine-tune or to re-export at different #defines.
vanilla_1m_v1.pt (3.7 MB) — the FP32 vanilla GPT baseline used for the 944K A/B reversal in HONEST_RESULTS.md; shipped so you can reproduce the comparison end-to-end.

The 944K checkpoint is a research-demo artifact, not a product: it is narrow, sometimes incoherent, and trained on a single corpus. It is here to make the architecture runnable, not to set a quality bar. Reproduction is ~$1–2 of CPU/GPU using the included training code; nothing in this kit is a reproduction barrier.

Reproducible result, narrow regime

On TinyStories, 3000 steps, single seed: at fixed parameter count Atome's routed-ternary block reaches 6.31 ppl vs 8.12 for a vanilla GPT-FP32 baseline (−22 %); at fixed flash budget 6.31 vs 13.10 (−52 %). Disk footprint is 16× smaller at param-match (15.1 KB vs 237.5 KB).

The result reverses at 944 K parameters, where the vanilla FP32 baseline wins by ~11 %. Atome's bet is deliberately the sub-1M, MCU-class regime; above it ternary's capacity ceiling closes the gap and overruns it. Full reproduction in FRONTIER.md, full honest reading including the reversal in HONEST_RESULTS.md.

Why

Datacenter LLMs assume datacenter RAM. A $2 microcontroller stuck on a wall in a remote sensor, a hearing aid, a battery-powered toy, or a thermostat doesn't have it. Atome LM is the model design end of that constraint:

Ternary weights ({-α, 0, +α} per tensor, BitNet b1.58 style). No float multiplies in the matmul at inference.
3-pathway block (local depthwise conv, diagonal SSM, top-k sparse attention) mixed by a per-token soft router. Designed to match the Atome C99 engine struct exactly so trained checkpoints export to flash and run with bit-exact parity between Python and C.
Byte tokenizer. No BPE table to ship.
Router entropy as a calibration signal. The per-token router distribution's entropy is observable for free at every position. In our larger V2 production model (3 M parameters, mixed-domain training) entropy is observed to track out-of-domain inputs and to correlate with per-token loss. At Atome-LLM's engine-default 60 K-parameter scale on a single narrow corpus the signal is exposed identically — but its calibration as an uncertainty estimator at that scale has not been measured here.

What this is and isn't

Is: the Python training side and architecture for a ternary LM that runs on cents-class hardware.
Isn't: a general-purpose chatbot. At engine-default config (d_model=64, n_layers=4) the model is roughly 60 K parameters and exports to about 20 KB of flash. Train it narrow — a single domain (embedded-system Q&A, command-line help, a single FAQ) — and it speaks fluently inside that scope. Going wide at this size produces incoherent output; that is a reflection of capacity, not of the architecture. For more headroom, raise d_model and n_layers (e.g. d_model=128, n_layers=6 ≈ 600 K parameters, ~150 KB packed) and re-export with the matching #defines.

Install

./install.sh          # CPU-only venv + dependencies + environment check

Or manually: pip install -e . (Python ≥ 3.10, PyTorch ≥ 2.0). New here? QUICKSTART.md is the 60-second path from clone to a microcontroller-ready model.

Quick start

import torch
from atome_llm.core.atome_lm import AtomeLM

# Defaults match the Atome C99 engine's compile-time #defines:
#   d_model=64, n_layers=4, d_head=16, top_k=4, kernel=5, vocab=256.
model = AtomeLM()
print(f"params: {model.parameter_count():,}")

ids = torch.randint(0, 256, (1, 32))
logits = model(ids)                     # (1, 32, 256)
loss = model.loss(ids[:, :-1], ids[:, 1:])

# Per-layer per-token uncertainty signal — no extra training:
ent_per_layer = model.router_entropies(ids)   # list of (B, L) tensors

Train a tiny demo

python scripts/train_demo.py --data path/to/text.txt --steps 1000 \
    --d-model 64 --n-layers 4 --output checkpoints/atome_demo.pt

A built-in build_corpus.py fetches a few permissively-licensed sources (tinystories, esp-idf, mcu-wikipedia) for smoke training:

python scripts/build_corpus.py --source tinystories --max-bytes 500000 \
    --output data/tinystories.txt

Try a checkpoint

python scripts/demo.py --checkpoint checkpoints/atome_demo.pt
# or, with no checkpoint, sanity-check the plumbing:
python scripts/demo.py --random --temperature 0.8 --top-p 0.9

The REPL prints the continuation and per-layer router-entropy bars over the prompt — the metacognition signal that's exposed for free.

Sampling

AtomeLM.generate(...) defaults to greedy argmax (matching the C engine's atome_predict_next). Optional temperature, top_p, top_k, and generator= arguments enable nucleus / top-k sampling with seeded reproducibility.

Benchmark

python scripts/benchmark.py            # tiny / default / large

CPU forward + generate latency at three representative configs. Useful as a regression check after architecture changes; not an MCU number.

Export to a microcontroller

python scripts/export_to_atome.py \
    --checkpoint checkpoints/atome_demo.pt \
    --output checkpoints/atome_demo.atome \
    --header

This produces a .atome flat binary you can #include from C and load with atome_load(...) from the Atome C99 engine. At default config the binary is well under 100 KB — fits comfortably on ESP32-S3, STM32F4, RP2040, nRF52840, ESP32-C3.

Architecture

x → LayerNorm → ┬─→ Local  (depthwise causal conv k=5)        ─→┐
                ├─→ State  (diagonal SSM, O(L))                 ─→ Σ → +x
                └─→ Sparse (top-k attention, O(L·k))            ─→┘
                        ↑              ↑
                        │              router weights r ∈ Δ per token
                        └──────────────┘

Three pathways. Three different inductive biases. One shared per-token router that learns which pathway is most appropriate for each position. The router's per-token entropy is exposed as a free per-position uncertainty signal at every layer.

The full architecture story is in PAPER.md.

Tests

pytest -q

License

Apache License 2.0 — see LICENSE and NOTICE.

The kit is fully open: use, modify, redistribute, and ship in commercial products without per-seat or per-device fees. The Apache 2.0 patent grant covers the 3-pathway routed-ternary architecture as released here.

The released checkpoints in checkpoints/ (atome_944k.bin, atome_1m_v1.pt, vanilla_1m_v1.pt) are likewise Apache-2.0. They are reference / research artifacts, not products. Commercial integration — silicon bring-up, the Atome Secure Boot Pack (signed .atome blobs, dev/prod flags, per-platform secure-boot, attestation), per-platform hardening, custom-domain fine-tuning of the larger internal V2 model — is available at atomelm.com.

Citation

@software{atome_llm_2026,
  title  = {Atome LM: a tiny ternary language model for microcontroller deployment},
  author = {Atome LM contributors},
  year   = {2026},
  note   = {Apache 2.0, https://atomelm.com},
}

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.github/workflows		.github/workflows
atome_llm		atome_llm
c_engine		c_engine
checkpoints		checkpoints
data		data
scripts		scripts
tests		tests
.gitignore		.gitignore
CITATION.cff		CITATION.cff
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
FRONTIER.md		FRONTIER.md
HONEST_RESULTS.md		HONEST_RESULTS.md
LICENSE		LICENSE
NOTICE		NOTICE
PAPER.md		PAPER.md
PROJECT_CONTENT.md		PROJECT_CONTENT.md
QUICKSTART.md		QUICKSTART.md
README.md		README.md
REPRODUCE.md		REPRODUCE.md
ab_results.json		ab_results.json
check_env.py		check_env.py
extended_results.json		extended_results.json
install.sh		install.sh
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Atome LM

Reproducible result, narrow regime

Why

What this is and isn't

Install

Quick start

Train a tiny demo

Try a checkpoint

Sampling

Benchmark

Export to a microcontroller

Architecture

Tests

License

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Atome LM

Reproducible result, narrow regime

Why

What this is and isn't

Install

Quick start

Train a tiny demo

Try a checkpoint

Sampling

Benchmark

Export to a microcontroller

Architecture

Tests

License

Citation

About

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages