A Unified Framework for Non-Autoregressive Language Models
XLM is a modular framework for developing and comparing non-autoregressive language models. It is built on PyTorch Lightning and Hydra.
- Composition over inheritance: core components (Harness, DataModule) delegate model-specific logic to swappable instances (Model, Loss, Predictor, Collator).
- Copy over branching: each model family is a self-contained package; xlm-scaffold copies a working template instead of branching shared code.
- Arbitrary code injection: Hydra resolves any importable Python callable by dotted path. Your preprocessing, loss, predictor, collator, and metrics can live in any package; XLM wires them at runtime via YAML.
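To make the "dotted path" idea concrete, here is a minimal sketch of how a string from a config can be turned into a callable. It uses only the standard library and is not Hydra's actual implementation (Hydra's own mechanism, for example `_target_` keys, does more). The helper name `resolve_callable` and the config keys `preprocess_fn` and `metric_fn` are hypothetical, chosen for illustration.

```python
import importlib
from typing import Any, Callable


def resolve_callable(dotted_path: str) -> Callable[..., Any]:
    """Import and return the object named by a path like 'package.module.func'."""
    module_path, _, attr_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected 'module.attr', got {dotted_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# A config fragment as it might look after YAML parsing.
# The keys are hypothetical; the values are stdlib callables so the sketch runs anywhere.
config = {"preprocess_fn": "json.dumps", "metric_fn": "math.sqrt"}

preprocess = resolve_callable(config["preprocess_fn"])
metric = resolve_callable(config["metric_fn"])
```

Because the config only stores a string, swapping in a function from another package means editing the YAML value rather than the framework code.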
Models: ARLM · MLM · ILM · MDLM · FlexMDM · Dream
Datasets: LM1B · OpenWebText · UniRef50 · QM9 · SAFE · Sudoku · Graph coloring · N-queens · Star graphs · …
Metrics: Loss · Exact match · Token accuracy · Generative perplexity · Parsability · Seq2seq EM · …
Model families (papers and docs)
The companion package xlm-models registers six families (see xlm_models.json). Cross-family comparison: Models overview.
| Tag | Name | Docs | State | Paper / notes |
|---|---|---|---|---|
| arlm | Autoregressive LM (baseline) | Guide | Beta | - |
| ilm | Insertion language model | Guide | Beta | arXiv:2505.05755 |
| mdlm | Masked diffusion LM | Guide | Beta | arXiv:2406.07524 |
| mlm | Masked language model (BERT-style) | Guide | Beta | - |
| flexmdm | Flexible masked diffusion | Guide | Alpha | arXiv:2509.01025 |
| dream | Dream-style decoder LM | Partial | Alpha | Source; backbone in xlm.backbones.dream |
ILM on LM1B — prepare data, train, evaluate, generate, and push to the Hub:
pip install xlm-core xlm-models
xlm job_type=prepare_data job_name=lm1b_prepare experiment=lm1b_ilm
xlm job_type=train job_name=lm1b_ilm experiment=lm1b_ilm
xlm job_type=eval job_name=lm1b_ilm experiment=lm1b_ilm +eval.ckpt_path=<CHECKPOINT_PATH>
xlm job_type=generate job_name=lm1b_ilm experiment=lm1b_ilm +generation.ckpt_path=<CHECKPOINT_PATH>
xlm job_type=push_to_hub job_name=lm1b_ilm_hub experiment=lm1b_ilm +hub_checkpoint_path=<CHECKPOINT_PATH> +hub.repo_id=<YOUR_REPO_ID>

For a debug run, add debug=overfit to the train command. Full walkthrough: Quick Start.
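If you want to script the five steps above rather than type them, the sketch below assembles the same command lines as argument lists. It uses only the overrides shown above; the helper names `xlm_command` and `ilm_lm1b_pipeline` are hypothetical and not part of xlm-core.

```python
from typing import Dict, List, Optional


def xlm_command(job_type: str, job_name: str, experiment: str,
                extra: Optional[Dict[str, str]] = None) -> List[str]:
    """Build the argv list for one `xlm` call from Hydra-style key=value overrides."""
    argv = ["xlm", f"job_type={job_type}", f"job_name={job_name}",
            f"experiment={experiment}"]
    for key, value in (extra or {}).items():
        argv.append(f"{key}={value}")
    return argv


def ilm_lm1b_pipeline(ckpt: str, repo_id: str) -> List[List[str]]:
    """Return the prepare, train, eval, generate, and push commands in order."""
    exp = "lm1b_ilm"
    return [
        xlm_command("prepare_data", "lm1b_prepare", exp),
        xlm_command("train", exp, exp),
        xlm_command("eval", exp, exp, {"+eval.ckpt_path": ckpt}),
        xlm_command("generate", exp, exp, {"+generation.ckpt_path": ckpt}),
        xlm_command("push_to_hub", "lm1b_ilm_hub", exp,
                    {"+hub_checkpoint_path": ckpt, "+hub.repo_id": repo_id}),
    ]


if __name__ == "__main__":
    # Print the commands; placeholders are left for you to fill in.
    for argv in ilm_lm1b_pipeline("<CHECKPOINT_PATH>", "<YOUR_REPO_ID>"):
        print(" ".join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` once the placeholders are replaced with a real checkpoint path and repository id.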
A new architecture implements four components: Model, Loss, Predictor, and Collator. Each set of four is self-contained and belongs to a single LM family.
xlm-scaffold my_model

Wire a dataset by pointing Hydra at any importable preprocess function (e.g. my_package.my_task.preprocess_fn) and adding dataset + datamodule YAMLs. For tasks shipped with xlm-core, use src/xlm/tasks/<task>/.
→ Adding a task or dataset · Your model on your task
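As a rough picture of how the four components (Model, Loss, Predictor, Collator) plug into a shared harness, here is a toy sketch in plain Python. The class and function names and all signatures are assumptions made for illustration; the real XLM components are built on PyTorch Lightning and their interfaces will differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Batch = Dict[str, List[List[int]]]
Outputs = List[List[float]]


@dataclass
class Harness:
    """Owns the step logic and delegates everything model-specific (hypothetical API)."""
    collator: Callable[[Sequence[List[int]]], Batch]
    model: Callable[[Batch], Outputs]
    loss: Callable[[Outputs, Batch], float]
    predictor: Callable[[Outputs], List[List[int]]]

    def training_step(self, examples: Sequence[List[int]]) -> float:
        batch = self.collator(examples)
        return self.loss(self.model(batch), batch)

    def predict(self, examples: Sequence[List[int]]) -> List[List[int]]:
        return self.predictor(self.model(self.collator(examples)))


def pad_collator(examples: Sequence[List[int]], pad_id: int = 0) -> Batch:
    """Right-pad every example to the longest length in the batch."""
    max_len = max(len(ex) for ex in examples)
    return {"input_ids": [ex + [pad_id] * (max_len - len(ex)) for ex in examples]}


def toy_model(batch: Batch) -> Outputs:
    """Stand-in for a network: halves each token id."""
    return [[0.5 * tok for tok in row] for row in batch["input_ids"]]


def toy_loss(outputs: Outputs, batch: Batch) -> float:
    """Mean squared error between the outputs and the input ids."""
    errors = [(o - t) ** 2
              for o_row, t_row in zip(outputs, batch["input_ids"])
              for o, t in zip(o_row, t_row)]
    return sum(errors) / len(errors)


def toy_predictor(outputs: Outputs) -> List[List[int]]:
    """Truncate each score to an integer token id."""
    return [[int(o) for o in row] for row in outputs]


harness = Harness(pad_collator, toy_model, toy_loss, toy_predictor)
```

Replacing any one field, say `loss`, changes the behavior of the harness without touching its code, which is the point of composing components instead of subclassing.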
| Contributor | Model | Paper |
|---|---|---|
| Dhruvesh Patel | DILM | A Continuous Time Markov Chain Framework for Insertion Language Models |
| Benjamin Rozonoyer, Jacopo Minniti | Relay | Learned Relay Representations for Forward-Thinking Discrete Diffusion Models |
| Dhruvesh Patel, Benjamin Rozonoyer | LoFlexMDM | Insertion Based Sequence Generation with Learnable Order Dynamics |
We welcome contributions. See CONTRIBUTING.md and the Good First Issue list.
@article{patel2025xlm,
title={XLM: A Python package for non-autoregressive language models},
author={Patel, Dhruvesh and Maram, Durga Prasad and Chintha, Sai Sreenivas and Rozonoyer, Benjamin and McCallum, Andrew},
journal={arXiv preprint arXiv:2512.17065},
year={2025}
}

MIT · Built at IESL, UMass Amherst.

