ARMT is a memory-augmented segment-level recurrent Transformer. It scales to inputs of up to 50M tokens while being trained on only 16k. It extends the original RMT with a capacious and flexible associative memory and achieves state-of-the-art scores on the BABILong benchmark (a sketch of the segment-level loop follows the paper links below).
Papers:
- [Scaling Transformer to 1M tokens and beyond with RMT](https://arxiv.org/abs/2304.11062)
- [Recurrent Memory Transformer](https://openreview.net/forum?id=Uynr3iPhksa)
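At a high level, ARMT processes a long input as a sequence of fixed-size segments, passing its memory state from one segment to the next. Below is a minimal sketch of that loop; names such as `process_long_sequence`, the `memory` keyword argument, and the segmenting logic are illustrative assumptions, not this repo's actual API.

```python
import torch

def process_long_sequence(model, input_ids, segment_len=512):
    """Process a long input segment by segment, carrying memory across segments."""
    memory = None  # recurrent memory state; empty before the first segment
    outputs = []
    for start in range(0, input_ids.size(1), segment_len):
        segment = input_ids[:, start:start + segment_len]
        # the model reads the previous memory state and returns an updated one
        logits, memory = model(segment, memory=memory)
        outputs.append(logits)
    return torch.cat(outputs, dim=1)  # predictions for the full sequence
```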
We implement our memory mechanism with no changes to the underlying Transformer model: we add special memory tokens and a linear-attention-style associative memory, as sketched below. The model is trained to control both memory operations and the processing of sequence representations.
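For intuition, a linear-attention-style associative memory acts as a constant-size key-value store: writes accumulate outer products of featurized keys and values, and reads project a query against the accumulated matrix. The sketch below shows that general idea, not the exact ARMT update (the paper's variant also handles overwriting of previously stored associations); all names here are illustrative.

```python
import torch
import torch.nn.functional as F

def phi(x):
    """Non-negative feature map, as in linear attention."""
    return F.elu(x) + 1

class AssociativeMemory:
    def __init__(self, dim):
        self.M = torch.zeros(dim, dim)  # accumulated key-value associations
        self.z = torch.zeros(dim)       # normalizer over stored keys

    def write(self, k, v):
        # store the association k -> v as an outer product of features
        self.M += torch.outer(phi(k), v)
        self.z += phi(k)

    def read(self, q):
        # retrieve the value associated with query q
        fq = phi(q)
        return (fq @ self.M) / (fq @ self.z).clamp(min=1e-6)
```

Because the memory is a fixed-size matrix, its cost does not grow with the number of processed segments, which is what allows the context to keep scaling.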
```bash
pip install -e .
```

This command installs `lm_experiments_tools` with only the packages required for the Trainer and tools.
The `lm_experiments_tools` Trainer supports gradient accumulation, logging to TensorBoard, saving the best models based on metrics, and custom metrics and data transformations.
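As a hypothetical illustration of how these pieces fit together (the argument names below are assumptions for illustration only, check the Trainer source in `lm_experiments_tools` for the actual signature):

```python
# Hypothetical usage sketch; the Trainer signature below is an assumption,
# not the actual API of lm_experiments_tools.
from lm_experiments_tools import Trainer

trainer = Trainer(
    args,             # parsed experiment arguments (gradient accumulation, etc.)
    model,
    optimizer,
    train_dataloader,
    valid_dataloader,
    metrics_fn=None,  # optional hook for custom metrics
)
trainer.train()
```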
Full requirements for all experiments are specified in requirements.txt. Install them after cloning the repo:

```bash
pip install -r requirements.txt
```

To run language modelling with ARMT with a sliding window:
```bash
cd scripts/pg19
bash finetune_armt_llama3.2_pg19_sliding.sh
```
If you find our work useful, please cite the RMT and ARMT papers:
```bibtex
@inproceedings{bulatov2022recurrent,
  title={Recurrent Memory Transformer},
  author={Aydar Bulatov and Yuri Kuratov and Mikhail Burtsev},
  booktitle={Advances in Neural Information Processing Systems},
  editor={Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
  year={2022},
  url={https://openreview.net/forum?id=Uynr3iPhksa}
}

@misc{bulatov2023scaling,
  title={Scaling Transformer to 1M tokens and beyond with RMT},
  author={Aydar Bulatov and Yuri Kuratov and Mikhail S. Burtsev},
  year={2023},
  eprint={2304.11062},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@misc{kuratov2024search,
  title={In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss},
  author={Yuri Kuratov and Aydar Bulatov and Petr Anokhin and Dmitry Sorokin and Artyom Sorokin and Mikhail Burtsev},
  year={2024},
  eprint={2402.10790},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@misc{rodkin2024associativerecurrentmemorytransformer,
  title={Associative Recurrent Memory Transformer},
  author={Ivan Rodkin and Yuri Kuratov and Aydar Bulatov and Mikhail Burtsev},
  year={2024},
  eprint={2407.04841},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2407.04841}
}
```
