Codebase for Effective Self-Mining of In-Context Examples for Unsupervised Machine Translation with LLMs
Published in Findings of the Association for Computational Linguistics: NAACL 2025.
Large Language Models (LLMs) have demonstrated impressive performance on a wide range of natural language processing (NLP) tasks, primarily through in-context learning (ICL). In ICL, the LLM is provided with examples that represent a given task such that it learns to generate answers for test inputs. However, access to these in-context examples is not guaranteed especially for low-resource or massively multilingual tasks. In this work, we propose an unsupervised approach to mine in-context examples for machine translation (MT), enabling unsupervised MT (UMT) across different languages. Our approach begins with word-level mining to acquire word translations that are then used to perform sentence-level mining. As the quality of mined parallel pairs may not be optimal due to noise or mistakes, we introduce a filtering criterion to select the optimal in-context examples from a pool of unsupervised parallel sentences. We evaluate our approach using two multilingual LLMs on 288 directions from the FLORES-200 dataset (Team et al., 2022) and analyze the impact of various linguistic features on performance. Our findings demonstrate the effectiveness of our unsupervised approach in mining in-context examples for MT, leading to better or comparable translation performance as translation with regular in-context samples (extracted from human-annotated data), while also outperforming the other state-of-the-art UMT methods by an average of 7 BLEU points.
This repo restructures the research workspace into a reusable package plus plain Python scripts while preserving the method described in the paper:
- mine high-confidence word translation pairs,
- build weak word-by-word translations,
- back-translate unlabeled target-language sentences,
- select sentence-level ICL examples with TopK+BM25,
- translate the test set with the mined examples.
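The sentence-level selection step above combines semantic TopK retrieval with BM25 lexical matching. As a rough illustration of the BM25 half, here is a minimal pure-Python Okapi BM25 ranker; the function name, parameters, and toy corpus are illustrative and not taken from the repo's scripts.

```python
import math
from collections import Counter

def bm25_rank(query, corpus, k1=1.5, b=0.75, top_k=5):
    """Rank corpus sentences against a query with Okapi BM25.

    query: list of tokens; corpus: list of token lists.
    Returns indices of the top_k highest-scoring sentences.
    """
    n = len(corpus)
    avgdl = sum(len(doc) for doc in corpus) / n
    # document frequency per term
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            num = tf[term] * (k1 + 1)
            den = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * num / den
        scores.append(score)
    return sorted(range(n), key=lambda i: -scores[i])[:top_k]

corpus = [s.split() for s in [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "machine translation with large language models",
]]
print(bm25_rank("machine translation models".split(), corpus, top_k=1))  # → [2]
```

In the actual pipeline, the query would be a test sentence and the corpus the mined parallel pool; the top-ranked pairs then serve as in-context examples.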
pip install -r requirements.txt

Models used in the paper:

Similarity model used in the paper:
Main evaluation dataset:
Vocabularies:
We use frequency-sorted FastText vocabularies and keep the top 10,000 words per language.
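Since the vocabularies are frequency-sorted, keeping the top 10,000 words amounts to reading the first lines of each file. A small sketch of such a loader (the function name is illustrative, not from the repo):

```python
def load_top_words(path, top_n=10_000):
    """Read a frequency-sorted vocabulary file (one word per line)
    and keep the most frequent top_n entries."""
    words = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            word = line.strip()
            if word:
                words.append(word)
            if len(words) >= top_n:
                break
    return words
```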
Run the paper's unsupervised method on selected pairs:
python run_unsupervised.py \
--pair eng_Latn,deu_Latn \
--model-path /path/to/Meta-Llama-3-8B \
--similarity-model-path /path/to/stsb-xlm-r-multilingual \
--lexicons-dir /path/to/lexicons \
--sentences-dir /path/to/flores_plus \
--output-dir outputs/main

Run the regular BM25 ICL baseline:
python run_regular_bm25.py \
--pair eng_Latn,deu_Latn \
--model-path /path/to/Meta-Llama-3-8B \
--similarity-model-path /path/to/stsb-xlm-r-multilingual \
--lexicons-dir /path/to/lexicons \
--sentences-dir /path/to/flores_plus \
--output-dir outputs/baselines

Run the main paper setting with the built-in FLORES preset:
python run_main.py \
--model-path /path/to/Meta-Llama-3-8B \
--similarity-model-path /path/to/stsb-xlm-r-multilingual \
--lexicons-dir /path/to/lexicons \
--sentences-dir /path/to/flores_plus

Run the decoding ablation subset:
python run_decoding_ablation.py \
--model-path /path/to/Meta-Llama-3-8B \
--similarity-model-path /path/to/stsb-xlm-r-multilingual \
--lexicons-dir /path/to/lexicons \
--sentences-dir /path/to/flores_plus

Useful scripts in the repo root:
- run_main.py: main FLORES setting with the paper pair preset.
- run_unsupervised.py: generic unsupervised runner.
- run_regular_bm25.py: generic regular BM25 baseline.
- run_decoding_ablation.py: decoding strategy ablation subset.
The scripts expect plain text files with one item per line:
- Lexicon vocabularies: one word per line, for example lexicons/en_frequent_words.txt.
- FLORES/FLORES+ dev and test files: one sentence per line.
Expected --sentences-dir layout:
flores_plus/
dev/
dev.eng_Latn
dev.fra_Latn
devtest/
devtest.eng_Latn
devtest.fra_Latn
Expected --lexicons-dir layout:
lexicons/
en_frequent_words.txt
fr_frequent_words.txt
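To catch path mistakes before launching a long run, the two layouts above can be sanity-checked with a short helper. This is an optional convenience sketch, not part of the repo's scripts; the default language codes and the `*_frequent_words.txt` pattern follow the examples shown above.

```python
from pathlib import Path

def check_layout(sentences_dir, lexicons_dir, src="eng_Latn", tgt="fra_Latn"):
    """Return a list of missing files/directories for the expected
    --sentences-dir and --lexicons-dir layouts (empty list = all present)."""
    sentences_dir, lexicons_dir = Path(sentences_dir), Path(lexicons_dir)
    expected = [
        sentences_dir / "dev" / f"dev.{src}",
        sentences_dir / "dev" / f"dev.{tgt}",
        sentences_dir / "devtest" / f"devtest.{src}",
        sentences_dir / "devtest" / f"devtest.{tgt}",
    ]
    missing = [str(p) for p in expected if not p.exists()]
    # lexicon files follow the <lang>_frequent_words.txt naming shown above
    if not list(lexicons_dir.glob("*_frequent_words.txt")):
        missing.append(str(lexicons_dir / "*_frequent_words.txt"))
    return missing
```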
Concrete examples are in examples/README.md.
@inproceedings{el-mekki-abdul-mageed-2025-effective,
title = "Effective Self-Mining of In-Context Examples for Unsupervised Machine Translation with {LLM}s",
author = "El Mekki, Abdellah and
Abdul-Mageed, Muhammad",
editor = "Chiruzzo, Luis and
Ritter, Alan and
Wang, Lu",
booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
month = apr,
year = "2025",
address = "Albuquerque, New Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-naacl.238/",
doi = "10.18653/v1/2025.findings-naacl.238",
pages = "4229--4256",
ISBN = "979-8-89176-195-7",
abstract = "Large Language Models (LLMs) have demonstrated impressive performance on a wide range of natural language processing (NLP) tasks, primarily through in-context learning (ICL). In ICL, the LLM is provided with examples that represent a given task such that it learns to generate answers for test inputs. However, access to these in-context examples is not guaranteed especially for low-resource or massively multilingual tasks. In this work, we propose an unsupervised approach to mine in-context examples for machine translation (MT), enabling unsupervised MT (UMT) across different languages. Our approach begins with word-level mining to acquire word translations that are then used to perform sentence-level mining. As the quality of mined parallel pairs may not be optimal due to noise or mistakes, we introduce a filtering criterion to select the optimal in-context examples from a pool of unsupervised parallel sentences. We evaluate our approach using two multilingual LLMs on 288 directions from the FLORES-200 dataset (CITATION) and analyze the impact of various linguistic features on performance. Our findings demonstrate the effectiveness of our unsupervised approach in mining in-context examples for MT, leading to better or comparable translation performance as translation with regular in-context samples (extracted from human-annotated data), while also outperforming the other state-of-the-art UMT methods by an average of 7 BLEU points."
}