NormXLogit

NormXLogit: The Head-on-Top Never Lies

Abstract

With new large language models (LLMs) emerging frequently, it is important to consider the potential value of model-agnostic approaches that can provide interpretability across a variety of architectures. While recent advances in LLM interpretability show promise, many rely on complex, model-specific methods with high computational costs. To address these limitations, we propose NormXLogit, a novel technique for assessing the significance of individual input tokens. This method operates based on the input and output representations associated with each token. First, we demonstrate that the norm of word embeddings can be utilized as a measure of token importance. Second, we reveal a significant relationship between a token’s importance and how predictive its representation is of the model’s final output. Extensive analyses indicate that our approach outperforms existing gradient-based methods in terms of faithfulness and offers competitive performance compared to leading architecture-specific techniques.
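The two ingredients named in the abstract, the norm of a token's input embedding and the logit obtained from that token's output representation, can be combined into a per-token score. The snippet below is a minimal sketch, assuming the score is the product of the two; it is not the repository's implementation. The function name `normxlogit_scores`, its parameters, the use of the L2 norm, and the toy values are all hypothetical, and the authors' code may differ in how it selects the target class or normalizes the scores.

```python
import math


def normxlogit_scores(input_embeddings, final_representations,
                      head_weights, head_bias, target_class):
    """Illustrative per-token score: ||input embedding|| * target-class logit.

    All names here are hypothetical. Assumed shapes:
      input_embeddings:      one vector per token (input word embeddings)
      final_representations: one vector per token (last-layer outputs)
      head_weights:          one weight vector per class (the head on top)
      head_bias:             one bias per class
    """
    scores = []
    for embedding, representation in zip(input_embeddings, final_representations):
        # Assumption: L2 norm of the token's input embedding.
        norm = math.sqrt(sum(x * x for x in embedding))
        # Apply the head to this single token's final representation and
        # read off the logit for the class being explained.
        logit = sum(w * h for w, h in zip(head_weights[target_class], representation))
        logit += head_bias[target_class]
        scores.append(norm * logit)
    return scores


# Toy example: three tokens, hidden size 2, two classes (made-up numbers).
input_embeddings = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
final_representations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
head_weights = [[2.0, 0.0], [0.0, 1.0]]
head_bias = [0.0, 0.0]

scores = normxlogit_scores(input_embeddings, final_representations,
                           head_weights, head_bias, target_class=0)
print(scores)
```

In this toy setup the first token receives the highest score because it has both the largest embedding norm and a final representation that the head maps to a high logit for class 0. With a real model, the embeddings and representations would come from the model's embedding layer and last hidden layer.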

Citation

If you found this work useful, please consider citing our paper:

@inproceedings{abbasi-etal-2025-normxlogit,
    title = "{N}orm{XL}ogit: The Head-on-Top Never Lies",
    author = "Abbasi, Sina  and
      Modarres, Mohammad Reza  and
      Pilehvar, Mohammad Taher",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.1769/",
    pages = "34914--34935",
    ISBN = "979-8-89176-332-6",
    abstract = "With new large language models (LLMs) emerging frequently, it is important to consider the potential value of model-agnostic approaches that can provide interpretability across a variety of architectures. While recent advances in LLM interpretability show promise, many rely on complex, model-specific methods with high computational costs. To address these limitations, we propose NormXLogit, a novel technique for assessing the significance of individual input tokens. This method operates based on the input and output representations associated with each token. First, we demonstrate that the norm of word embeddings can be utilized as a measure of token importance. Second, we reveal a significant relationship between a token{'}s importance and how predictive its representation is of the model{'}s final output. Extensive analyses indicate that our approach outperforms existing gradient-based methods in terms of faithfulness and offers competitive performance compared to leading architecture-specific techniques."
}

Experiment 1 - Faithfulness Analysis
Experiment 2 - Evidence Alignment
EMNLP2025_main-logo.png
LICENSE
NormXLogit_2025_Poster.pdf
NormXLogit_2025_Slides.pdf
README.md
