Logit lens #88

Draft

Raphael-Bernas wants to merge 15 commits into main from logit-lens

Conversation

@Raphael-Bernas (Collaborator)

Description

Implemented Logit Lens

Related Issue

Type of Change

  • 🚀 New feature (non-breaking change which adds functionality)

Checklist

  • I've read the CODE_OF_CONDUCT.md document.
  • I've read the CONTRIBUTING.md guide.
  • I've successfully run the style checks using make lint.
  • I've written tests for all new methods and classes that I created and successfully ran make test.
  • I've written the docstring in Google format for all the methods and classes that I used.

@AntoninPoche (Collaborator) commented Sep 12, 2025

@codex, please give me your opinion on this PR.

@chatgpt-codex-connector bot left a comment

Codex Review: Here are some suggestions.


Comment on lines +1373 to +1381
# Mapping of model types to their corresponding LogitLens implementations
_model_type_to_lens = {
# Language Models
modeling_auto.AutoModelForCausalLM: LanguageModelLogitLens,
modeling_auto.AutoModelForMaskedLM: LanguageModelLogitLens,
# Classification Models
modeling_auto.AutoModelForSequenceClassification: ClassificationLogitLens,
modeling_auto.AutoModelForTokenClassification: ClassificationLogitLens,
}


[P1] Avoid routing token classification models to sequence logit lens

The factory currently maps AutoModelForTokenClassification to ClassificationLogitLens, but _process_logits in that class rejects any logits tensor that is not 2‑D (batch, num_classes). Token classification heads emit (batch, seq_len, num_labels) logits, so creating a logit lens for a token classifier will always raise ValueError before returning any results. Either add token-level handling in _process_logits or remove the token-classification mapping from the factory to avoid advertising unsupported models.
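A minimal standalone sketch of the rank-based branching this comment asks for (`process_classification_logits` and its behavior are illustrative, not the PR's actual `_process_logits`):

```python
import torch
import torch.nn.functional as F

def process_classification_logits(logits: torch.Tensor, k: int = 3):
    """Illustrative only: accept both classification head shapes instead of
    rejecting 3-D logits.

    - (batch, num_classes)         -> sequence classification
    - (batch, seq_len, num_labels) -> token classification
    """
    if logits.dim() not in (2, 3):
        raise ValueError(f"Expected 2-D or 3-D logits, got shape {tuple(logits.shape)}")
    proba = F.softmax(logits, dim=-1)
    k = min(k, logits.shape[-1])
    # topk operates on the last dim for both ranks, so no further branching is
    # needed: 2-D input yields (batch, k); 3-D input yields (batch, seq_len, k).
    top_proba, top_indices = torch.topk(proba, k, dim=-1)
    return top_proba, top_indices
```

The alternative, as the comment says, is simply to drop the token-classification entry from the factory mapping.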


Collaborator

Please remove this file or tell me its use.

Comment on lines +9 to +28
## Main Classes

::: interpreto.others.logit_lens_general.LogitLens
handler: python
options:
show_root_heading: true
show_source: true

## Base Implementation

::: interpreto.others.logit_lens_general.BaseLogitLens
handler: python
options:
show_root_heading: true
show_source: true
inherited_members: true
members:
- do_lens
- explain
- __call__
Collaborator

Do the base classes add any information for the user compared to the classification and generation ones?

For me, you can leave only the classes used by the user.

Collaborator

The user wants all the information to be at the same place. Please merge this file into the LogitLens classes docstrings.

Collaborator

Please rename this file logit_lens_example.ipynb.

It should be compiled so users can see the result and what it looks like before trying it themselves.

I tried running it, but it did not compile. Similarly to the tests, there is a problem with model_autoclass -> automodel.

I did not analyze the notebook as I do not have the outputs.

Collaborator

You can simply call this file logit_lens.py.

Comment on lines +311 to +314
if isinstance(layers_name, str):
layers_name = [layers_name]

self.layer_names = layers_name
Collaborator

This can surely be included in the self._validate_activations method.

Comment on lines +322 to +332
def _safe_item(self, tensor):
"""Safely extract item from tensor, handling meta tensors."""
if tensor.device.type == "meta":
raise RuntimeError("Cannot extract item from meta tensor. Please ensure your model is properly loaded.")
return tensor.detach().cpu().item()

def _safe_tensor_to_cpu(self, tensor):
"""Safely move tensor to CPU, handling meta tensors."""
if tensor.device.type == "meta":
raise RuntimeError("Cannot move meta tensor to CPU. Please ensure your model is properly loaded.")
return tensor.detach().cpu()
Collaborator

Might not be necessary anymore if we solve the meta tensor problem.

raise ValueError("Unsupported type for inputs")

if layers_name is None:
print("Producing lens prediction for all activations.")
Collaborator

Remove prints in all your files.

Comment on lines +363 to +384
if isinstance(inputs, (str, list)):
if self.tokenizer.pad_token is None or self.tokenizer.pad_token_id < 0:
print("Tokenizer does not have a padding token. Setting a default padding token.")
self.tokenizer.pad_token = self.tokenizer.eos_token if self.tokenizer.eos_token else "[PAD]"

if isinstance(inputs, str):
inputs = [inputs]
batched_inputs = [inputs[i : i + self.batch_size] for i in range(0, len(inputs), self.batch_size)]
elif isinstance(inputs, BatchEncoding):
input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]
batched_inputs = [
BatchEncoding(
{
"input_ids": input_ids[i : i + self.batch_size],
"attention_mask": attention_mask[i : i + self.batch_size],
}
)
for i in range(0, input_ids.shape[0], self.batch_size)
]
elif isinstance(inputs, torch.Tensor):
batched_inputs = torch.split(inputs, self.batch_size, dim=0)
Collaborator

Why not let ModelWithSplitPoints.get_activations() handle the batching?

Comment on lines +395 to +399
if layer not in merged_results:
merged_results[layer] = data
else:
# Merge batch results
self._merge_batch_results(merged_results[layer], data)
@AntoninPoche (Collaborator) commented Sep 12, 2025

Be careful: here, for each batch, you concatenate and create a new tensor. This is not good practice for memory usage or computation time.

You should collect the data for a given layer in a list, then concatenate them all at once.
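That suggestion could look like this minimal sketch (the function and key names are illustrative, not from the PR):

```python
from collections import defaultdict

import torch

def merge_batch_results(batch_results: list[dict[str, torch.Tensor]]) -> dict[str, torch.Tensor]:
    """Illustrative sketch: gather each batch's tensor in a per-layer list,
    then concatenate once per layer instead of once per batch."""
    per_layer: dict[str, list[torch.Tensor]] = defaultdict(list)
    for batch in batch_results:
        for layer, data in batch.items():
            per_layer[layer].append(data)  # cheap list append, no tensor copy yet
    # One allocation and copy per layer, instead of one per processed batch.
    return {layer: torch.cat(chunks, dim=0) for layer, chunks in per_layer.items()}
```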

@AntoninPoche (Collaborator) left a comment

Hi @Raphael-Bernas, I copied my previous review comment as most of my remarks are still relevant.

You will see the comments, but I basically told you to rework a bit of everything. If you think my comments are harsh, please do not take it to heart, I am focused on the code while reviewing. There are things to improve but you did a great job.

Note that everything that I say can be false; I insist, do not hesitate to challenge my comments. I do not have the complete picture of your code as you do. Sometimes I understand things later.

I suggested too many changes; you can keep working on the same branch, and I will review the whole code again, do not worry.

Here are a few first general comments:

  • You tried to treat too many cases with your first version. You iterated on your code without it being reviewed. You might have had fewer things to modify if you had interacted more with us. (But it is also our fault).
  • Several of your choices are not trivial (e.g., a factory or reloading the model). It is better to discuss architectural decisions like these with others before committing to the idea. Because they impact the final API, it is a team choice. Furthermore, we might have elements to help you as we have already developed other parts.
  • You must pass the make lint and make test before asking for reviews. The test on the PR's GitHub page should also pass before asking for a review, but you can ask for help.
  • You need to install the Pyright package in your editor and set the checks to "standard." This checks that the types you specified are coherent, preventing many uncaught bugs. You should not have red underlines from the type checker; otherwise, there is something to modify. You can say # type: ignore, but this should not be the default behavior.
  • Do not leave print statements in your code or in your tests. You can add logs if you really need to.
  • We need to discuss the meta tensor problem. We might be able to prevent it, but we should not load the model in the LogitLens init.
  • Several points can be managed by ModelWithSplitPoints.
  • The code lacks many comments and docstrings. These are super important, first for the users in the documentation, then for reviewers, and finally, for maintainers.

More precise comments now:

  • You can simplify your doc by merging the overview file into the classes' docstrings.
  • You wrote many functions to automatically adapt the code's behavior to the module names. There might be as many possibilities as models; hence, you cannot treat them all. It is not a problem to force the user to provide some parameters and guide them through documentation and error raising.
  • I think the documentation should be in a different file, but we can discuss.
  • After removing the automation methods and visualization, I think that most of the code can be factored into the base class. Hence, there might be a possibility that we only need one class. Which would be much more suitable for documentation and users.

I know it is a lot of remarks. It would have been easier if you had submitted something smaller or discussed more. I am sorry I did not catch it earlier.

Comment on lines +414 to +425
class LanguageModelLogitLens(BaseLogitLens):
"""
Logit Lens implementation for Language Models (Causal LM and Masked LM).

This implementation shows what tokens the model "thinks" should come next
or fill a mask at each layer.
"""

def __init__(self, model: ModelWithSplitPoints, tokenizer: PreTrainedTokenizer, nb_token: int = 5, **kwargs):
self.vocab_size = tokenizer.vocab_size if tokenizer else None
self.nb_token = min(nb_token, self.vocab_size) if self.vocab_size else nb_token
super().__init__(model, tokenizer, **kwargs)
Collaborator

There should be a description of each attribute and parameter.

What is nb_token used for? How does it differ from vocab_size?

Comment on lines +416 to +419
Logit Lens implementation for Language Models (Causal LM and Masked LM).

This implementation shows what tokens the model "thinks" should come next
or fill a mask at each layer.
Collaborator

The description should be much more precise and concrete. Users might not know what a logit lens is.

Furthermore, I would add a warning in the docstring that the activation distribution shifts with layers. Hence, the predicted tokens might not make sense in the early layer due to this distribution shift.

Ultimately, I suggest copying most of the docstring between the classes the user might use for the logit lens.

It would be best if the user only uses one class.

Comment on lines +273 to +285
@abstractmethod
def do_lens(self, activation: torch.Tensor, inputs_model=None) -> torch.Tensor:
"""
Apply the Logit Lens method to the model activations.

Args:
activation: Model activations tensor
inputs_model: Optional model inputs for accessing attention mask

Returns:
Processed logits tensor
"""
pass
Collaborator

It seems that this method should not be used by the user, hence, it should be protected.

Furthermore, the name is not precise at all. I suggest calling it _predict_logits_from_activations.

Comment on lines +1098 to +1102
# Get predicted classes
predicted_classes = torch.argmax(proba, dim=-1)

# Get top-k predictions
top_k_values, top_k_indices = torch.topk(proba, k=min(self.num_classes, 5), dim=-1)
Collaborator

The argmax is already included in the topk. Keep the topk line and write: predicted_classes = top_k_indices[:, 0].
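For instance, since torch.topk returns results sorted by descending value, the first top-k index is exactly the argmax:

```python
import torch

proba = torch.tensor([[0.1, 0.6, 0.3],
                      [0.5, 0.2, 0.3]])
top_k_values, top_k_indices = torch.topk(proba, k=2, dim=-1)
predicted_classes = top_k_indices[:, 0]  # same result as torch.argmax(proba, dim=-1)
assert torch.equal(predicted_classes, torch.argmax(proba, dim=-1))
```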

Comment on lines +467 to +495
results = {}
for layer, logits in logits_dict.items():
proba = F.softmax(logits, dim=-1)
top_k_indices = torch.topk(proba, self.nb_token, dim=-1).indices

results[layer] = {
"tokens": np.array(
[
[
[
self.tokenizer.decode([self._safe_item(index)])
for index in top_k_indices[batch_idx, seq_idx]
]
for seq_idx in range(top_k_indices.shape[1])
]
for batch_idx in range(top_k_indices.shape[0])
]
),
"proba": np.array(
[
[
proba[batch_idx, seq_idx, top_k_indices[batch_idx, seq_idx]].detach().cpu().numpy()
for seq_idx in range(top_k_indices.shape[1])
]
for batch_idx in range(top_k_indices.shape[0])
]
),
}
return results
Collaborator

This seemed really suboptimal to me. I asked our friend ChatGPT, and it gave me the following, which seems much better:

def process_logits(logits_dict, tokenizer, nb_token: int):
    results = {}

    for layer, logits in logits_dict.items():
        # Compute probabilities
        proba = F.softmax(logits, dim=-1)

        # Get top-k indices and values directly (avoids recomputing probabilities later)
        top_k_values, top_k_indices = torch.topk(proba, nb_token, dim=-1)

        # Flatten indices for fast batch decoding
        flat_indices = top_k_indices.reshape(-1).tolist()

        # Batch decode once instead of looping per token
        decoded_tokens = tokenizer.convert_ids_to_tokens(flat_indices)

        # Reshape to original [batch, seq, k]
        decoded_tokens = np.array(decoded_tokens, dtype=object).reshape(
            top_k_indices.shape
        )

        # Move proba to numpy (already aligned with indices)
        top_k_proba = top_k_values.detach().cpu().numpy()

        results[layer] = {
            "tokens": decoded_tokens,
            "proba": top_k_proba,
        }

    return results

Collaborator

I said in a previous comment that this function should treat only one layer at a time.

Therefore, the output does not have to be a dict of dict. But it can be a tuple of tensors (tokens, proba). This would be much better for type checking and clarity.

It is better to refrain from using dict for outputs when you can.
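One way to realize that suggestion, sketched with illustrative names (a NamedTuple keeps the fields explicit and type-checkable where a dict of dicts is opaque):

```python
from typing import NamedTuple

import numpy as np

class LayerLensResult(NamedTuple):
    """Illustrative per-layer output: explicit fields instead of nested dicts."""
    tokens: np.ndarray  # decoded top-k tokens
    proba: np.ndarray   # matching top-k probabilities

def make_layer_result(tokens: np.ndarray, proba: np.ndarray) -> LayerLensResult:
    # A type checker can now verify field access (result.tokens, result.proba)
    # where a dict lookup would silently pass with any string key.
    return LayerLensResult(tokens=tokens, proba=proba)

result = make_layer_result(np.array([["the", "a"]]), np.array([[0.4, 0.3]]))
```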

Comment on lines +47 to +50
- Other Methods:
- Overview: api/others/overview.md
- Methods:
- Logit Lens: api/others/methods/logit_lens.md
Collaborator

After looking at the documentation, I think people will find it easier if you just put Logit Lens at the same level as attributions and concepts (in the documentation).

Then, you can merge the overview file into the logit lens classes docstrings. There should be a single documentation file left.

Please refer to the ModelWithSplitPoints documentation in your classes documentation. (I do not remember how, maybe :class:~ModelWithSplitPoints).

Comment on lines +86 to +93
CLASSIFICATION_MODELS = {
"hf-internal-testing/tiny-random-bert": {
"model_class": AutoModelForSequenceClassification,
"split_points": ["bert.encoder.layer.1.output"],
"head_name": "classifier",
"pooling_strategy": "cls",
},
}
Collaborator

hf-internal-testing does not have initialized classification heads. That may explain your meta tensor error.
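A quick way to surface that problem earlier than the per-tensor `_safe_item` checks, as a hedged sketch (the helper name is made up):

```python
import torch
from torch import nn

def has_meta_params(module: nn.Module) -> bool:
    """Illustrative check: True if any parameter still lives on the 'meta'
    device, i.e. it was allocated without real weights ever being loaded."""
    return any(p.device.type == "meta" for p in module.parameters())

regular = nn.Linear(4, 2)       # normally initialized weights
with torch.device("meta"):
    skeleton = nn.Linear(4, 2)  # shape-only parameters on the meta device
```

Running such a check once at construction time would fail fast on models with uninitialized heads instead of raising deep inside the lens computation.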

Comment on lines +60 to +99
CAUSAL_LM_MODELS = {
"hf-internal-testing/tiny-random-gpt2": {
"model_class": AutoModelForCausalLM,
"split_points": ["transformer.h.1.mlp"],
"head_name": "lm_head",
},
"hf-internal-testing/tiny-random-gpt_neo": {
"model_class": AutoModelForCausalLM,
"split_points": ["transformer.h.1.mlp"],
"head_name": "lm_head",
},
}

MASKED_LM_MODELS = {
"hf-internal-testing/tiny-random-bert": {
"model_class": AutoModelForMaskedLM,
"split_points": ["bert.encoder.layer.1.output"],
"head_name": "cls.predictions.decoder",
},
"hf-internal-testing/tiny-random-roberta": {
"model_class": AutoModelForMaskedLM,
"split_points": ["roberta.encoder.layer.1.output"],
"head_name": "lm_head",
},
}

CLASSIFICATION_MODELS = {
"hf-internal-testing/tiny-random-bert": {
"model_class": AutoModelForSequenceClassification,
"split_points": ["bert.encoder.layer.1.output"],
"head_name": "classifier",
"pooling_strategy": "cls",
},
}

CI_MODELS = {
"causal_lm": ["hf-internal-testing/tiny-random-gpt2"],
"masked_lm": ["hf-internal-testing/tiny-random-bert"],
"classification": ["hf-internal-testing/tiny-random-bert"],
}
Collaborator

This is not enough model diversity.

Comment on lines +102 to +110
# Test fixtures
@pytest.fixture
def sentences():
"""Test sentences for experiments."""
return [
"Interpreto is the latin for 'to interpret'. But it also sounds like a spell from the Harry Potter books.",
"Interpreto is magical",
"Testing interpreto",
]
Collaborator

You do not have to define it; it already exists in tests/conftest.py, so it is automatically detected as a fixture.

Collaborator

Most of your tests concern methods that automatically set parameters from module names. I suggested you remove these parts, so you can remove the tests too.

You tried several models, but it is far from enough due to the diversity of how models manage their heads.

I assume that with more models, the methods above would break.

I suggested too many modifications for the tests to remain relevant next time.

I will review the whole code again next time.

@AntoninPoche AntoninPoche marked this pull request as draft January 20, 2026 08:44