Logit lens #88

Draft

Raphael-Bernas wants to merge 15 commits into main from logit-lens

Conversation

@Raphael-Bernas (Collaborator)

Description

Implemented Logit Lens

Related Issue

Type of Change

  • 🚀 New feature (non-breaking change which adds functionality)

Checklist

  • I've read the CODE_OF_CONDUCT.md document.
  • I've read the CONTRIBUTING.md guide.
  • I've successfully run the style checks using make lint.
  • I've written tests for all new methods and classes that I created and successfully ran make test.
  • I've written the docstring in Google format for all the methods and classes that I used.

@AntoninPoche (Collaborator) commented Sep 12, 2025

@codex, please give me your opinion on this PR.

@chatgpt-codex-connector bot left a comment

Codex Review: Here are some suggestions.


Comment on lines +1373 to +1381
# Mapping of model types to their corresponding LogitLens implementations
_model_type_to_lens = {
# Language Models
modeling_auto.AutoModelForCausalLM: LanguageModelLogitLens,
modeling_auto.AutoModelForMaskedLM: LanguageModelLogitLens,
# Classification Models
modeling_auto.AutoModelForSequenceClassification: ClassificationLogitLens,
modeling_auto.AutoModelForTokenClassification: ClassificationLogitLens,
}


[P1] Avoid routing token classification models to sequence logit lens

The factory currently maps AutoModelForTokenClassification to ClassificationLogitLens, but _process_logits in that class rejects any logits tensor that is not 2‑D (batch, num_classes). Token classification heads emit (batch, seq_len, num_labels) logits, so creating a logit lens for a token classifier will always raise ValueError before returning any results. Either add token-level handling in _process_logits or remove the token-classification mapping from the factory to avoid advertising unsupported models.
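A minimal standalone sketch of the rank-based branching this comment asks for (`process_classification_logits` and its behavior are illustrative, not the PR's actual `_process_logits`):

```python
import torch
import torch.nn.functional as F

def process_classification_logits(logits: torch.Tensor, k: int = 3):
    """Illustrative only: accept both classification head shapes instead of
    rejecting 3-D logits.

    - (batch, num_classes)         -> sequence classification
    - (batch, seq_len, num_labels) -> token classification
    """
    if logits.dim() not in (2, 3):
        raise ValueError(f"Expected 2-D or 3-D logits, got shape {tuple(logits.shape)}")
    proba = F.softmax(logits, dim=-1)
    k = min(k, logits.shape[-1])
    # topk operates on the last dim for both ranks, so no further branching is
    # needed: 2-D input yields (batch, k); 3-D input yields (batch, seq_len, k).
    top_proba, top_indices = torch.topk(proba, k, dim=-1)
    return top_proba, top_indices
```

The alternative, as the comment says, is simply to drop the token-classification entry from the factory mapping.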


Collaborator

Please remove this file or tell me its use.

Comment on lines +9 to +28
## Main Classes

::: interpreto.others.logit_lens_general.LogitLens
handler: python
options:
show_root_heading: true
show_source: true

## Base Implementation

::: interpreto.others.logit_lens_general.BaseLogitLens
handler: python
options:
show_root_heading: true
show_source: true
inherited_members: true
members:
- do_lens
- explain
- __call__
Collaborator

Do the base classes add any information for the user compared to the classification and generation ones?

For me, you can leave only the classes used by the user.

Collaborator

The user wants all the information to be at the same place. Please merge this file into the LogitLens classes docstrings.

Collaborator

Please rename this file logit_lens_example.ipynb.

It should be compiled so users can see the result and what it looks like before trying it themselves.

I tried running it, but it did not compile. Similarly to the tests, there is a problem with model_autoclass -> automodel.

I did not analyze the notebook as I do not have the outputs.

Collaborator

You can simply call this file logit_lens.py.

Comment on lines +311 to +314
if isinstance(layers_name, str):
layers_name = [layers_name]

self.layer_names = layers_name
Collaborator

This can surely be included in the self._validate_activations method.

Comment on lines +322 to +332
def _safe_item(self, tensor):
"""Safely extract item from tensor, handling meta tensors."""
if tensor.device.type == "meta":
raise RuntimeError("Cannot extract item from meta tensor. Please ensure your model is properly loaded.")
return tensor.detach().cpu().item()

def _safe_tensor_to_cpu(self, tensor):
"""Safely move tensor to CPU, handling meta tensors."""
if tensor.device.type == "meta":
raise RuntimeError("Cannot move meta tensor to CPU. Please ensure your model is properly loaded.")
return tensor.detach().cpu()
Collaborator

Might not be necessary anymore if we solve the meta tensor problem.

raise ValueError("Unsupported type for inputs")

if layers_name is None:
print("Producing lens prediction for all activations.")
Collaborator

Remove prints in all your files.

Comment on lines +363 to +384
if isinstance(inputs, (str, list)):
if self.tokenizer.pad_token is None or self.tokenizer.pad_token_id < 0:
print("Tokenizer does not have a padding token. Setting a default padding token.")
self.tokenizer.pad_token = self.tokenizer.eos_token if self.tokenizer.eos_token else "[PAD]"

if isinstance(inputs, str):
inputs = [inputs]
batched_inputs = [inputs[i : i + self.batch_size] for i in range(0, len(inputs), self.batch_size)]
elif isinstance(inputs, BatchEncoding):
input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]
batched_inputs = [
BatchEncoding(
{
"input_ids": input_ids[i : i + self.batch_size],
"attention_mask": attention_mask[i : i + self.batch_size],
}
)
for i in range(0, input_ids.shape[0], self.batch_size)
]
elif isinstance(inputs, torch.Tensor):
batched_inputs = torch.split(inputs, self.batch_size, dim=0)
Collaborator

Why not let ModelWithSplitPoints.get_activations() handle the batching?

Comment on lines +395 to +399
if layer not in merged_results:
merged_results[layer] = data
else:
# Merge batch results
self._merge_batch_results(merged_results[layer], data)
@AntoninPoche (Collaborator) commented Sep 12, 2025

Be careful: here, for each batch, you concatenate and create a new tensor. This is not good practice for memory usage or computation time.

You should collect the data for a given layer in a list, then concatenate them all at once.
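That suggestion could look like this minimal sketch (the function and key names are illustrative, not from the PR):

```python
from collections import defaultdict

import torch

def merge_batch_results(batch_results: list[dict[str, torch.Tensor]]) -> dict[str, torch.Tensor]:
    """Illustrative sketch: gather each batch's tensor in a per-layer list,
    then concatenate once per layer instead of once per batch."""
    per_layer: dict[str, list[torch.Tensor]] = defaultdict(list)
    for batch in batch_results:
        for layer, data in batch.items():
            per_layer[layer].append(data)  # cheap list append, no tensor copy yet
    # One allocation and copy per layer, instead of one per processed batch.
    return {layer: torch.cat(chunks, dim=0) for layer, chunks in per_layer.items()}
```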

@AntoninPoche (Collaborator) left a comment

Hi @Raphael-Bernas, I copied my previous review comment as most of my remarks are still relevant.

You will see the comments, but I basically told you to rework a bit of everything. If you think my comments are harsh, please do not take it to heart, I am focused on the code while reviewing. There are things to improve but you did a great job.

Note that everything that I say can be false; I insist, do not hesitate to challenge my comments. I do not have the complete picture of your code as you do. Sometimes I understand things later.

I suggested too many changes; you can keep working on the same branch, and I will review the whole code again, do not worry.

Here are a few first general comments:

  • You tried to treat too many cases with your first version. You iterated on your code without it being reviewed. You might have had fewer things to modify if you had interacted more with us. (But it is also our fault).
  • Several of your choices are not trivial (e.g., a factory or reloading the model). It is better to discuss architectural decisions like these with others before committing to the idea. Because they impact the final API, it is a team choice. Furthermore, we might have elements to help you as we have already developed other parts.
  • You must pass the make lint and make test before asking for reviews. The test on the PR's GitHub page should also pass before asking for a review, but you can ask for help.
  • You need to install the Pyright package in your editor and set the checks to "standard." This checks that the types you specified are coherent, preventing many uncaught bugs. You should not have red underlines from the type checker; otherwise, there is something to modify. You can say # type: ignore, but this should not be the default behavior.
  • Do not leave print statements in your code or in your tests. You can add logs if you really need to.
  • We need to discuss the meta tensor problem. We might be able to prevent it, but we should not load the model in the LogitLens init.
  • Several points can be managed by ModelWithSplitPoints.
  • The code lacks many comments and docstrings. These are super important, first for the users in the documentation, then for reviewers, and finally, for maintainers.

More precise comments now:

  • You can simplify your doc by merging the overview file into the classes' docstrings.
  • You wrote many functions to automatically adapt the code's behavior to the module names. There might be as many possibilities as models; hence, you cannot treat them all. It is not a problem to force the user to provide some parameters and guide them through documentation and error raising.
  • I think the documentation should be in a different file, but we can discuss.
  • After removing the automation methods and visualization, I think that most of the code can be factored into the base class. Hence, there might be a possibility that we only need one class. Which would be much more suitable for documentation and users.

I know it is a lot of remarks. It would have been easier if you had submitted something smaller or discussed more. I am sorry I did not catch it earlier.

Comment on lines +414 to +425
class LanguageModelLogitLens(BaseLogitLens):
"""
Logit Lens implementation for Language Models (Causal LM and Masked LM).

This implementation shows what tokens the model "thinks" should come next
or fill a mask at each layer.
"""

def __init__(self, model: ModelWithSplitPoints, tokenizer: PreTrainedTokenizer, nb_token: int = 5, **kwargs):
self.vocab_size = tokenizer.vocab_size if tokenizer else None
self.nb_token = min(nb_token, self.vocab_size) if self.vocab_size else nb_token
super().__init__(model, tokenizer, **kwargs)
Collaborator

There should be a description of each attribute and parameter.

What is nb_token used for? How does it differ from vocab_size?

Comment on lines +416 to +419
Logit Lens implementation for Language Models (Causal LM and Masked LM).

This implementation shows what tokens the model "thinks" should come next
or fill a mask at each layer.
Collaborator

The description should be much more precise and concrete. Users might not know what a logit lens is.

Furthermore, I would add a warning in the docstring that the activation distribution shifts with layers. Hence, the predicted tokens might not make sense in the early layer due to this distribution shift.

Ultimately, I suggest copying most of the docstring between the classes the user might use for the logit lens.

It would be best if the user only uses one class.

Comment on lines +273 to +285
@abstractmethod
def do_lens(self, activation: torch.Tensor, inputs_model=None) -> torch.Tensor:
"""
Apply the Logit Lens method to the model activations.

Args:
activation: Model activations tensor
inputs_model: Optional model inputs for accessing attention mask

Returns:
Processed logits tensor
"""
pass
Collaborator

It seems that this method should not be used by the user, hence, it should be protected.

Furthermore, the name is not precise at all. I suggest calling it _predict_logits_from_activations.

Comment on lines +1098 to +1102
# Get predicted classes
predicted_classes = torch.argmax(proba, dim=-1)

# Get top-k predictions
top_k_values, top_k_indices = torch.topk(proba, k=min(self.num_classes, 5), dim=-1)
Collaborator

The argmax is already included in the topk. Keep the topk line and write: predicted_classes = top_k_indices[:, 0].
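For instance, since torch.topk returns results sorted by descending value, the first top-k index is exactly the argmax:

```python
import torch

proba = torch.tensor([[0.1, 0.6, 0.3],
                      [0.5, 0.2, 0.3]])
top_k_values, top_k_indices = torch.topk(proba, k=2, dim=-1)
predicted_classes = top_k_indices[:, 0]  # same result as torch.argmax(proba, dim=-1)
assert torch.equal(predicted_classes, torch.argmax(proba, dim=-1))
```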

Comment on lines +467 to +495
results = {}
for layer, logits in logits_dict.items():
proba = F.softmax(logits, dim=-1)
top_k_indices = torch.topk(proba, self.nb_token, dim=-1).indices

results[layer] = {
"tokens": np.array(
[
[
[
self.tokenizer.decode([self._safe_item(index)])
for index in top_k_indices[batch_idx, seq_idx]
]
for seq_idx in range(top_k_indices.shape[1])
]
for batch_idx in range(top_k_indices.shape[0])
]
),
"proba": np.array(
[
[
proba[batch_idx, seq_idx, top_k_indices[batch_idx, seq_idx]].detach().cpu().numpy()
for seq_idx in range(top_k_indices.shape[1])
]
for batch_idx in range(top_k_indices.shape[0])
]
),
}
return results
Collaborator

This seemed really suboptimal to me. I asked our friend ChatGPT, and it gave me the following, which seems much better:

def process_logits(logits_dict, tokenizer, nb_token: int):
    results = {}

    for layer, logits in logits_dict.items():
        # Compute probabilities
        proba = F.softmax(logits, dim=-1)

        # Get top-k indices and values directly (avoids recomputing probabilities later)
        top_k_values, top_k_indices = torch.topk(proba, nb_token, dim=-1)

        # Flatten indices for fast batch decoding
        flat_indices = top_k_indices.reshape(-1).tolist()

        # Batch decode once instead of looping per token
        decoded_tokens = tokenizer.convert_ids_to_tokens(flat_indices)

        # Reshape to original [batch, seq, k]
        decoded_tokens = np.array(decoded_tokens, dtype=object).reshape(
            top_k_indices.shape
        )

        # Move proba to numpy (already aligned with indices)
        top_k_proba = top_k_values.detach().cpu().numpy()

        results[layer] = {
            "tokens": decoded_tokens,
            "proba": top_k_proba,
        }

    return results

Collaborator

I said in a previous comment that this function should treat only one layer at a time.

Therefore, the output does not have to be a dict of dict. But it can be a tuple of tensors (tokens, proba). This would be much better for type checking and clarity.

It is better to refrain from using dict for outputs when you can.
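One way to realize that suggestion, sketched with illustrative names (a NamedTuple keeps the fields explicit and type-checkable where a dict of dicts is opaque):

```python
from typing import NamedTuple

import numpy as np

class LayerLensResult(NamedTuple):
    """Illustrative per-layer output: explicit fields instead of nested dicts."""
    tokens: np.ndarray  # decoded top-k tokens
    proba: np.ndarray   # matching top-k probabilities

def make_layer_result(tokens: np.ndarray, proba: np.ndarray) -> LayerLensResult:
    # A type checker can now verify field access (result.tokens, result.proba)
    # where a dict lookup would silently pass with any string key.
    return LayerLensResult(tokens=tokens, proba=proba)

result = make_layer_result(np.array([["the", "a"]]), np.array([[0.4, 0.3]]))
```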

Comment on lines +47 to +50
- Other Methods:
- Overview: api/others/overview.md
- Methods:
- Logit Lens: api/others/methods/logit_lens.md
Collaborator

After looking at the documentation, I think people will find it easier if you just put Logit Lens at the same level as attributions and concepts (in the documentation).

Then, you can merge the overview file into the logit lens classes docstrings. There should be a single documentation file left.

Please refer to the ModelWithSplitPoints documentation in your classes documentation. (I do not remember how, maybe :class:~ModelWithSplitPoints).

Comment on lines +86 to +93
CLASSIFICATION_MODELS = {
"hf-internal-testing/tiny-random-bert": {
"model_class": AutoModelForSequenceClassification,
"split_points": ["bert.encoder.layer.1.output"],
"head_name": "classifier",
"pooling_strategy": "cls",
},
}
Collaborator

hf-internal-testing does not have initialized classification heads. That may explain your meta tensor error.
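A quick way to surface that problem earlier than the per-tensor `_safe_item` checks, as a hedged sketch (the helper name is made up):

```python
import torch
from torch import nn

def has_meta_params(module: nn.Module) -> bool:
    """Illustrative check: True if any parameter still lives on the 'meta'
    device, i.e. it was allocated without real weights ever being loaded."""
    return any(p.device.type == "meta" for p in module.parameters())

regular = nn.Linear(4, 2)       # normally initialized weights
with torch.device("meta"):
    skeleton = nn.Linear(4, 2)  # shape-only parameters on the meta device
```

Running such a check once at construction time would fail fast on models with uninitialized heads instead of raising deep inside the lens computation.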

Comment on lines +60 to +99
CAUSAL_LM_MODELS = {
"hf-internal-testing/tiny-random-gpt2": {
"model_class": AutoModelForCausalLM,
"split_points": ["transformer.h.1.mlp"],
"head_name": "lm_head",
},
"hf-internal-testing/tiny-random-gpt_neo": {
"model_class": AutoModelForCausalLM,
"split_points": ["transformer.h.1.mlp"],
"head_name": "lm_head",
},
}

MASKED_LM_MODELS = {
"hf-internal-testing/tiny-random-bert": {
"model_class": AutoModelForMaskedLM,
"split_points": ["bert.encoder.layer.1.output"],
"head_name": "cls.predictions.decoder",
},
"hf-internal-testing/tiny-random-roberta": {
"model_class": AutoModelForMaskedLM,
"split_points": ["roberta.encoder.layer.1.output"],
"head_name": "lm_head",
},
}

CLASSIFICATION_MODELS = {
"hf-internal-testing/tiny-random-bert": {
"model_class": AutoModelForSequenceClassification,
"split_points": ["bert.encoder.layer.1.output"],
"head_name": "classifier",
"pooling_strategy": "cls",
},
}

CI_MODELS = {
"causal_lm": ["hf-internal-testing/tiny-random-gpt2"],
"masked_lm": ["hf-internal-testing/tiny-random-bert"],
"classification": ["hf-internal-testing/tiny-random-bert"],
}
Collaborator

This is not enough model diversity.

Comment on lines +102 to +110
# Test fixtures
@pytest.fixture
def sentences():
"""Test sentences for experiments."""
return [
"Interpreto is the latin for 'to interpret'. But it also sounds like a spell from the Harry Potter books.",
"Interpreto is magical",
"Testing interpreto",
]
Collaborator

You do not have to define it; it already exists in tests/conftest.py, so it is automatically detected as a fixture.

Collaborator

Most of your tests concern methods that automatically set parameters from module names. I suggested you remove these parts, so you can remove the tests too.

You tried several models, but it is far from enough due to the diversity of how models manage their heads.

I assume that with more models, the methods above would break.

I suggested too many modifications for the tests to remain relevant next time.

I will review the whole code again next time.

@AntoninPoche AntoninPoche marked this pull request as draft January 20, 2026 08:44