… into logit-lens
@codex, please give me your opinion on this PR.
Codex Review: Here are some suggestions.
Reply with @codex fix comments to fix any unresolved comments.
```python
# Mapping of model types to their corresponding LogitLens implementations
_model_type_to_lens = {
    # Language Models
    modeling_auto.AutoModelForCausalLM: LanguageModelLogitLens,
    modeling_auto.AutoModelForMaskedLM: LanguageModelLogitLens,
    # Classification Models
    modeling_auto.AutoModelForSequenceClassification: ClassificationLogitLens,
    modeling_auto.AutoModelForTokenClassification: ClassificationLogitLens,
}
```
[P1] Avoid routing token classification models to sequence logit lens
The factory currently maps AutoModelForTokenClassification to ClassificationLogitLens, but _process_logits in that class rejects any logits tensor that is not 2‑D (batch, num_classes). Token classification heads emit (batch, seq_len, num_labels) logits, so creating a logit lens for a token classifier will always raise ValueError before returning any results. Either add token-level handling in _process_logits or remove the token-classification mapping from the factory to avoid advertising unsupported models.
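If token-level support is kept instead, `_process_logits` could branch on tensor rank rather than rejecting 3-D inputs. A minimal standalone sketch (function name and return shape are illustrative, not the PR's actual signature):

```python
import torch
import torch.nn.functional as F

def process_logits(logits: torch.Tensor):
    """Return (probabilities, predicted classes) for 2-D or 3-D logits."""
    if logits.dim() not in (2, 3):
        raise ValueError(f"Expected 2-D or 3-D logits, got {logits.dim()}-D")
    # Softmax over the last axis works for both layouts:
    # (batch, num_classes) and (batch, seq_len, num_labels)
    proba = F.softmax(logits, dim=-1)
    predicted = torch.argmax(proba, dim=-1)  # (batch,) or (batch, seq_len)
    return proba, predicted
```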
Please remove this file or tell me its use.
```md
## Main Classes

::: interpreto.others.logit_lens_general.LogitLens
    handler: python
    options:
      show_root_heading: true
      show_source: true

## Base Implementation

::: interpreto.others.logit_lens_general.BaseLogitLens
    handler: python
    options:
      show_root_heading: true
      show_source: true
      inherited_members: true
      members:
        - do_lens
        - explain
        - __call__
```
Do the base classes add any information for the user compared to the classification and generation ones?
For me, you can keep only the classes used by the user.
The user wants all the information to be at the same place. Please merge this file into the LogitLens classes docstrings.
Please rename this file logit_lens_example.ipynb.
It should be compiled so users can see the result and what it looks like before trying it themselves.
I tried running it, but it did not compile. Similarly to the tests, there is a problem with model_autoclass -> automodel.
I did not analyze the notebook as I do not have the outputs.
You can simply call this file logit_lens.py.
```python
if isinstance(layers_name, str):
    layers_name = [layers_name]

self.layer_names = layers_name
```
This can surely be included in the self._validate_activations method.
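For instance, a single helper could normalize and validate the names in one place (standalone sketch; the name and validation rules are my suggestion, not existing code):

```python
def validate_layer_names(layers_name):
    """Normalize a layer-name argument to a list of strings, validating types."""
    if isinstance(layers_name, str):
        layers_name = [layers_name]
    if not isinstance(layers_name, (list, tuple)) or not all(
        isinstance(name, str) for name in layers_name
    ):
        raise TypeError("layers_name must be a string or a sequence of strings")
    return list(layers_name)
```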
```python
def _safe_item(self, tensor):
    """Safely extract item from tensor, handling meta tensors."""
    if tensor.device.type == "meta":
        raise RuntimeError("Cannot extract item from meta tensor. Please ensure your model is properly loaded.")
    return tensor.detach().cpu().item()

def _safe_tensor_to_cpu(self, tensor):
    """Safely move tensor to CPU, handling meta tensors."""
    if tensor.device.type == "meta":
        raise RuntimeError("Cannot move meta tensor to CPU. Please ensure your model is properly loaded.")
    return tensor.detach().cpu()
```
Might not be necessary anymore if we solve the meta tensor problem.
```python
raise ValueError("Unsupported type for inputs")

if layers_name is None:
    print("Producing lens prediction for all activations.")
```
Remove prints in all your files.
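If you do need to report something, the standard `logging` module is the usual replacement; a sketch (function and default layer names are hypothetical):

```python
import logging

logger = logging.getLogger(__name__)

def select_layers(layers_name=None, all_layers=("layer.0", "layer.1")):
    """Resolve the layers to lens, logging instead of printing."""
    if layers_name is None:
        # Visible only when the user configures logging, silent otherwise
        logger.info("Producing lens predictions for all activations.")
        return list(all_layers)
    return [layers_name] if isinstance(layers_name, str) else list(layers_name)
```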
```python
if isinstance(inputs, (str, list)):
    if self.tokenizer.pad_token is None or self.tokenizer.pad_token_id < 0:
        print("Tokenizer does not have a padding token. Setting a default padding token.")
        self.tokenizer.pad_token = self.tokenizer.eos_token if self.tokenizer.eos_token else "[PAD]"

    if isinstance(inputs, str):
        inputs = [inputs]
    batched_inputs = [inputs[i : i + self.batch_size] for i in range(0, len(inputs), self.batch_size)]
elif isinstance(inputs, BatchEncoding):
    input_ids = inputs["input_ids"]
    attention_mask = inputs["attention_mask"]
    batched_inputs = [
        BatchEncoding(
            {
                "input_ids": input_ids[i : i + self.batch_size],
                "attention_mask": attention_mask[i : i + self.batch_size],
            }
        )
        for i in range(0, input_ids.shape[0], self.batch_size)
    ]
elif isinstance(inputs, torch.Tensor):
    batched_inputs = torch.split(inputs, self.batch_size, dim=0)
```
Why not let ModelWithSplitPoints.get_activations() handle the batching?
```python
if layer not in merged_results:
    merged_results[layer] = data
else:
    # Merge batch results
    self._merge_batch_results(merged_results[layer], data)
```
Be careful, here, for each batch, you concatenate and create a new tensor. This is not good practice for memory usage and computation time.
You should make a list of data for a given layer, then concatenate them all.
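Something along these lines (standalone sketch; the helper name and dict-of-tensors layout are assumptions about the PR's data structures):

```python
import torch

def merge_batches(batch_results):
    """Accumulate each layer's per-batch tensors in a list, then concatenate once.

    batch_results: iterable of {layer_name: tensor} dicts, one per batch.
    """
    accumulator = {}
    for batch in batch_results:
        for layer, data in batch.items():
            accumulator.setdefault(layer, []).append(data)
    # A single torch.cat per layer instead of one reallocation per batch
    return {layer: torch.cat(chunks, dim=0) for layer, chunks in accumulator.items()}
```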
AntoninPoche left a comment:
Hi @Raphael-Bernas, I copied my previous review comment as most of my remarks are still relevant.
You will see the comments, but I basically told you to rework a bit of everything. If you think my comments are harsh, please do not take it to heart, I am focused on the code while reviewing. There are things to improve but you did a great job.
Note that everything that I say can be false; I insist, do not hesitate to challenge my comments. I do not have the complete picture of your code as you do. Sometimes I understand things later.
I suggested too many changes, so you can work on the same branch, I will review the whole code again, do not worry.
Here are a few first general comments:
- You tried to treat too many cases with your first version. You iterated on your code without it being reviewed. You might have had fewer things to modify if you had interacted more with us. (But it is also our fault).
- Several of your choices are not trivial (e.g., a factory or reloading the model). It is better to discuss architectural decisions like these with others before committing to the idea. Because they impact the final API, it is a team choice. Furthermore, we might have elements to help you as we have already developed other parts.
- You must pass `make lint` and `make test` before asking for reviews. The test on the PR's GitHub page should also pass before asking for a review, but you can ask for help.
- You need to install the Pyright package in your editor and set the checks to "standard." This checks that the types you specified are coherent, preventing many uncaught bugs. You should not have red underlines from the type checker; otherwise, there is something to modify. You can say `# type: ignore`, but this should not be the default behavior.
- Do not leave `print` in your code, nor in your tests. You can add logs if you really need to.
- We need to discuss the meta tensor problem. We might be able to prevent it, but we should not load the model in the LogitLens init.
- Several points can be managed by `ModelWithSplitPoints`.
- The code lacks many comments and docstrings. These are super important, first for the users in the documentation, then for reviewers, and finally, for maintainers.
More precise comments now:
- You can simplify your doc by merging the overview file into the classes' docstrings.
- You wrote many functions to automatically adapt the code's behavior to the module names. There might be as many possibilities as models; hence, you cannot treat them all. It is not a problem to force the user to provide some parameters and guide them through documentation and error raising.
- I think the documentation should be in a different file, but we can discuss.
- After removing the automation methods and visualization, I think that most of the code can be factored into the base class. Hence, there might be a possibility that we only need one class. Which would be much more suitable for documentation and users.
I know it is a lot of remarks. It would have been easier if you had submitted something smaller or discussed more. I am sorry I did not catch it earlier.
```python
class LanguageModelLogitLens(BaseLogitLens):
    """
    Logit Lens implementation for Language Models (Causal LM and Masked LM).

    This implementation shows what tokens the model "thinks" should come next
    or fill a mask at each layer.
    """

    def __init__(self, model: ModelWithSplitPoints, tokenizer: PreTrainedTokenizer, nb_token: int = 5, **kwargs):
        self.vocab_size = tokenizer.vocab_size if tokenizer else None
        self.nb_token = min(nb_token, self.vocab_size) if self.vocab_size else nb_token
        super().__init__(model, tokenizer, **kwargs)
```
There should be a description of each attribute and parameter.
What is nb_token used for? How does it differ from vocab_size?
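For example, a Google-style docstring could describe them like this (the wording is my suggestion, inferred from the snippet above, not the actual code):

```python
class LanguageModelLogitLens:
    """Logit lens for language models (causal and masked LM).

    Projects intermediate activations through the model's unembedding head
    to show which tokens the model favors at each layer.

    Args:
        model: Model wrapper exposing the split points to read activations from.
        tokenizer: Tokenizer used to decode the predicted token ids.
        nb_token: Number of top tokens to report per position; capped at the
            tokenizer's vocabulary size.

    Attributes:
        vocab_size: Size of the tokenizer vocabulary, or None without a tokenizer.
        nb_token: Effective top-k used when decoding predictions.
    """
```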
```python
Logit Lens implementation for Language Models (Causal LM and Masked LM).

This implementation shows what tokens the model "thinks" should come next
or fill a mask at each layer.
```
The description should be much more precise and concrete. Users might not know what a logit lens is.
Furthermore, I would add a warning in the docstring that the activation distribution shifts across layers. Hence, the predicted tokens might not make sense in the early layers due to this distribution shift.
Ultimately, I suggest copying most of the docstring between the classes the user might use for the logit lens.
It would be best if the user only uses one class.
```python
@abstractmethod
def do_lens(self, activation: torch.Tensor, inputs_model=None) -> torch.Tensor:
    """
    Apply the Logit Lens method to the model activations.

    Args:
        activation: Model activations tensor
        inputs_model: Optional model inputs for accessing attention mask

    Returns:
        Processed logits tensor
    """
    pass
```
It seems that this method should not be used by the user, hence, it should be protected.
Furthermore, the name is not precise at all. I suggest calling it _predict_logits_from_activations.
```python
# Get predicted classes
predicted_classes = torch.argmax(proba, dim=-1)

# Get top-k predictions
top_k_values, top_k_indices = torch.topk(proba, k=min(self.num_classes, 5), dim=-1)
```
The argmax is included in the topk. Keep the topk line and write `predicted_classes = top_k_indices[:, 0]`.
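To illustrate: `torch.topk` returns values sorted in descending order, so the first index along the last axis is exactly the argmax.

```python
import torch

proba = torch.tensor([[0.1, 0.7, 0.2],
                      [0.5, 0.3, 0.2]])
top_k_values, top_k_indices = torch.topk(proba, k=2, dim=-1)  # sorted descending
predicted_classes = top_k_indices[:, 0]  # first top-k index is the argmax
assert torch.equal(predicted_classes, torch.argmax(proba, dim=-1))
```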
```python
results = {}
for layer, logits in logits_dict.items():
    proba = F.softmax(logits, dim=-1)
    top_k_indices = torch.topk(proba, self.nb_token, dim=-1).indices

    results[layer] = {
        "tokens": np.array(
            [
                [
                    [
                        self.tokenizer.decode([self._safe_item(index)])
                        for index in top_k_indices[batch_idx, seq_idx]
                    ]
                    for seq_idx in range(top_k_indices.shape[1])
                ]
                for batch_idx in range(top_k_indices.shape[0])
            ]
        ),
        "proba": np.array(
            [
                [
                    proba[batch_idx, seq_idx, top_k_indices[batch_idx, seq_idx]].detach().cpu().numpy()
                    for seq_idx in range(top_k_indices.shape[1])
                ]
                for batch_idx in range(top_k_indices.shape[0])
            ]
        ),
    }
return results
```
This seems really suboptimal to me. I asked our friend ChatGPT, and it gave me the following, which seems much better:
```python
def process_logits(logits_dict, tokenizer, nb_token: int):
    results = {}
    for layer, logits in logits_dict.items():
        # Compute probabilities
        proba = F.softmax(logits, dim=-1)
        # Get top-k indices and values directly (avoids recomputing probabilities later)
        top_k_values, top_k_indices = torch.topk(proba, nb_token, dim=-1)
        # Flatten indices for fast batch decoding
        flat_indices = top_k_indices.reshape(-1).tolist()
        # Batch decode once instead of looping per token
        decoded_tokens = tokenizer.convert_ids_to_tokens(flat_indices)
        # Reshape to original [batch, seq, k]
        decoded_tokens = np.array(decoded_tokens, dtype=object).reshape(top_k_indices.shape)
        # Move proba to numpy (already aligned with indices)
        top_k_proba = top_k_values.detach().cpu().numpy()
        results[layer] = {
            "tokens": decoded_tokens,
            "proba": top_k_proba,
        }
    return results
```
I said in a previous comment that this function should treat only one layer at a time.
Therefore, the output does not have to be a dict of dict. But it can be a tuple of tensors (tokens, proba). This would be much better for type checking and clarity.
It is better to refrain from using dict for outputs when you can.
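Combining both suggestions, a per-layer version returning a tuple could look like this (standalone sketch; the function name is mine, and I assume a Hugging Face-style tokenizer with `convert_ids_to_tokens`):

```python
import numpy as np
import torch
import torch.nn.functional as F

def process_layer_logits(logits, tokenizer, nb_token):
    """Process a single layer's logits; return a (tokens, proba) tuple."""
    proba = F.softmax(logits, dim=-1)
    top_k_values, top_k_indices = torch.topk(proba, nb_token, dim=-1)
    # Decode all top-k ids in one call, then restore the (batch, seq, k) shape
    decoded = tokenizer.convert_ids_to_tokens(top_k_indices.reshape(-1).tolist())
    tokens = np.array(decoded, dtype=object).reshape(tuple(top_k_indices.shape))
    return tokens, top_k_values.detach().cpu().numpy()
```

Returning a tuple keeps the per-layer result shape explicit for type checkers, and the caller can still build a dict over layers if needed.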
```yaml
- Other Methods:
  - Overview: api/others/overview.md
  - Methods:
    - Logit Lens: api/others/methods/logit_lens.md
```
After looking at the documentation, I think people will find it easier if you just put Logit Lens at the same level as attributions and concepts (in the documentation).
Then, you can merge the overview file into the logit lens classes docstrings. There should be a single documentation file left.
Please refer to the ModelWithSplitPoints documentation in your classes' documentation. (I do not remember how, maybe `:class:~ModelWithSplitPoints`.)
```python
CLASSIFICATION_MODELS = {
    "hf-internal-testing/tiny-random-bert": {
        "model_class": AutoModelForSequenceClassification,
        "split_points": ["bert.encoder.layer.1.output"],
        "head_name": "classifier",
        "pooling_strategy": "cls",
    },
}
```
hf-internal-testing does not have initialized classification heads. That may explain your meta tensor error.
```python
CAUSAL_LM_MODELS = {
    "hf-internal-testing/tiny-random-gpt2": {
        "model_class": AutoModelForCausalLM,
        "split_points": ["transformer.h.1.mlp"],
        "head_name": "lm_head",
    },
    "hf-internal-testing/tiny-random-gpt_neo": {
        "model_class": AutoModelForCausalLM,
        "split_points": ["transformer.h.1.mlp"],
        "head_name": "lm_head",
    },
}

MASKED_LM_MODELS = {
    "hf-internal-testing/tiny-random-bert": {
        "model_class": AutoModelForMaskedLM,
        "split_points": ["bert.encoder.layer.1.output"],
        "head_name": "cls.predictions.decoder",
    },
    "hf-internal-testing/tiny-random-roberta": {
        "model_class": AutoModelForMaskedLM,
        "split_points": ["roberta.encoder.layer.1.output"],
        "head_name": "lm_head",
    },
}

CLASSIFICATION_MODELS = {
    "hf-internal-testing/tiny-random-bert": {
        "model_class": AutoModelForSequenceClassification,
        "split_points": ["bert.encoder.layer.1.output"],
        "head_name": "classifier",
        "pooling_strategy": "cls",
    },
}

CI_MODELS = {
    "causal_lm": ["hf-internal-testing/tiny-random-gpt2"],
    "masked_lm": ["hf-internal-testing/tiny-random-bert"],
    "classification": ["hf-internal-testing/tiny-random-bert"],
}
```
This is not enough model diversity.
```python
# Test fixtures
@pytest.fixture
def sentences():
    """Test sentences for experiments."""
    return [
        "Interpreto is the latin for 'to interpret'. But it also sounds like a spell from the Harry Potter books.",
        "Interpreto is magical",
        "Testing interpreto",
    ]
```
You do not have to define it; it already exists in tests/conftest.py, so it is automatically detected as a fixture.
Most of your tests concern methods that automatically set parameters from module names. I suggested you remove these parts, so you can remove the tests too.
You tried several models, but it is far from enough due to the diversity of how models manage their heads.
I assume that with more models, the methods above would break.
I suggested too many modifications for the tests to remain relevant next time.
I will review the whole code again next time.
Description
Implemented Logit Lens
Related Issue
Type of Change
Checklist
- I have read the `CODE_OF_CONDUCT.md` document.
- I have read the `CONTRIBUTING.md` guide.
- I have run `make lint`.
- I have run `make test`.