
Simplify core attention return types to Tensor #537

Open
stashuk-olek wants to merge 3 commits into facebookresearch:main from stashuk-olek:export-D92927085

Conversation

@stashuk-olek

Summary:
After removing all consumers of head_mask, return_attn_weights, and attn_probs in the previous commits, the core attention module can be simplified. This commit:

  • Removes head_mask param from scaled_dot_product_attention and SelfAttention.forward
  • Changes return types from Tuple[Tensor, Tensor] to Tensor (no longer returning attention probabilities)
  • Removes return_attn_weights param and tuple unpacking logic from MultiHeadAttention.forward
  • Cleans up unused imports (Tuple, Union)

No behavioral change — the attention computation itself is unchanged.

Differential Revision: D92927085
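
For readers who want to see what the return-type change means at a call site, here is a minimal sketch. It is not the repository's code: `sdpa_sketch`, its parameter names, and the tensor shapes are assumptions made for illustration. It only shows a function that computes attention and returns the output `Tensor` alone, where an older version would have returned an `(output, attn_probs)` tuple.

```python
from typing import Optional

import torch
from torch import Tensor


def sdpa_sketch(
    q: Tensor,
    k: Tensor,
    v: Tensor,
    attention_mask: Optional[Tensor] = None,
) -> Tensor:
    """Hypothetical stand-in for the simplified function: returns only the output."""
    # Assumed layout for q, k, v: (batch, heads, seq_len, head_dim).
    scores = torch.matmul(q, k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    if attention_mask is not None:
        # Positions where the mask is 0 are excluded from attention.
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
    probs = torch.softmax(scores, dim=-1)
    # A tuple-returning version would end with `return output, probs`.
    return torch.matmul(probs, v)


q = k = v = torch.randn(2, 4, 5, 8)
out = sdpa_sketch(q, k, v)  # previously: out, attn_probs = ...
```

Callers that used to unpack a tuple would need only the single assignment shown on the last line; the attention math inside the function is the same either way.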

meta-cla Bot added the CLA Signed label Feb 11, 2026
meta-codesync Bot commented Feb 11, 2026

@stashuk-olek has exported this pull request. If you are a Meta employee, you can view the originating Diff in D92927085.

stashuk-olek added a commit to stashuk-olek/multimodal that referenced this pull request Feb 12, 2026
stashuk-olek added a commit to stashuk-olek/multimodal that referenced this pull request Feb 13, 2026
Summary:

After removing all consumers of `head_mask`, `return_attn_weights`, and `attn_probs` in the previous commits, the core attention module can be simplified. This commit:
- Removes `head_mask` param from `scaled_dot_product_attention` and `SelfAttention.forward`
- Changes return types from `Tuple[Tensor, Tensor]` to `Tensor` (no longer returning attention probabilities)
- Removes `return_attn_weights` param and tuple unpacking logic from `MultiHeadAttention.forward`
- Cleans up unused imports (`Tuple`, `Union`)

No behavioral change — the attention computation itself is unchanged.

Reviewed By: OmarPavel

Differential Revision: D92927085
… weights in FLAVA (facebookresearch#535)

Summary:

The `attentions` field on `TransformerOutput` and `return_attn_weights`/`head_mask` parameters in the FLAVA encoder stack were never used by any consumer. 

This diff cleans them up. The intent is to later simplify attention usage and move to a common API.

Reviewed By: OmarPavel

Differential Revision: D92927086
…h#536)

Summary:

Remove dead `head_mask`, `return_attn_weights`, and `attention_weights` from the VideoGPT stack. These features were never used by any consumer — `head_mask` was always `None` or all-ones, and `return_attn_weights` was always `False` except in tests that verified the feature itself.

Reviewed By: OmarPavel

Differential Revision: D92927089
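
The claim that a `head_mask` of `None` or all ones has no effect can be checked with a small sketch. It assumes the mask is applied by multiplying the attention probabilities elementwise, which is a common convention but is not confirmed by the text here; `apply_head_mask` and the mask shape are hypothetical.

```python
from typing import Optional

import torch
from torch import Tensor


def apply_head_mask(probs: Tensor, head_mask: Optional[Tensor] = None) -> Tensor:
    """Hypothetical helper: scale attention probabilities per head (assumed convention)."""
    if head_mask is None:
        return probs
    return probs * head_mask


torch.manual_seed(0)
# Assumed layout: (batch, heads, query_len, key_len).
probs = torch.softmax(torch.randn(2, 4, 5, 5), dim=-1)
ones_mask = torch.ones(4, 1, 1)  # one entry per head, broadcast over positions

same_with_none = torch.equal(apply_head_mask(probs, None), probs)
same_with_ones = torch.equal(apply_head_mask(probs, ones_mask), probs)
```

Under that assumption both flags are `True`, so removing the parameter would leave outputs unchanged for every caller that passed `None` or all ones.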
Summary:

After removing all consumers of `head_mask`, `return_attn_weights`, and `attn_probs` in the previous commits, the core attention module can be simplified. This commit:
- Removes `head_mask` param from `scaled_dot_product_attention` and `SelfAttention.forward`
- Changes return types from `Tuple[Tensor, Tensor]` to `Tensor` (no longer returning attention probabilities)
- Removes `return_attn_weights` param and tuple unpacking logic from `MultiHeadAttention.forward`
- Cleans up unused imports (`Tuple`, `Union`)

No behavioral change — the attention computation itself is unchanged.

Reviewed By: OmarPavel

Differential Revision: D92927085

Labels

CLA Signed, fb-exported, meta-exported
