Simplify core attention return types to Tensor by stashuk-olek · Pull Request #537 · facebookresearch/multimodal

stashuk-olek · 2026-02-11T19:34:10Z

Summary:
After removing all consumers of `head_mask`, `return_attn_weights`, and `attn_probs` in the previous commits, the core attention module can be simplified. This commit:

- Removes the `head_mask` param from `scaled_dot_product_attention` and `SelfAttention.forward`
- Changes return types from `Tuple[Tensor, Tensor]` to `Tensor` (attention probabilities are no longer returned)
- Removes the `return_attn_weights` param and tuple unpacking logic from `MultiHeadAttention.forward`
- Cleans up unused imports (`Tuple`, `Union`)

No behavioral change: the attention computation itself is unchanged.
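
To illustrate what the return-type change means for callers, here is a minimal, dependency-free sketch. It is not the code from this PR: nested lists stand in for `Tensor`, and `attention_before` and `attention_after` are hypothetical names for the old and new shapes of the function. The way `head_mask` is applied (as an elementwise multiplier on the probabilities) is an assumption, since the old implementation is not shown here.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[float]]  # nested-list stand-in for a 2-D Tensor


def _probs(q: Matrix, k: Matrix) -> Matrix:
    """Row-wise softmax of (q @ k^T) / sqrt(dim)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    rows = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def _weighted_sum(probs: Matrix, v: Matrix) -> Matrix:
    """Compute probs @ v."""
    dim = len(v[0])
    return [[sum(p * vj[d] for p, vj in zip(row, v)) for d in range(dim)]
            for row in probs]


def attention_before(
    q: Matrix, k: Matrix, v: Matrix, head_mask: Optional[Matrix] = None
) -> Tuple[Matrix, Matrix]:
    """Hypothetical old shape: returns (output, attention probabilities)."""
    probs = _probs(q, k)
    if head_mask is not None:
        # Assumption: the mask scales the probabilities elementwise.
        probs = [[p * m for p, m in zip(pr, mr)]
                 for pr, mr in zip(probs, head_mask)]
    return _weighted_sum(probs, v), probs


def attention_after(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Hypothetical new shape: returns only the output."""
    return _weighted_sum(_probs(q, k), v)


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]

old_out, _unused_probs = attention_before(q, k, v)  # caller had to unpack a tuple
new_out = attention_after(q, k, v)                  # caller gets the output directly
assert old_out == new_out                           # same computation either way
```

With no mask supplied, both versions run the same arithmetic, so the only difference a caller sees is that the tuple unpacking goes away.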

Differential Revision: D92927085

meta-codesync · 2026-02-11T19:34:34Z

@stashuk-olek has exported this pull request. If you are a Meta employee, you can view the originating Diff in D92927085.


… weights in FLAVA (facebookresearch#535) Summary: The `attentions` field on `TransformerOutput` and the `return_attn_weights`/`head_mask` parameters in the FLAVA encoder stack were never used by any consumer. This diff cleans them up. The later intent is to simplify attention usage and move to a common API. Reviewed By: OmarPavel Differential Revision: D92927086

…h#536) Summary: Remove dead `head_mask`, `return_attn_weights`, and `attention_weights` from the VideoGPT stack. These features were never used by any consumer — `head_mask` was always `None` or all-ones, and `return_attn_weights` was always `False` except in tests that verified the feature itself. Reviewed By: OmarPavel Differential Revision: D92927089

Summary: After removing all consumers of `head_mask`, `return_attn_weights`, and `attn_probs` in the previous commits, the core attention module can be simplified. This commit:

- Removes the `head_mask` param from `scaled_dot_product_attention` and `SelfAttention.forward`
- Changes return types from `Tuple[Tensor, Tensor]` to `Tensor` (attention probabilities are no longer returned)
- Removes the `return_attn_weights` param and tuple unpacking logic from `MultiHeadAttention.forward`
- Cleans up unused imports (`Tuple`, `Union`)

No behavioral change: the attention computation itself is unchanged.

Reviewed By: OmarPavel
Differential Revision: D92927085

meta-cla Bot added the CLA Signed label Feb 11, 2026

meta-codesync Bot added fb-exported meta-exported labels Feb 11, 2026

stashuk-olek force-pushed the export-D92927085 branch from 814f4c6 to 6072fa3 on February 12, 2026 00:10

stashuk-olek force-pushed the export-D92927085 branch from 6072fa3 to 5c1501b on February 13, 2026 21:24

stashuk-olek force-pushed the export-D92927085 branch from 5c1501b to bb3e8d6 on February 13, 2026 21:41

stashuk-olek added 3 commits on February 25, 2026 15:40

stashuk-olek force-pushed the export-D92927085 branch from bb3e8d6 to b0c1de3 on February 25, 2026 23:41

stashuk-olek wants to merge 3 commits into facebookresearch:main from stashuk-olek:export-D92927085
