This repository was archived by the owner on Jan 1, 2025. It is now read-only.
If no pixel in the input image belongs to the q-th class, then the mask generated for masked attention has `attn_mask[b, q, :] = True` for the entire row. `nn.MultiheadAttention` converts those `True` entries to `float('-inf')`, so when the attention map is computed with `Softmax(·, dim=-1)`, every logit in that row is `-inf`: the softmax normalizes by a sum of zeros, and the resulting 0/0 division produces NaN. : (

This problem came up when I applied masked attention to my semantic segmentation task. : (
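A minimal sketch reproducing the failure, with hypothetical shapes (2 queries, sequence length 4): when a whole row of the mask is `True`, every masked logit becomes `-inf` and the softmax over that row is 0/0, i.e. NaN. A common workaround (assumed here, not from this issue) is to un-mask rows that are fully masked before passing the mask to attention.

```python
import torch
import torch.nn.functional as F

# attn_mask: True means "do not attend". Row 0 is fully masked,
# which is exactly the "no pixel belongs to the q-th class" case.
attn_mask = torch.zeros(2, 4, dtype=torch.bool)
attn_mask[0] = True

scores = torch.randn(2, 4)
# nn.MultiheadAttention does the equivalent of this fill internally.
masked = scores.masked_fill(attn_mask, float('-inf'))
attn = F.softmax(masked, dim=-1)
print(torch.isnan(attn[0]).all())   # the fully masked row is NaN
print(torch.isnan(attn[1]).any())   # the normal row is fine

# Hypothetical workaround: disable masking for rows that mask everything,
# so those queries fall back to unrestricted attention instead of NaN.
safe_mask = attn_mask & ~attn_mask.all(dim=-1, keepdim=True)
safe = F.softmax(scores.masked_fill(safe_mask, float('-inf')), dim=-1)
print(torch.isnan(safe).any())      # no NaNs remain
```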