Change UNK for token-related processor to 1 #822

Merged
jhnwu3 merged 2 commits into sunlabuiuc:master from Logiquo:fix/token-processor
Feb 6, 2026
Conversation

Logiquo (Collaborator) commented Feb 6, 2026

Enforce a convention for token-related processors: PAD=0 and UNK=1, for the following reasons:

- Many of the interpretability methods expect a natural baseline, but right now UNK = len(tokens.size), an index that depends on the vocabulary and is hard to find. Some methods (e.g. LIME) use 1 as the baseline, which is currently incorrect because 1 is a real feature.
- When features are added or removed in the processors, the UNK token no longer moves as the number of tokens grows or shrinks. A moving UNK index may hurt performance when we want to do continual learning and token positions change.
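
The convention above can be illustrated with a minimal sketch. It is not the code from this PR: `TokenVocab`, the `<pad>`/`<unk>` strings, and the `fit`/`encode` method names are hypothetical and chosen only for illustration. The sketch assumes that real tokens are assigned indices starting at 2, so growing the vocabulary never shifts PAD or UNK.

```python
from typing import Dict, Hashable, Iterable, List

# Hypothetical names; the PR only states the convention PAD=0, UNK=1.
PAD_TOKEN = "<pad>"
UNK_TOKEN = "<unk>"
PAD_INDEX = 0
UNK_INDEX = 1


class TokenVocab:
    """Illustrative vocabulary that pins PAD and UNK to fixed indices."""

    def __init__(self) -> None:
        # Reserve the two special indices before any real token is seen.
        self.token_to_index: Dict[Hashable, int] = {
            PAD_TOKEN: PAD_INDEX,
            UNK_TOKEN: UNK_INDEX,
        }

    def fit(self, tokens: Iterable[Hashable]) -> None:
        # New tokens are appended at the end, so existing indices never move.
        for token in tokens:
            if token not in self.token_to_index:
                self.token_to_index[token] = len(self.token_to_index)

    def encode(self, tokens: Iterable[Hashable]) -> List[int]:
        # Unseen tokens map to the fixed UNK index.
        return [self.token_to_index.get(t, UNK_INDEX) for t in tokens]


vocab = TokenVocab()
vocab.fit(["A", "B"])
before = vocab.encode(["A", "Z"])       # "Z" is unseen, so it maps to UNK_INDEX

vocab.fit(["C"])                        # the vocabulary grows
after = vocab.encode(["A", "Z", "C"])   # UNK is still 1, "A" is still 2

# A "feature absent" baseline of the same length, filled with the fixed UNK index.
baseline = [UNK_INDEX] * len(before)
```

Because UNK is a known constant here, a caller that needs a baseline does not have to look up the vocabulary size, and an index of 1 can no longer collide with a real token.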

Logiquo requested a review from jhnwu3 February 6, 2026 03:51
Logiquo marked this pull request as ready for review February 6, 2026 04:07

jhnwu3 (Collaborator) left a comment

lgtm. Thanks for looking ahead.

jhnwu3 merged commit 743b7cd into sunlabuiuc:master Feb 6, 2026
1 check passed

2 participants