When finetuning on SROIE #5

Hi. Thank you for the interesting work and for releasing the code. Can I ask some questions about fine-tuning on SROIE?
1. Since SROIE is a dataset of receipts, the number of valid tokens per sample is quite small compared to 512. Did you still use a sequence length of 512? I suspect that too many padding tokens may add noise, since the model is initialized from RoBERTa.

2. Did you use word-level BIO tags? SROIE key-value pairs are annotated at the line level, not the word level. Let me give an example. Suppose the key is 'company', the value is "STARBUCKS Store #10208", and the OCR output is given as two separate strings, "STARBUCKS Store" and "#10208".
Some people tag the first OCR string as B-Company and the second OCR string as I-Company, and then tokenize, so that the result looks like:
("STAR", "B-Company"), ("BUCKS", "B-Company"), ("Store", "**B-Company**"), ("#", "I-Company"), ("10208", "I-Company").
But I think BIO tags should be assigned at the word level, so the result should be:
("STAR", "B-Company"), ("BUCKS", "B-Company"), ("Store", "**I-Company**"), ("#", "I-Company"), ("10208", "I-Company").
In short, only the first word "STARBUCKS" is tagged as "B-Company" and the others are tagged as "I-Company".
It may be a minor issue, but SROIE contains complicated addresses and company names, and I just want to check the right way. Can you tell me which one you used?
+ Or did you use token-level tags?
("STAR", "**B-Company**"), ("BUCKS", "I-Company"), ("Store", "I-Company"), ("#", "I-Company"), ("10208", "I-Company").
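
To make the three options concrete, here is a minimal Python sketch of how I understand them. `toy_tokenize` and `tag_segments` are hypothetical helpers I made up for illustration. The toy tokenizer only mimics the splits in my example and is not the real RoBERTa tokenizer, so this is not meant to describe your actual preprocessing.

```python
import re

def toy_tokenize(word):
    """Hypothetical stand-in for a subword tokenizer (NOT the real RoBERTa BPE)."""
    if word == "STARBUCKS":
        # Hard-coded so the output matches the split used in the example above.
        return ["STAR", "BUCKS"]
    # Otherwise separate alphanumeric runs from single punctuation characters.
    return re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]", word)

def tag_segments(segments, field, scheme):
    """Assign BIO tags to the tokens of one field value.

    segments: OCR strings that together form the value, in reading order.
    scheme:   "segment" -> every token of the first OCR string gets B-
              "word"    -> every token of the first word gets B-
              "token"   -> only the very first token gets B-
    """
    if scheme not in ("segment", "word", "token"):
        raise ValueError(f"unknown scheme: {scheme}")
    tagged = []
    for seg_idx, segment in enumerate(segments):
        for word_idx, word in enumerate(segment.split()):
            for tok_idx, token in enumerate(toy_tokenize(word)):
                if scheme == "segment":
                    begin = seg_idx == 0
                elif scheme == "word":
                    begin = seg_idx == 0 and word_idx == 0
                else:  # "token"
                    begin = seg_idx == 0 and word_idx == 0 and tok_idx == 0
                tagged.append((token, ("B-" if begin else "I-") + field))
    return tagged

if __name__ == "__main__":
    ocr_strings = ["STARBUCKS Store", "#10208"]
    for scheme in ("segment", "word", "token"):
        print(scheme, tag_segments(ocr_strings, "Company", scheme))
```

The three schemes differ only in which tokens receive the B- prefix, so knowing which condition you used would be enough for me to reproduce the labels.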


Thank you for reading!
Best.
