Hi, I'm confused about the pooling strategy you used here.
For training, you use avg pooling (BeLLM/README.md, line 52 in 9da9269).
But for evaluation, you don't specify any pooling flag (BeLLM/README.md, lines 99 to 105 in 9da9269):
2) evaluate on STS benchmark

```bash
BiLLM_START_INDEX=31 CUDA_VISIBLE_DEVICES=0 python eval_sts.py \
    --model_name_or_path NousResearch/Llama-2-7b-hf \
    --lora_name_or_path SeanLee97/bellm-llama-7b-nli \
    --apply_bfloat16 0
```
so it falls back to the default value `cls`, right? (BeLLM/eval_sts.py, line 57 in 9da9269)

```python
parser.add_argument("--pooling_strategy", type=str, default='cls')
```
As for the paper, you mentioned that you used the representative word as the pivot, so this should be the last non-padding token, right? So I'm wondering: which token should I use, or does it make no difference in a decoder-based model like Llama?
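For concreteness, here is a minimal sketch of the three pooling variants under discussion (`cls`, `avg`, and last non-padding token). This is my own illustration on a toy tensor, not code from the BeLLM repository, and it assumes right padding with a standard attention mask:

```python
import torch

def pool(hidden_states, attention_mask, strategy):
    """Pool per-token states [batch, seq, dim] into sentence embeddings [batch, dim].

    attention_mask: [batch, seq], 1 for real tokens, 0 for padding.
    strategy: 'cls' (first token), 'avg' (mean over real tokens),
              or 'last' (last non-padding token).
    """
    if strategy == "cls":
        # hidden state of the first token
        return hidden_states[:, 0]
    if strategy == "avg":
        # mean over non-padding positions only
        mask = attention_mask.unsqueeze(-1).float()
        return (hidden_states * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
    if strategy == "last":
        # index of the last non-padding token per sequence (right padding assumed)
        last_idx = attention_mask.sum(1) - 1
        return hidden_states[torch.arange(hidden_states.size(0)), last_idx]
    raise ValueError(f"unknown strategy: {strategy}")

# toy batch: 2 sequences, 4 positions, hidden dim 3; sequence 0 has one pad token
h = torch.arange(24, dtype=torch.float32).reshape(2, 4, 3)
m = torch.tensor([[1, 1, 1, 0], [1, 1, 1, 1]])
print(pool(h, m, "last"))  # rows at positions 2 and 3 respectively
```

In a causal decoder like Llama, position 0 has attended to nothing before it, so `cls` here reads a state that never saw the rest of the sentence, whereas the last non-padding token has attended to the full input; that asymmetry is exactly why the choice matters more than in a bidirectional encoder.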