Do bloom models use <s> and </s> tokens?

#274

by abuelnasr - opened Feb 4, 2024

Feb 4, 2024 • edited Feb 4, 2024

The BLOOM tokenizer has bos_token = <s> and eos_token = </s>, but the tokenizer does not actually use them to wrap the input.
https://huggingface.co/docs/transformers/model_doc/bloom#transformers.BloomTokenizerFast

Is that a bug, or does the BLOOM model simply not use these special tokens, meaning they were not used during training either? And if BLOOM doesn't use them, what is the purpose of having them in the tokenizer?
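For anyone who wants to reproduce the observation, here is a minimal sketch. It assumes the `transformers` library is installed and uses `bigscience/bloom-560m` as an example checkpoint (that model id is my assumption; it is not named in this thread). The helper names `special_token_wrapping` and `check_checkpoint` are hypothetical and only for illustration.

```python
from typing import Any, Dict


def special_token_wrapping(tokenizer: Any, text: str) -> Dict[str, bool]:
    """Report whether default encoding puts BOS first and EOS last."""
    # Encode with the tokenizer's default settings (no extra arguments).
    ids = list(tokenizer(text)["input_ids"])
    return {
        "starts_with_bos": bool(ids) and ids[0] == tokenizer.bos_token_id,
        "ends_with_eos": bool(ids) and ids[-1] == tokenizer.eos_token_id,
    }


def check_checkpoint(
    model_id: str = "bigscience/bloom-560m",  # assumed example checkpoint
    text: str = "Hello world",
) -> Dict[str, bool]:
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Show which special tokens the tokenizer declares.
    print("bos_token:", tokenizer.bos_token, "eos_token:", tokenizer.eos_token)
    return special_token_wrapping(tokenizer, text)
```

Calling `check_checkpoint()` downloads the tokenizer and returns two booleans. If the behavior described above holds, both should come back false even though `bos_token` and `eos_token` are defined.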

abuelnasr

Feb 4, 2024

cc @ybelkada
