uonlp/CulturaX
How to use learninbit/malayalam-llama-2-tokenizer-v0.1 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-classification", model="learninbit/malayalam-llama-2-tokenizer-v0.1")
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("learninbit/malayalam-llama-2-tokenizer-v0.1", dtype="auto")

This tokenizer extends the vocabulary of LlamaTokenizer, leading to a total of 49,120 tokens, up from 32,000 in the original tokenizer.

from transformers import LlamaTokenizer
tokenizer = LlamaTokenizer.from_pretrained("learninbit/malayalam-llama-2-tokenizer-v0.1")
text = "ഹനഫസ ഹഫഞ്ചഥ ചകഡു ടെണല ഡൃൊമത്തീഴ ടഞ്ഞഭഞ റദ്ധഷ ഌിപത്മഫഥ ടജ്ജഡ്ഡപ്പെവ പഴുണൊ."
tokens = tokenizer.tokenize(text)  # split the text into subword tokens
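
To put the stated vocabulary sizes in perspective, the following is a minimal sketch of the arithmetic behind them. It uses only the two figures given above (32,000 and 49,120). The helper name `vocab_growth` is hypothetical and is not part of Transformers or of this repository.

```python
# Vocabulary sizes as stated in the text above.
ORIGINAL_VOCAB_SIZE = 32_000   # original LlamaTokenizer
EXTENDED_VOCAB_SIZE = 49_120   # extended tokenizer described here


def vocab_growth(original: int, extended: int) -> tuple[int, float]:
    """Return (number of added tokens, relative growth as a fraction).

    Hypothetical helper for illustration only.
    """
    if original <= 0 or extended < original:
        raise ValueError("expected 0 < original <= extended")
    added = extended - original
    return added, added / original


added, growth = vocab_growth(ORIGINAL_VOCAB_SIZE, EXTENDED_VOCAB_SIZE)
print(f"added tokens: {added}")          # 17120
print(f"relative growth: {growth:.1%}")  # 53.5%
```

In other words, the extension adds 17,120 entries on top of the original 32,000, which is roughly half again as many tokens.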