
Libraries

How to use radlab/pLLama3-8B-creator with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="radlab/pLLama3-8B-creator")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("radlab/pLLama3-8B-creator")
model = AutoModelForCausalLM.from_pretrained("radlab/pLLama3-8B-creator")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use radlab/pLLama3-8B-creator with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "radlab/pLLama3-8B-creator"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "radlab/pLLama3-8B-creator",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
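
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `ask` are hypothetical, and the default base URL assumes the vLLM server started above is listening on port 8000 (pass `base_url="http://localhost:30000"` for a server on port 30000). The response parsing assumes the usual OpenAI-compatible response shape.

```python
import json
import urllib.request

MODEL_ID = "radlab/pLLama3-8B-creator"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the reply text. Requires a running server."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # Assumption: OpenAI-compatible response layout.
    return body["choices"][0]["message"]["content"]


# Example call (only works while the server is running):
# print(ask("What is the capital of France?"))
```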

Use Docker

docker model run hf.co/radlab/pLLama3-8B-creator

SGLang

How to use radlab/pLLama3-8B-creator with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "radlab/pLLama3-8B-creator" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "radlab/pLLama3-8B-creator",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "radlab/pLLama3-8B-creator" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "radlab/pLLama3-8B-creator",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use radlab/pLLama3-8B-creator with Docker Model Runner:
```
docker model run hf.co/radlab/pLLama3-8B-creator
```

Intro

We have released a collection of radlab/pLLama3 models, which we have fine-tuned for Polish. The fine-tuned versions communicate with the user in Polish more precisely than the base meta-llama/Meta-Llama-3 models. The collection includes models in 8B and 70B sizes. The 8B models are available in two configurations:

radlab/pLLama3-8B-creator, a model that gives fairly short, specific answers to user queries;
radlab/pLLama3-8B-chat, a more talkative model that reflects the behavior of the original meta-llama/Meta-Llama-3-8B-Instruct model.

Dataset

In addition to the instruction datasets publicly available for Polish, we developed our own dataset of about 650,000 instructions. This data was generated semi-automatically from other publicly available datasets. We also developed a training dataset for the DPO process, containing 100k examples, in which the model was taught to prefer correctly written versions of texts over versions with language errors.
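
To make the DPO data description concrete, here is a minimal sketch of what one preference example of this kind could look like. The field names `prompt`, `chosen` and `rejected` follow a common convention for preference data and are an assumption; the actual schema of the dataset is not given here, and the Polish sentences are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    """One hypothetical DPO example: a prompt plus a preferred and a dispreferred text."""
    prompt: str
    chosen: str    # correctly written version
    rejected: str  # version containing a language error


def is_valid(pair: PreferencePair) -> bool:
    """A pair is usable only if all fields are non-empty and the two texts differ."""
    fields = (pair.prompt, pair.chosen, pair.rejected)
    return all(f.strip() for f in fields) and pair.chosen != pair.rejected


# Illustrative example (not taken from the real dataset).
example = PreferencePair(
    prompt="Napisz zdanie o wczorajszych zakupach.",  # "Write a sentence about yesterday's shopping."
    chosen="Poszedłem wczoraj do sklepu.",            # correct form
    rejected="Poszłem wczoraj do sklepu.",            # common error: "poszłem"
)
print(is_valid(example))
```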

Learning

The learning process was divided into two stages:

Fine-tuning (FT) on a set of 650k instructions in Polish, run for 5 epochs.
After the FT stage, further training of the model with DPO on 100k examples of correct writing in Polish, run for 15k steps.

The released models are the ones obtained after both the FT and DPO stages.

Post-FT learning metrics:

eval/loss: 0.8690009713172913
eval/runtime: 464.5158
eval/samples_per_second: 8.611
eval/steps_per_second: 8.611
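
A few derived quantities help when reading these numbers. The sketch below assumes that `eval/loss` is a mean per-token cross-entropy in nats (so that its exponential is a perplexity) and that `eval/runtime` is in seconds; neither is stated above. Results are approximate because the reported throughput is rounded.

```python
import math

# Post-FT metrics as reported above.
eval_loss = 0.8690009713172913
eval_runtime_s = 464.5158
samples_per_second = 8.611
steps_per_second = 8.611

# Assumption: loss is mean token-level cross-entropy in nats.
perplexity = math.exp(eval_loss)

# Approximate size of the evaluation set and the effective eval batch size.
approx_eval_samples = eval_runtime_s * samples_per_second
samples_per_step = samples_per_second / steps_per_second

print(f"perplexity ~ {perplexity:.3f}")
print(f"evaluation samples ~ {approx_eval_samples:.0f}")
print(f"samples per eval step ~ {samples_per_step:.1f}")
```

Under these assumptions the figures point to an evaluation set of roughly 4,000 samples processed one sample per step.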

Post-DPO learning metrics:

eval/logits/chosen: 0.1370937079191208
eval/logits/rejected: 0.07430506497621536
eval/logps/chosen: -454.11962890625
eval/logps/rejected: -764.1261596679688
eval/loss: 0.05717926099896431
eval/rewards/accuracies: 0.9372459053993224
eval/rewards/chosen: -26.75682830810547
eval/rewards/margins: 32.37759780883789
eval/rewards/rejected: -59.134429931640625
eval/runtime: 1386.3177
eval/samples_per_second: 2.838
eval/steps_per_second: 1.42
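
The DPO metrics can be cross-checked against each other. The sketch below assumes the metric names follow the convention used by common DPO trainers, where `rewards/margins` is the mean of the chosen reward minus the rejected reward and `rewards/accuracies` is the fraction of pairs in which the chosen text received the higher reward. This convention is an assumption, not something stated above.

```python
# Post-DPO metrics as reported above.
reward_chosen = -26.75682830810547
reward_rejected = -59.134429931640625
reported_margin = 32.37759780883789
reward_accuracy = 0.9372459053993224
eval_runtime_s = 1386.3177
samples_per_second = 2.838
steps_per_second = 1.42

# Assumption: margin = mean(chosen reward - rejected reward).
recomputed_margin = reward_chosen - reward_rejected
margin_gap = abs(recomputed_margin - reported_margin)

# Approximate evaluation set size and effective eval batch size (rounded inputs).
approx_eval_pairs = eval_runtime_s * samples_per_second
pairs_per_step = samples_per_second / steps_per_second

print(f"recomputed margin = {recomputed_margin:.4f} (gap to reported: {margin_gap:.1e})")
print(f"pairs ranked correctly ~ {reward_accuracy:.1%}")
print(f"evaluation pairs ~ {approx_eval_pairs:.0f}, pairs per step ~ {pairs_per_step:.1f}")
```

The recomputed margin matches the reported one up to a small numerical difference, which is consistent with the assumed definition. Both rewards are negative, so the margin comes from the rejected texts being penalized more strongly than the chosen ones.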

Outro

Model tree for radlab/pLLama3-8B-creator

Base model

meta-llama/Meta-Llama-3-8B-Instruct


Collection including radlab/pLLama3-8B-creator

pLLama Models: list of all pLLama models (fine-tuned Llama models)