How to use JingHaoZ/RLFR-Qwen2.5-VL-7B-Instruct with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("image-text-to-text", model="JingHaoZ/RLFR-Qwen2.5-VL-7B-Instruct")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText
processor = AutoProcessor.from_pretrained("JingHaoZ/RLFR-Qwen2.5-VL-7B-Instruct")
model = AutoModelForImageTextToText.from_pretrained("JingHaoZ/RLFR-Qwen2.5-VL-7B-Instruct")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

How to use JingHaoZ/RLFR-Qwen2.5-VL-7B-Instruct with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "JingHaoZ/RLFR-Qwen2.5-VL-7B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "JingHaoZ/RLFR-Qwen2.5-VL-7B-Instruct",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe this image in one sentence."
},
{
"type": "image_url",
"image_url": {
"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
}
}
]
}
]
}'
# Or run the model with Docker:
docker model run hf.co/JingHaoZ/RLFR-Qwen2.5-VL-7B-Instruct
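The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library; it assumes a vLLM server is already running at `http://localhost:8000`, and the helper names `build_chat_payload` and `chat` are hypothetical, introduced here for illustration only.

```python
import json
import urllib.request


def build_chat_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build an OpenAI-style chat request with one text part and one image part.

    Hypothetical helper: mirrors the JSON body used in the curl example.
    """
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(base_url: str, payload: dict, timeout: float = 60.0) -> str:
    """POST the payload to <base_url>/v1/chat/completions and return the reply text.

    Hypothetical helper: assumes the server returns an OpenAI-style response
    with the text under choices[0].message.content.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example usage (requires the server to be running, so it is left commented out):
# payload = build_chat_payload(
#     "JingHaoZ/RLFR-Qwen2.5-VL-7B-Instruct",
#     "Describe this image in one sentence.",
#     "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
# )
# print(chat("http://localhost:8000", payload))
```

Keeping payload construction separate from the network call makes the request body easy to inspect or log before anything is sent.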
How to use JingHaoZ/RLFR-Qwen2.5-VL-7B-Instruct with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "JingHaoZ/RLFR-Qwen2.5-VL-7B-Instruct" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "JingHaoZ/RLFR-Qwen2.5-VL-7B-Instruct",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe this image in one sentence."
},
{
"type": "image_url",
"image_url": {
"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
}
}
]
}
]
}'
# Or start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "JingHaoZ/RLFR-Qwen2.5-VL-7B-Instruct" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "JingHaoZ/RLFR-Qwen2.5-VL-7B-Instruct",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe this image in one sentence."
},
{
"type": "image_url",
"image_url": {
"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
}
}
]
}
]
}'

How to use JingHaoZ/RLFR-Qwen2.5-VL-7B-Instruct with Docker Model Runner:
docker model run hf.co/JingHaoZ/RLFR-Qwen2.5-VL-7B-Instruct
RLFR-Qwen2.5-VL-7B-Instruct is trained from Qwen2.5-VL-7B-Instruct with the RLFR framework, which introduces a flow reward derived from the latent space and thereby extends RLVR (reinforcement learning with verifiable rewards) with latent reward signals.
If you find our work helpful, please consider citing it:
@article{zhang2025rlfr,
title={RLFR: Extending Reinforcement Learning for LLMs with Flow Environment},
author={Zhang, Jinghao and Zheng, Naishan and Li, Ruilin and Cheng, Dongzhou and Liang, Zheming and Zhao, Feng and Wang, Jiaqi},
journal={arXiv preprint arXiv:2510.10201},
year={2025}
}