Qianfan-OCR — GGUF Quantizations

GGUF quantizations of baidu/Qianfan-OCR.

Original model: InternVL Chat architecture with Qwen3 LLM backbone (~4.7B params). Quantized by Reza2kn using llama.cpp.

Files

| Filename | Quant | Size | Quality |
|---|---|---|---|
| Qianfan-OCR-f16.gguf | F16 | ~9.4 GB | Lossless (half precision) |
| Qianfan-OCR-q8_0.gguf | Q8_0 | ~5.0 GB | Near-lossless |
| Qianfan-OCR-q6_k.gguf | Q6_K | ~3.8 GB | Excellent |
| Qianfan-OCR-q5_k_m.gguf | Q5_K_M | ~3.3 GB | Very good |
| Qianfan-OCR-q5_k_s.gguf | Q5_K_S | ~3.2 GB | Very good |
| Qianfan-OCR-q4_k_m.gguf | Q4_K_M | ~2.8 GB | Good (recommended) |
| Qianfan-OCR-q4_k_s.gguf | Q4_K_S | ~2.7 GB | Good |
| Qianfan-OCR-q4_0.gguf | Q4_0 | ~2.6 GB | Legacy 4-bit |
| Qianfan-OCR-q3_k_m.gguf | Q3_K_M | ~2.2 GB | Moderate |
| Qianfan-OCR-q3_k_s.gguf | Q3_K_S | ~2.1 GB | Moderate |
| Qianfan-OCR-q2_k.gguf | Q2_K | ~1.7 GB | Low quality |
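The sizes above follow directly from the parameter count and each format's average bits per weight. As a rough sanity check (the bits-per-weight figures below are approximate averages for llama.cpp quant types, not exact values for this model):

```python
# Approximate average bits per weight for common llama.cpp quant types.
# Real GGUF files add some overhead (metadata, non-quantized layers),
# so these estimates are rough.
BPW = {"F16": 16.0, "Q8_0": 8.5, "Q6_K": 6.56, "Q5_K_M": 5.69,
       "Q4_K_M": 4.85, "Q2_K": 2.63}

def est_size_gb(params_billions: float, quant: str) -> float:
    """Estimate GGUF file size in GB from parameter count and quant type."""
    return params_billions * BPW[quant] / 8  # bits -> bytes, in GB

# ~4.7B params at F16 and Q4_K_M:
print(round(est_size_gb(4.7, "F16"), 1))     # 9.4, matching the table
print(round(est_size_gb(4.7, "Q4_K_M"), 1))  # 2.8, matching the table
```

This is only a rule of thumb for choosing a quant that fits your RAM/VRAM; the listed file sizes are authoritative.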

Usage (llama.cpp)

```shell
llama-cli -m Qianfan-OCR-q4_k_m.gguf --mmproj Qianfan-OCR-mmproj.gguf \
  --image document.jpg -p "Please OCR this document."
```

Original Model

See baidu/Qianfan-OCR for full documentation, benchmarks (OmniDocBench 93.12, OCRBench 880), and usage examples.

