Instructions to use webbigdata/Qwen3-0.6B_WBD with libraries, inference providers, notebooks, and local apps.

Libraries

How to use webbigdata/Qwen3-0.6B_WBD with llama-cpp-python:

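# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the GGUF file from the Hugging Face Hub and load it.
llm = Llama.from_pretrained(
    repo_id="webbigdata/Qwen3-0.6B_WBD",
    filename="Q8_0-00001-of-00002.gguf",
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)
# The reply follows the OpenAI chat-completion layout.
print(response["choices"][0]["message"]["content"])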

Local Apps

llama.cpp

How to use webbigdata/Qwen3-0.6B_WBD with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf webbigdata/Qwen3-0.6B_WBD:Q8_0
# Run inference directly in the terminal:
llama-cli -hf webbigdata/Qwen3-0.6B_WBD:Q8_0

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf webbigdata/Qwen3-0.6B_WBD:Q8_0
# Run inference directly in the terminal:
llama-cli -hf webbigdata/Qwen3-0.6B_WBD:Q8_0

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf webbigdata/Qwen3-0.6B_WBD:Q8_0
# Run inference directly in the terminal:
./llama-cli -hf webbigdata/Qwen3-0.6B_WBD:Q8_0

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf webbigdata/Qwen3-0.6B_WBD:Q8_0
# Run inference directly in the terminal:
./build/bin/llama-cli -hf webbigdata/Qwen3-0.6B_WBD:Q8_0

Use Docker

docker model run hf.co/webbigdata/Qwen3-0.6B_WBD:Q8_0

vLLM

How to use webbigdata/Qwen3-0.6B_WBD with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "webbigdata/Qwen3-0.6B_WBD"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "webbigdata/Qwen3-0.6B_WBD",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
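
The curl call above can also be made from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server from the previous step is listening on port 8000, and the helper names `build_chat_request` and `ask` are illustrative, not part of vLLM.

```python
import json
import urllib.request

# Assumed values: the port and model name used in the curl example above.
BASE_URL = "http://localhost:8000/v1"
MODEL = "webbigdata/Qwen3-0.6B_WBD"


def build_chat_request(prompt: str, base_url: str = BASE_URL, model: str = MODEL) -> urllib.request.Request:
    """Build the same POST request that the curl command sends."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request and return the reply text (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(ask("What is the capital of France?"))` prints the model's reply.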

Ollama
How to use webbigdata/Qwen3-0.6B_WBD with Ollama:
```
ollama run hf.co/webbigdata/Qwen3-0.6B_WBD:Q8_0
```

Unsloth Studio

How to use webbigdata/Qwen3-0.6B_WBD with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for webbigdata/Qwen3-0.6B_WBD to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for webbigdata/Qwen3-0.6B_WBD to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for webbigdata/Qwen3-0.6B_WBD to start chatting

Pi

How to use webbigdata/Qwen3-0.6B_WBD with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf webbigdata/Qwen3-0.6B_WBD:Q8_0

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "webbigdata/Qwen3-0.6B_WBD:Q8_0"
        }
      ]
    }
  }
}
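
If `~/.pi/agent/models.json` already lists other providers, pasting the JSON by hand risks overwriting them. The sketch below merges the entry instead; the function names are hypothetical, and it assumes the file layout is exactly the one shown above.

```python
import json
from pathlib import Path

# Provider entry copied from the JSON above.
LLAMA_CPP_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "webbigdata/Qwen3-0.6B_WBD:Q8_0"}],
}


def with_llama_cpp_provider(config: dict, name: str = "llama-cpp") -> dict:
    """Return a copy of config with the provider added, keeping existing providers."""
    providers = dict(config.get("providers", {}))
    providers[name] = LLAMA_CPP_PROVIDER
    return {**config, "providers": providers}


def update_models_file(path: Path) -> None:
    """Read models.json if it exists, add the provider, and write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(with_llama_cpp_provider(config), indent=2) + "\n", encoding="utf-8")
```

Calling `update_models_file(Path.home() / ".pi" / "agent" / "models.json")` applies the change to the path named above.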

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use webbigdata/Qwen3-0.6B_WBD with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf webbigdata/Qwen3-0.6B_WBD:Q8_0

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default webbigdata/Qwen3-0.6B_WBD:Q8_0

Run Hermes

hermes

Docker Model Runner
How to use webbigdata/Qwen3-0.6B_WBD with Docker Model Runner:
```
docker model run hf.co/webbigdata/Qwen3-0.6B_WBD:Q8_0
```

Lemonade

How to use webbigdata/Qwen3-0.6B_WBD with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull webbigdata/Qwen3-0.6B_WBD:Q8_0

Run and chat with the model

lemonade run user.Qwen3-0.6B_WBD-Q8_0

List all available models

lemonade list

webbigdata/Qwen3-0.6B_WBD

A lightweight Japanese-enhanced model built by continued training of Qwen3-0.6B, with improved Japanese language ability, reasoning, and everyday conversational capability.
Designed primarily to run entirely in the browser and on smartphones and edge devices.

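News

Browser demo released: a demo that runs entirely in the browser, with no installation and no server → webbigdata SLM Demo
Verified on a smartphone: 17.20 t/s confirmed on the AQUOS sense4 basic (Snapdragon 720G / 3 GB RAM), released in 2020 → demo video
Quantized version for smartphones released: a 4-bit quantized version built with executorch → dahara1/Qwen3-0.6B-executorch-jp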

Model Overview

Item	Details
Base Model	Qwen/Qwen3-0.6B
Parameters	Approx. 600 million (0.6B)
License	Apache 2.0
Languages	Japanese / English
Training	SFT, RL, 8-bit quantization
Developer	dahara1@webbigdata

Browser Demo

No installation, no server required. Try it directly in your browser.

👉 https://webbigdata.jp/slm/

Fully client-side inference via WASM + llama.cpp. A 610 MB model (0.6B parameters, 8-bit quantized) runs entirely in the browser.
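
As a rough sanity check on that file size, the sketch below estimates the footprint of an 8-bit GGUF model. It rests on two assumptions that are not stated in this card: that the model has exactly 0.6e9 weights, and that every weight is stored in llama.cpp's Q8_0 layout (blocks of 32 int8 values plus one 2-byte scale). Real files also hold metadata and a few tensors in other precisions, so this is only a ballpark figure.

```python
# Assumptions (not from the model card): exactly 0.6e9 weights, all stored as Q8_0.
PARAMS = 0.6e9
WEIGHTS_PER_BLOCK = 32
BYTES_PER_BLOCK = 32 * 1 + 2  # 32 int8 values + one 2-byte fp16 scale

bits_per_weight = BYTES_PER_BLOCK * 8 / WEIGHTS_PER_BLOCK
size_bytes = PARAMS * BYTES_PER_BLOCK / WEIGHTS_PER_BLOCK

print(f"bits per weight: {bits_per_weight}")
print(f"estimated size: {size_bytes / 1e6:.1f} MB ({size_bytes / 2**20:.1f} MiB)")
```

Under these assumptions the estimate lands in the same range as the 610 MB quoted above.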

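Features

Stronger Japanese fundamentals: continued training on proprietary data strengthens Japanese vocabulary, knowledge, and expressiveness
Stronger reasoning: reinforcement learning (RL) improves logical reasoning
Stronger everyday Japanese conversation: trained with the aim of natural Japanese conversation
Note: as a 0.6B model, it has limits in long conversations spanning many turns
Fully in-browser: runs in the browser without a server via WASM + llama.cpp
Verified on a smartphone: with executorch, 17.20 t/s was confirmed on a budget device released in 2020 (Snapdragon 720G / 3 GB RAM)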

Benchmark Results

Japanese Benchmarks

Model	JCommonsenseQA	JNLI	JSTS	JSQuAD	Average
Qwen3-0.6B-Q8_0 (baseline)	62.40%	32.20%	17.20%	76.00%	46.95%
Qwen3-0.6B_WBD (this model)	59.60%	72.60%	35.60%	82.00%	62.45%

Continued training raised the average score from 46.95% to 62.45% (+15.5 pt). JNLI (natural language inference) improved the most, by +40.4 pt.

The slight drop on JCommonsenseQA (-2.8 pt) is attributed to the model hesitating more often over subtle nuances now that its knowledge and vocabulary have grown.
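
The averages and per-task differences quoted above can be recomputed from the table. This is a small sketch that only re-does the arithmetic on the published scores; it does not run any benchmark.

```python
# Scores copied from the table above (percent).
scores = {
    "Qwen3-0.6B-Q8_0": {"JCommonsenseQA": 62.40, "JNLI": 32.20, "JSTS": 17.20, "JSQuAD": 76.00},
    "Qwen3-0.6B_WBD": {"JCommonsenseQA": 59.60, "JNLI": 72.60, "JSTS": 35.60, "JSQuAD": 82.00},
}


def average(model: str) -> float:
    """Unweighted mean over the four tasks, rounded to two decimals."""
    values = scores[model].values()
    return round(sum(values) / len(values), 2)


base, tuned = "Qwen3-0.6B-Q8_0", "Qwen3-0.6B_WBD"
# Per-task change in percentage points (tuned minus baseline).
deltas = {task: round(scores[tuned][task] - scores[base][task], 2) for task in scores[base]}

print(f"average: {average(base)} -> {average(tuned)}")
for task, delta in deltas.items():
    print(f"{task}: {delta:+.1f} pt")
```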

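Comparison with Other Models

Japanese-specialized models in the same size class exist, such as NTT's tsuzumi (0.6B), but few of them publish concrete figures for JCommonsenseQA, JNLI, JSTS, and JSQuAD, so a direct comparison on the same benchmarks has not been possible so far. For this model, the evaluation conditions are published so that the results can be reproduced.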

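M-IFEval (Japanese instruction following)

Model	prompt-level (strict)	instruction-level (strict)
Qwen3-0.6B-Q8_0	0.366	0.420
Qwen3-0.6B_WBD	0.238	0.314

On the drop in M-IFEval: the evaluation set mixes in tasks that fit poorly with Japanese-specialized training, such as "translation into a language other than English".
On Japanese-specific tasks (keyword presence checks, character-count constraints, numbered lists, and so on), the model shows competitive performance.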

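Smartphone Performance

A 4-bit quantized version built with executorch makes it possible to run the model on smartphones.

Test device:

Item	Details
Model	AQUOS sense4 basic A003SH
Release date	November 19, 2020 (a budget smartphone released 5 years ago)
OS	Android 12
SoC	Qualcomm Snapdragon 720G (octa-core)
RAM	3 GB
Speed	17.20 t/s

📹 Demo video (YouTube Shorts)

Note: running on a smartphone currently requires transferring the files from a PC over a cable. The model is not yet distributed as an app for general users. On iPhone, it has only been verified in the simulator.

Quantized version for smartphones: dahara1/Qwen3-0.6B-executorch-jp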
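
To put the measured speed in perspective, the sketch below converts it into wall-clock time for a reply. It assumes a constant decode rate and ignores prompt processing and model load time; the reply lengths are hypothetical.

```python
TOKENS_PER_SECOND = 17.20  # measured speed from the table above


def generation_seconds(num_tokens: int, tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Time to generate num_tokens at a constant decode speed."""
    return num_tokens / tokens_per_second


for n in (50, 200, 500):  # hypothetical reply lengths
    print(f"{n} tokens: about {generation_seconds(n):.1f} s")
```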

How to Run

Using llama.cpp

Download the llama.cpp package for your hardware.
The model also runs in other tools that support GGUF files, such as Ollama and LM Studio.

Run from the CLI (Linux/Mac)

./llama-cli -hf webbigdata/Qwen3-0.6B_WBD --ctx-size 4096 --temp 0.7 --top-p 0.8 --top-k 20 --min-p 0.01 --repeat-penalty 1.05

Start llama-server and access it from a browser

./llama-server -hf webbigdata/Qwen3-0.6B_WBD --host 0.0.0.0 --port 8080 --ctx-size 4096 --temp 0.7 --top-p 0.8 --top-k 20 --min-p 0.01 --repeat-penalty 1.05

Open http://127.0.0.1:8080/ in your browser.
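
Loading the model can take a moment after `llama-server` starts. The sketch below polls the server's `/health` endpoint until it answers with HTTP 200; the helper names and retry values are illustrative choices, and the URL assumes the `--port 8080` setting from the command above.

```python
import time
import urllib.error
import urllib.request


def is_server_ready(base_url: str = "http://127.0.0.1:8080", timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an error status while the model is loading.
        return False


def wait_for_server(base_url: str = "http://127.0.0.1:8080", attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll the server until it is ready or the attempts run out."""
    for _ in range(attempts):
        if is_server_ready(base_url):
            return True
        time.sleep(delay)
    return False
```

A script can call `wait_for_server()` once before sending its first request and stop early if it returns `False`.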

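Access from a Python script (OpenAI-compatible API)

# pip install openai
from openai import OpenAI

# Connect to the local llama-server. The client requires a non-empty API key, so a placeholder is used.
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="dummy"
)

response = client.chat.completions.create(
    model="webbigdata/Qwen3-0.6B_WBD",
    messages=[
        {"role": "system", "content": "あなたは親切なアシスタントです。"},  # "You are a helpful assistant."
        {"role": "user", "content": "こんにちは！"}  # "Hello!"
    ],
    stream=True
)
# Print text as it arrives; skip chunks that carry no choices or no text.
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)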

Recommended Parameters for Qwen3

With Qwen3, greedy decoding (deterministic generation, for example Temperature=0) is prone to problems such as repetitive output, so sampling (Temperature > 0) is strongly recommended.

Parameter	Recommended value
Temperature	0.7
Top_P	0.8
Top_K	20
Min_P	0.01
Repetition Penalty	1.05
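
The recommended values can also be sent with each request instead of being fixed on the server command line. The sketch below assumes a llama-server instance at `http://localhost:8080/v1` and assumes it accepts `top_k`, `min_p`, and `repeat_penalty` as extra fields in the request body; `temperature` and `top_p` are standard arguments of the OpenAI Python client. The helper names are illustrative.

```python
# Recommended values copied from the table above.
RECOMMENDED = {"temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.01, "repeat_penalty": 1.05}

# Parameters the OpenAI client takes as named arguments; the rest go into extra_body.
STANDARD_FIELDS = ("temperature", "top_p")


def sampling_kwargs(params: dict = RECOMMENDED) -> dict:
    """Split sampling parameters into standard arguments and an extra_body payload."""
    kwargs = {k: v for k, v in params.items() if k in STANDARD_FIELDS}
    kwargs["extra_body"] = {k: v for k, v in params.items() if k not in STANDARD_FIELDS}
    return kwargs


def chat(prompt: str, base_url: str = "http://localhost:8080/v1") -> str:
    """Send one user message with the recommended settings (needs a running server)."""
    from openai import OpenAI  # imported here so sampling_kwargs works without the package

    client = OpenAI(base_url=base_url, api_key="dummy")
    response = client.chat.completions.create(
        model="webbigdata/Qwen3-0.6B_WBD",
        messages=[{"role": "user", "content": prompt}],
        **sampling_kwargs(),
    )
    return response.choices[0].message.content
```

Keeping the values in one dictionary makes it easy to compare runs with different settings while leaving the server flags untouched.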

Quantized Variants

Variant	Description	Link
executorch 4-bit version	For running on smartphones	dahara1/Qwen3-0.6B-executorch-jp

Training Data

Private datasets collected and synthesized by webbigdata.

Acknowledgments

Qwen/Qwen3-0.6B: base model
Qwen/Qwen3-0.6B: prompt template
llama.cpp: inference engine
wllama: WebAssembly
Hugging Face: model hosting

Developer

Developed by: dahara1@webbigdata
Model type: Text Generation (Causal LM)
Language(s): Japanese, English
Base Model: Qwen/Qwen3-0.6B
Demo: https://webbigdata.jp/slm/
X (Twitter): https://x.com/webbigdata
Contact: https://webbigdata.jp/webbigdata/inquiry/

@misc{dahara2025Qwen3-0.6B_WBD,
  author       = {dahara1@webbigdata},
  title        = {Qwen3-0.6B_WBD - Japanese-Enhanced Continual Learning Model},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/webbigdata/Qwen3-0.6B_WBD}},
  abstract     = {A lightweight Japanese-enhanced model based on Qwen3-0.6B, designed to run in browsers and on smartphones.},
}

GGUF

Model size: 0.6B params
Architecture: qwen3
Hardware compatibility: 8-bit

Model tree for webbigdata/Qwen3-0.6B_WBD

Base model: Qwen/Qwen3-0.6B-Base
Finetuned: Qwen/Qwen3-0.6B
Quantized: this model