Int8 Quantized model of ANIMA

Generation Speed
Test Environment
- ComfyUI commit 264b003
- ComfyUI-INT8-Fast commit 7ff676c
- Launch options: `--fast fp16_accumulation fp8_matrix_mult cublas_ops --use-sage-attention --disable-dynamic-vram`
- Tested on 18/05/26
| | RTX 3060 | RTX 3090 | RTX 5090 |
|---|---|---|---|
| TDP | 170W | 280W | 400W |
| PCIe | PCIe 3.0 x4 | PCIe 4.0 x8 | PCIe 4.0 x16 |
| OS | Windows 11 | Ubuntu 24.04.4 LTS | Ubuntu 24.04.4 LTS |
| Driver | 596.49 | 580.142 | 590.48.01 |
| Python | 3.13.9 | 3.12.3 | 3.12.3 |
| torch | 2.12.0+cu130 | 2.12.0+cu130 | 2.12.0+cu130 |
| triton | 3.7.0.post26 (triton-windows) | 3.7.0 | 3.7.0 |
| sageattention | 2.2.0+cu130 | 2.2.0 | 2.2.0 |
No LoRA
832×1216 · er_sde simple · CFG 5.0 · 30 steps · No LoRA
| | RTX 3060 | RTX 3090 | RTX 5090 |
|---|---|---|---|
| BF16 · w/o compile | 0.73 it/s / 41.64s | 1.77 it/s / 18.15s | 5.09 it/s / 6.47s |
| BF16 · w/ compile | 0.95 it/s / 32.22s | 2.31 it/s / 14.16s | 6.35 it/s / 5.37s |
| INT8 · w/o compile | 0.87 it/s / 35.82s | 2.09 it/s / 15.67s | 6.32 it/s / 5.46s |
| INT8 · w/ compile | 1.13 it/s / 28.09s | 2.84 it/s / 11.77s | 8.51 it/s / 4.26s |
| Δ w/o compile (BF16→INT8) | +19.18% / +14.00% | +18.08% / +13.69% | +24.17% / +15.61% |
| Δ w/ compile (BF16→INT8) | +18.95% / +12.82% | +22.94% / +16.88% | +34.02% / +20.67% |
Hires LoRA
832×1216 · er_sde simple · CFG 5.0 · 30 steps · Hires LoRA
| | RTX 3060 | RTX 3090 | RTX 5090 |
|---|---|---|---|
| BF16 · w/o compile | 0.73 it/s / 41.73s | 1.78 it/s / 18.13s | 5.01 it/s / 7.01s |
| BF16 · w/ compile | 0.95 it/s / 32.22s | 2.04 it/s / 16.00s | 6.21 it/s / 5.47s |
| INT8 (Stoch.) · w/o compile | 0.87 it/s / 36.93s | 2.07 it/s / 15.75s | 6.41 it/s / 6.07s |
| INT8 (Stoch.) · w/ compile | 1.04 it/s / 31.47s | 2.44 it/s / 13.52s | 7.13 it/s / 5.23s |
| INT8 (Dyn.) · w/o compile | 0.70 it/s / 44.45s | 1.67 it/s / 19.32s | 4.96 it/s / 7.42s |
| INT8 (Dyn.) · w/ compile | 0.83 it/s / 37.66s | 2.00 it/s / 16.28s | 5.76 it/s / 6.05s |
| Δ Stoch. · w/o compile | +19.18% / +11.51% | +16.29% / +13.07% | +27.94% / +13.41% |
| Δ Stoch. · w/ compile | +9.47% / +2.33% | +19.61% / +15.50% | +14.81% / +4.39% |
| Δ Dyn. · w/o compile | −4.11% / −6.52% | −6.18% / −6.56% | −1.00% / −5.85% |
| Δ Dyn. · w/ compile | −12.63% / −16.88% | −1.96% / −1.75% | −7.25% / −10.60% |
Turbo LoRA
832×1216 · er_sde simple · CFG 1.0 · 10 steps · Turbo LoRA
| | RTX 3060 | RTX 3090 | RTX 5090 |
|---|---|---|---|
| BF16 · w/o compile | 1.44 it/s / 7.57s | 3.55 it/s / 4.10s | 10.21 it/s / 1.59s |
| BF16 · w/ compile | 1.88 it/s / 5.92s | 4.60 it/s / 3.05s | 12.76 it/s / 1.38s |
| INT8 (Stoch.) · w/o compile | 1.73 it/s / 8.55s | 4.15 it/s / 3.32s | 13.08 it/s / 1.78s |
| INT8 (Stoch.) · w/ compile | 2.38 it/s / 6.75s | 5.67 it/s / 2.66s | 14.44 it/s / 1.73s |
| INT8 (Dyn.) · w/o compile | 1.38 it/s / 8.74s | 3.32 it/s / 3.85s | 10.00 it/s / 1.82s |
| INT8 (Dyn.) · w/ compile | 1.38 it/s / 8.94s | 4.69 it/s / 3.01s | 13.59 it/s / 1.53s |
| Δ Stoch. · w/o compile | +20.14% / −12.95% | +16.90% / +19.02% | +28.11% / −11.95% |
| Δ Stoch. · w/ compile | +26.60% / −14.02% | +23.26% / +12.79% | +13.17% / −25.36% |
| Δ Dyn. · w/o compile | −4.17% / −15.46% | −6.48% / +6.10% | −2.06% / −14.47% |
| Δ Dyn. · w/ compile | −26.60% / −51.01% | +1.96% / +1.31% | +6.50% / −10.87% |
Δ it/s = (INT8 − BF16) / BF16 × 100 · Δ Time = (BF16 − INT8) / BF16 × 100 · positive = INT8 faster
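The two Δ formulas above can be checked against any row of the tables with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the inputs are the rounded values printed in the tables, so the result can differ from the published Δ by a few hundredths of a percent if the published figures were computed from unrounded measurements.

```python
def delta_speed_pct(bf16_its: float, int8_its: float) -> float:
    """Δ it/s = (INT8 - BF16) / BF16 * 100. Positive means INT8 is faster."""
    return (int8_its - bf16_its) / bf16_its * 100


def delta_time_pct(bf16_seconds: float, int8_seconds: float) -> float:
    """Δ Time = (BF16 - INT8) / BF16 * 100. Positive means INT8 is faster."""
    return (bf16_seconds - int8_seconds) / bf16_seconds * 100


# RTX 3060, No LoRA, w/o compile: BF16 0.73 it/s / 41.64s, INT8 0.87 it/s / 35.82s
no_lora_speed = delta_speed_pct(0.73, 0.87)
no_lora_time = delta_time_pct(41.64, 35.82)

# RTX 3060, Turbo LoRA, Stoch., w/o compile: BF16 7.57s, INT8 8.55s
turbo_time = delta_time_pct(7.57, 8.55)

print(f"No LoRA: {no_lora_speed:+.2f}% / {no_lora_time:+.2f}%")
print(f"Turbo LoRA time: {turbo_time:+.2f}%")
```

The Turbo LoRA row shows why both columns are reported: it/s can improve while total time gets worse, so the sign of each Δ has to be read separately.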
How to use
- Clone ComfyUI-INT8-Fast into the `custom_nodes` directory.
- Running ComfyUI with the `--disable-dynamic-vram` option is recommended.
- Use the `Load Diffusion Model INT8 (W8A8)` node to load the model, and set `on_the_fly_qunatization` to False (default).
  - "Stochastic" is recommended when using a LoRA.
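As a rough illustration of the launch step, the sketch below assembles a ComfyUI command line from the flags listed in this card. It assumes ComfyUI is started through `main.py` in the current directory (an assumption, adjust to your install). The helper name `build_launch_command` is hypothetical. The benchmark flags are optional and only reproduce the test environment.

```python
import shlex

# Flags taken from the "Test Environment" list in this card.
BENCHMARK_FAST_OPTIONS = ["fp16_accumulation", "fp8_matrix_mult", "cublas_ops"]


def build_launch_command(entry_point="main.py", benchmark_flags=False):
    """Return the argv list for launching ComfyUI (hypothetical helper).

    entry_point is an assumption about where ComfyUI's launcher lives.
    """
    # The card recommends --disable-dynamic-vram for this model.
    cmd = ["python", entry_point, "--disable-dynamic-vram"]
    if benchmark_flags:
        # Extra flags used for the speed tables above.
        cmd += ["--fast", *BENCHMARK_FAST_OPTIONS, "--use-sage-attention"]
    return cmd


recommended = build_launch_command()
benchmark = build_launch_command(benchmark_flags=True)

# Print shell-ready strings; pass the lists to subprocess.run to actually launch.
print(shlex.join(recommended))
print(shlex.join(benchmark))
```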
Quantized layers
INT8Tensorwise
{
"format": "comfy_quant",
"block_names": ["net.blocks."],
"rules": [
{ "policy": "keep", "match": ["blocks.0", "adaln_modulation", ".mlp.layer2"] },
{ "policy": "int8_tensorwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"] }
]
}
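One way to read a rule set like the one above is sketched below. The matching semantics are an assumption, not taken from the quantization tool: `block_names` entries are treated as name prefixes, `match` entries as substrings, the first matching rule wins, and anything unmatched is kept in its original precision. The layer names in the example are hypothetical and only chosen to exercise each rule.

```python
import json

# The INT8Tensorwise rule set from this card.
CONFIG = json.loads("""
{
  "format": "comfy_quant",
  "block_names": ["net.blocks."],
  "rules": [
    { "policy": "keep", "match": ["blocks.0", "adaln_modulation", ".mlp.layer2"] },
    { "policy": "int8_tensorwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"] }
  ]
}
""")


def policy_for(layer_name, config):
    """Resolve a layer's policy under the assumed first-match-wins semantics."""
    # Assumption: layers outside the listed blocks are never quantized.
    if not any(layer_name.startswith(prefix) for prefix in config["block_names"]):
        return "keep"
    for rule in config["rules"]:
        # Assumption: each pattern is a plain substring of the layer name.
        if any(pattern in layer_name for pattern in rule["match"]):
            return rule["policy"]
    return "keep"


# Hypothetical layer names, for illustration only.
for name in [
    "net.blocks.0.self_attn.q_proj",
    "net.blocks.5.self_attn.q_proj",
    "net.blocks.5.mlp.layer1",
    "net.blocks.5.mlp.layer2",
    "net.final_layer.linear",
]:
    print(f"{name}: {policy_for(name, CONFIG)}")
```

Under these assumptions the order of the rules matters: the `keep` rule has to come first, otherwise `.mlp` would also capture `.mlp.layer2`.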
INT8Rowwise
{
"format": "comfy_quant",
"block_names": ["net.blocks."],
"rules": [
{ "policy": "keep", "match": [
"blocks.0.", "blocks.27.", "adaln_modulation",
".0.mlp", ".1.mlp", ".2.mlp", ".3.mlp"
]},
{ "policy": "int8_rowwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"] }
]
}
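The difference between the two policies can be illustrated with generic symmetric absmax quantization. This is a minimal sketch of the general technique in plain Python, not the implementation used by ComfyUI-INT8-Fast, and the function names are made up for illustration. Tensorwise uses one scale for the whole weight matrix, rowwise uses one scale per row.

```python
def _quantize_row(row, scale):
    """Map floats to integers in [-127, 127] using the given scale."""
    return [max(-127, min(127, round(v / scale))) for v in row]


def quantize_tensorwise(weight):
    """One scale for the whole matrix: scale = max|w| / 127."""
    absmax = max(abs(v) for row in weight for v in row)
    scale = absmax / 127 if absmax else 1.0
    return [_quantize_row(row, scale) for row in weight], scale


def quantize_rowwise(weight):
    """One scale per row: scale_i = max|w_i| / 127."""
    quantized, scales = [], []
    for row in weight:
        absmax = max(abs(v) for v in row)
        scale = absmax / 127 if absmax else 1.0
        quantized.append(_quantize_row(row, scale))
        scales.append(scale)
    return quantized, scales


# A toy weight matrix whose rows differ by two orders of magnitude.
weight = [
    [0.02, -0.012, 0.004],
    [4.0, -3.0, 1.0],
]

q_tensor, tensor_scale = quantize_tensorwise(weight)
q_row, row_scales = quantize_rowwise(weight)

print("tensorwise:", q_tensor)
print("rowwise:   ", q_row)
```

With a single shared scale, the small first row collapses to values near zero, while per-row scales keep its full INT8 range. That extra resolution costs one scale per row instead of one per tensor.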
Model tree for Bedovyy/Anima-INT8
Base model
circlestone-labs/Anima