INT8 quantized model of ANIMA

Generation Speed

Test Environment

  • ComfyUI commit 264b003
  • ComfyUI-INT8-Fast commit 7ff676c
  • Launch options: --fast fp16_accumulation fp8_matrix_mult cublas_ops --use-sage-attention --disable-dynamic-vram
  • Tested on 18/05/26

| | RTX 3060 | RTX 3090 | RTX 5090 |
|---|---|---|---|
| TDP | 170W | 280W | 400W |
| PCIe | PCIe 3.0 x4 | PCIe 4.0 x8 | PCIe 4.0 x16 |
| OS | Windows 11 | Ubuntu 24.04.4 LTS | Ubuntu 24.04.4 LTS |
| Driver | 596.49 | 580.142 | 590.48.01 |
| Python | 3.13.9 | 3.12.3 | 3.12.3 |
| torch | 2.12.0+cu130 | 2.12.0+cu130 | 2.12.0+cu130 |
| triton | 3.7.0.post26 (triton-windows) | 3.7.0 | 3.7.0 |
| sageattention | 2.2.0+cu130 | 2.2.0 | 2.2.0 |

No LoRA

832×1216 · er_sde simple · CFG 5.0 · 30 steps · No LoRA

| Configuration | RTX 3060 | RTX 3090 | RTX 5090 |
|---|---|---|---|
| BF16 · w/o compile | 0.73 it/s / 41.64s | 1.77 it/s / 18.15s | 5.09 it/s / 6.47s |
| BF16 · w/ compile | 0.95 it/s / 32.22s | 2.31 it/s / 14.16s | 6.35 it/s / 5.37s |
| INT8 · w/o compile | 0.87 it/s / 35.82s | 2.09 it/s / 15.67s | 6.32 it/s / 5.46s |
| INT8 · w/ compile | 1.13 it/s / 28.09s | 2.84 it/s / 11.77s | 8.51 it/s / 4.26s |
| Δ w/o compile (BF16→INT8) | +19.18% / +14.00% | +18.08% / +13.69% | +24.17% / +15.61% |
| Δ w/ compile (BF16→INT8) | +18.95% / +12.82% | +22.94% / +16.88% | +34.02% / +20.67% |

Hires LoRA

832×1216 · er_sde simple · CFG 5.0 · 30 steps · Hires LoRA

| Configuration | RTX 3060 | RTX 3090 | RTX 5090 |
|---|---|---|---|
| BF16 · w/o compile | 0.73 it/s / 41.73s | 1.78 it/s / 18.13s | 5.01 it/s / 7.01s |
| BF16 · w/ compile | 0.95 it/s / 32.22s | 2.04 it/s / 16.00s | 6.21 it/s / 5.47s |
| INT8 (Stoch.) · w/o compile | 0.87 it/s / 36.93s | 2.07 it/s / 15.75s | 6.41 it/s / 6.07s |
| INT8 (Stoch.) · w/ compile | 1.04 it/s / 31.47s | 2.44 it/s / 13.52s | 7.13 it/s / 5.23s |
| INT8 (Dyn.) · w/o compile | 0.70 it/s / 44.45s | 1.67 it/s / 19.32s | 4.96 it/s / 7.42s |
| INT8 (Dyn.) · w/ compile | 0.83 it/s / 37.66s | 2.00 it/s / 16.28s | 5.76 it/s / 6.05s |
| Δ Stoch. · w/o compile | +19.18% / +11.51% | +16.29% / +13.07% | +27.94% / +13.41% |
| Δ Stoch. · w/ compile | +9.47% / +2.33% | +19.61% / +15.50% | +14.81% / +4.39% |
| Δ Dyn. · w/o compile | −4.11% / −6.52% | −6.18% / −6.56% | −1.00% / −5.85% |
| Δ Dyn. · w/ compile | −12.63% / −16.88% | −1.96% / −1.75% | −7.25% / −10.60% |

Turbo LoRA

832×1216 · er_sde simple · CFG 1.0 · 10 steps · Turbo LoRA

| Configuration | RTX 3060 | RTX 3090 | RTX 5090 |
|---|---|---|---|
| BF16 · w/o compile | 1.44 it/s / 7.57s | 3.55 it/s / 4.10s | 10.21 it/s / 1.59s |
| BF16 · w/ compile | 1.88 it/s / 5.92s | 4.60 it/s / 3.05s | 12.76 it/s / 1.38s |
| INT8 (Stoch.) · w/o compile | 1.73 it/s / 8.55s | 4.15 it/s / 3.32s | 13.08 it/s / 1.78s |
| INT8 (Stoch.) · w/ compile | 2.38 it/s / 6.75s | 5.67 it/s / 2.66s | 14.44 it/s / 1.73s |
| INT8 (Dyn.) · w/o compile | 1.38 it/s / 8.74s | 3.32 it/s / 3.85s | 10.00 it/s / 1.82s |
| INT8 (Dyn.) · w/ compile | 1.38 it/s / 8.94s | 4.69 it/s / 3.01s | 13.59 it/s / 1.53s |
| Δ Stoch. · w/o compile | +20.14% / −12.95% | +16.90% / +19.02% | +28.11% / −11.95% |
| Δ Stoch. · w/ compile | +26.60% / −14.02% | +23.26% / +12.79% | +13.17% / −25.36% |
| Δ Dyn. · w/o compile | −4.17% / −15.46% | −6.48% / +6.10% | −2.06% / −14.47% |
| Δ Dyn. · w/ compile | −26.60% / −51.01% | +1.96% / +1.31% | +6.50% / −10.87% |

Each measurement cell is speed / elapsed time, and each Δ cell is Δ it/s / Δ Time.
Δ it/s = (INT8 − BF16) / BF16 × 100 · Δ Time = (BF16 − INT8) / BF16 × 100 · positive = INT8 faster
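
As a quick check, the two Δ formulas above can be reproduced from any pair of BF16 and INT8 cells. This is a minimal sketch; the function names are illustrative and not part of ComfyUI or ComfyUI-INT8-Fast.

```python
def delta_its(bf16_its: float, int8_its: float) -> float:
    """Relative change in iterations per second, in percent (positive = INT8 faster)."""
    return (int8_its - bf16_its) / bf16_its * 100


def delta_time(bf16_s: float, int8_s: float) -> float:
    """Relative reduction in elapsed time, in percent (positive = INT8 faster)."""
    return (bf16_s - int8_s) / bf16_s * 100


# RTX 3060, No LoRA, with compile:
# BF16 0.95 it/s / 32.22s, INT8 1.13 it/s / 28.09s
print(f"{delta_its(0.95, 1.13):+.2f}% / {delta_time(32.22, 28.09):+.2f}%")
# prints +18.95% / +12.82%
```

Because the tables list rounded it/s and seconds, recomputing other cells this way can differ from the published Δ values in the last digit.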

How to use

  1. Clone ComfyUI-INT8-Fast into the custom_nodes directory.
  2. Running ComfyUI with the --disable-dynamic-vram option is recommended.
  3. Load the model with the Load Diffusion Model INT8 (W8A8) node and leave on_the_fly_quantization set to False (the default).
  4. When using a LoRA, "Stochastic" is recommended.
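
The card does not describe what "Stochastic" does internally. One plausible reading, and it is only an assumption, is that a LoRA delta is merged into the INT8 weights with stochastic rounding. The sketch below illustrates that general technique in plain Python. It is not the node's code, and `base` and `lora_delta` are made-up values.

```python
import math
import random


def round_nearest(x: float) -> int:
    """Deterministic rounding to the nearest integer level."""
    return math.floor(x + 0.5)


def round_stochastic(x: float, rng: random.Random) -> int:
    """Round up with probability equal to the fractional part, otherwise down."""
    low = math.floor(x)
    return low + (1 if rng.random() < x - low else 0)


def clamp_int8(q: int) -> int:
    """Keep the level inside the symmetric INT8 range."""
    return max(-127, min(127, q))


rng = random.Random(0)
base = 42         # hypothetical existing INT8 weight level
lora_delta = 0.3  # hypothetical LoRA update, in units of one quantization step

nearest = clamp_int8(round_nearest(base + lora_delta))
samples = [clamp_int8(round_stochastic(base + lora_delta, rng)) for _ in range(10_000)]
mean_level = sum(samples) / len(samples)

print(nearest)     # 42: round-to-nearest discards the sub-step update
print(mean_level)  # close to 42.3: the update survives on average
```

Under this reading, updates smaller than half a quantization step vanish with round-to-nearest but are preserved in expectation across many weights with stochastic rounding.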

Quantized layers

INT8Tensorwise

{
  "format": "comfy_quant",
  "block_names": ["net.blocks."],
  "rules": [
    { "policy": "keep", "match": ["blocks.0", "adaln_modulation", ".mlp.layer2"] },
    { "policy": "int8_tensorwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"] }
  ]
}
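
To see which layers a rule list like the one above would leave unquantized, here is a minimal sketch of one way to evaluate it. It assumes plain substring matching, top-to-bottom evaluation where the first matching rule wins, and "keep" for anything unmatched. None of these semantics are confirmed by this card, and the layer names are hypothetical.

```python
import json

CONFIG = json.loads("""
{
  "format": "comfy_quant",
  "block_names": ["net.blocks."],
  "rules": [
    { "policy": "keep", "match": ["blocks.0", "adaln_modulation", ".mlp.layer2"] },
    { "policy": "int8_tensorwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"] }
  ]
}
""")


def policy_for(layer_name: str, config: dict, default: str = "keep") -> str:
    """Return the policy of the first rule whose pattern occurs in layer_name.

    Assumed semantics: substring match, first match wins, and layers outside
    the listed block names (or matching no rule) fall back to `default`.
    """
    if not any(prefix in layer_name for prefix in config["block_names"]):
        return default
    for rule in config["rules"]:
        if any(pattern in layer_name for pattern in rule["match"]):
            return rule["policy"]
    return default


# Hypothetical layer names, for illustration only.
for name in [
    "net.blocks.0.self_attn.q_proj.weight",
    "net.blocks.5.self_attn.q_proj.weight",
    "net.blocks.5.mlp.layer1.weight",
    "net.blocks.5.mlp.layer2.weight",
]:
    print(f"{name} -> {policy_for(name, CONFIG)}")
```

Under these assumptions the "keep" rule takes priority because it is listed first, so `.mlp.layer2` stays unquantized even though `.mlp` also appears in the INT8 rule.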

INT8Rowwise

{
  "format": "comfy_quant",
  "block_names": ["net.blocks."],
  "rules": [
    { "policy": "keep", "match": [
            "blocks.0.", "blocks.27.", "adaln_modulation",
            ".0.mlp", ".1.mlp", ".2.mlp", ".3.mlp"
    ]},
    { "policy": "int8_rowwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"] }
  ]
}
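
The two formats differ in how many scales they store. The sketch below uses the common symmetric absmax scheme to show why a per-row scale can help when rows have very different ranges. That scheme, and treating a "row" as one row of the weight matrix, are assumptions, since the card only gives the policy names `int8_tensorwise` and `int8_rowwise`. The toy matrix is made up.

```python
def absmax_scale(values):
    """Scale that maps the largest magnitude onto 127 (1.0 for all-zero input)."""
    peak = max(abs(v) for v in values)
    return peak / 127 if peak > 0 else 1.0


def quantize(values, scale):
    """Symmetric INT8: divide by the scale, round, clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def tensorwise(weight):
    """One scale shared by every element of the matrix."""
    scale = absmax_scale([v for row in weight for v in row])
    return [quantize(row, scale) for row in weight], [scale] * len(weight)


def rowwise(weight):
    """One scale per row of the matrix."""
    scales = [absmax_scale(row) for row in weight]
    return [quantize(row, s) for row, s in zip(weight, scales)], scales


def row_errors(weight, q, scales):
    """Largest absolute reconstruction error in each row after dequantizing."""
    return [
        max(abs(v - qv * s) for v, qv in zip(row, qrow))
        for row, qrow, s in zip(weight, q, scales)
    ]


# Toy weight: the second row has a range 100x larger than the first.
weight = [[0.012, -0.020, 0.005],
          [1.200, -2.000, 0.500]]

q_t, scales_t = tensorwise(weight)
q_r, scales_r = rowwise(weight)

print("tensorwise levels:", q_t)
print("rowwise levels:   ", q_r)
print("tensorwise row errors:", row_errors(weight, q_t, scales_t))
print("rowwise row errors:   ", row_errors(weight, q_r, scales_r))
```

In this toy case the shared scale is set by the large row, so the small row collapses to the levels 1, -1 and 0, while per-row scales keep its full INT8 resolution at the cost of storing one scale per row.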