INT8-quantized model of ANIMA

Generation Speed

Test Environment

ComfyUI commit 264b003
ComfyUI-INT8-Fast commit 7ff676c
Launch options: --fast fp16_accumulation fp8_matrix_mult cublas_ops --use-sage-attention --disable-dynamic-vram
Tested on 18 May 2026

	RTX 3060	RTX 3090	RTX 5090
TDP	170W	280W	400W
PCIe	PCIe 3.0 x4	PCIe 4.0 x8	PCIe 4.0 x16
OS	Windows 11	Ubuntu 24.04.4 LTS	Ubuntu 24.04.4 LTS
Driver	596.49	580.142	590.48.01
Python	3.13.9	3.12.3	3.12.3
torch	2.12.0+cu130	2.12.0+cu130	2.12.0+cu130
triton	3.7.0.post26 (triton-windows)	3.7.0	3.7.0
sageattention	2.2.0+cu130	2.2.0	2.2.0
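To reproduce the benchmark setup, ComfyUI can be started with the same launch options listed above. The sketch below only assembles the command line. It assumes ComfyUI is launched from its repository root through main.py, and the helper name comfyui_launch_args is hypothetical; the flags themselves are copied verbatim from the test environment.

```python
import shlex
import sys

# Feature names passed to --fast in the benchmark runs.
FAST_FEATURES = ["fp16_accumulation", "fp8_matrix_mult", "cublas_ops"]


def comfyui_launch_args(python: str = sys.executable, entry: str = "main.py") -> list[str]:
    """Hypothetical helper: build the argv used for the benchmark runs.

    Assumption: ComfyUI is started from its repository root via main.py.
    """
    return [
        python,
        entry,
        "--fast", *FAST_FEATURES,  # --fast accepts several feature names
        "--use-sage-attention",
        "--disable-dynamic-vram",
    ]


if __name__ == "__main__":
    # Print a shell-ready command line; passing the list to subprocess.run()
    # would actually start ComfyUI.
    print(shlex.join(comfyui_launch_args(python="python")))
```

Building an argument list rather than one string avoids shell quoting issues if the Python path contains spaces.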

No LoRA

832×1216 · er_sde simple · CFG 5.0 · 30 steps · No LoRA

	RTX 3060	RTX 3090	RTX 5090
BF16 · w/o compile	0.73 it/s / 41.64s	1.77 it/s / 18.15s	5.09 it/s / 6.47s
BF16 · w/ compile	0.95 it/s / 32.22s	2.31 it/s / 14.16s	6.35 it/s / 5.37s
INT8 · w/o compile	0.87 it/s / 35.82s	2.09 it/s / 15.67s	6.32 it/s / 5.46s
INT8 · w/ compile	1.13 it/s / 28.09s	2.84 it/s / 11.77s	8.51 it/s / 4.26s
Δ w/o compile (BF16→INT8)	+19.18% / +14.00%	+18.08% / +13.69%	+24.17% / +15.61%
Δ w/ compile (BF16→INT8)	+18.95% / +12.82%	+22.94% / +16.88%	+34.02% / +20.67%

Hires LoRA

832×1216 · er_sde simple · CFG 5.0 · 30 steps · Hires LoRA

	RTX 3060	RTX 3090	RTX 5090
BF16 · w/o compile	0.73 it/s / 41.73s	1.78 it/s / 18.13s	5.01 it/s / 7.01s
BF16 · w/ compile	0.95 it/s / 32.22s	2.04 it/s / 16.00s	6.21 it/s / 5.47s
INT8 (Stoch.) · w/o compile	0.87 it/s / 36.93s	2.07 it/s / 15.75s	6.41 it/s / 6.07s
INT8 (Stoch.) · w/ compile	1.04 it/s / 31.47s	2.44 it/s / 13.52s	7.13 it/s / 5.23s
INT8 (Dyn.) · w/o compile	0.70 it/s / 44.45s	1.67 it/s / 19.32s	4.96 it/s / 7.42s
INT8 (Dyn.) · w/ compile	0.83 it/s / 37.66s	2.00 it/s / 16.28s	5.76 it/s / 6.05s
Δ Stoch. · w/o compile	+19.18% / +11.51%	+16.29% / +13.07%	+27.94% / +13.41%
Δ Stoch. · w/ compile	+9.47% / +2.33%	+19.61% / +15.50%	+14.81% / +4.39%
Δ Dyn. · w/o compile	−4.11% / −6.52%	−6.18% / −6.56%	−1.00% / −5.85%
Δ Dyn. · w/ compile	−12.63% / −16.88%	−1.96% / −1.75%	−7.25% / −10.60%

Turbo LoRA

832×1216 · er_sde simple · CFG 1.0 · 10 steps · Turbo LoRA

	RTX 3060	RTX 3090	RTX 5090
BF16 · w/o compile	1.44 it/s / 7.57s	3.55 it/s / 4.10s	10.21 it/s / 1.59s
BF16 · w/ compile	1.88 it/s / 5.92s	4.60 it/s / 3.05s	12.76 it/s / 1.38s
INT8 (Stoch.) · w/o compile	1.73 it/s / 8.55s	4.15 it/s / 3.32s	13.08 it/s / 1.78s
INT8 (Stoch.) · w/ compile	2.38 it/s / 6.75s	5.67 it/s / 2.66s	14.44 it/s / 1.73s
INT8 (Dyn.) · w/o compile	1.38 it/s / 8.74s	3.32 it/s / 3.85s	10.00 it/s / 1.82s
INT8 (Dyn.) · w/ compile	1.38 it/s / 8.94s	4.69 it/s / 3.01s	13.59 it/s / 1.53s
Δ Stoch. · w/o compile	+20.14% / −12.95%	+16.90% / +19.02%	+28.11% / −11.95%
Δ Stoch. · w/ compile	+26.60% / −14.02%	+23.26% / +12.79%	+13.17% / −25.36%
Δ Dyn. · w/o compile	−4.17% / −15.46%	−6.48% / +6.10%	−2.06% / −14.47%
Δ Dyn. · w/ compile	−26.60% / −51.01%	+1.96% / +1.31%	+6.50% / −10.87%

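Each cell shows speed (it/s) / time (s). Δ rows compare INT8 against the BF16 row with the same compile setting: Δ it/s = (INT8 − BF16) / BF16 × 100 · Δ Time = (BF16 − INT8) / BF16 × 100 · positive = INT8 faster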
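The Δ rows follow directly from the two formulas above. A minimal Python sketch (the function names are illustrative) that reproduces two cells from the tables:

```python
def delta_its(bf16_its: float, int8_its: float) -> float:
    """Δ it/s in percent: (INT8 - BF16) / BF16 * 100; positive means INT8 is faster."""
    return (int8_its - bf16_its) / bf16_its * 100


def delta_time(bf16_s: float, int8_s: float) -> float:
    """Δ Time in percent: (BF16 - INT8) / BF16 * 100; positive means INT8 is faster."""
    return (bf16_s - int8_s) / bf16_s * 100


# No LoRA, RTX 5090, w/ compile: BF16 6.35 it/s / 5.37s vs INT8 8.51 it/s / 4.26s
print(f"{delta_its(6.35, 8.51):+.2f}% / {delta_time(5.37, 4.26):+.2f}%")  # +34.02% / +20.67%

# Turbo LoRA, RTX 3060, Stoch. w/o compile: BF16 1.44 it/s / 7.57s vs INT8 1.73 it/s / 8.55s
print(f"{delta_its(1.44, 1.73):+.2f}% / {delta_time(7.57, 8.55):+.2f}%")  # +20.14% / -12.95%
```

Because the two Δ values are computed from different columns, they can differ in sign, as in the Turbo LoRA RTX 3060 cell (+20.14% / −12.95%).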

How to use

Clone ComfyUI-INT8-Fast into the ComfyUI custom_nodes directory.
Running ComfyUI with the --disable-dynamic-vram option is recommended.
Load the model with the Load Diffusion Model INT8 (W8A8) node and leave on_the_fly_quantization set to False (the default).
When applying LoRAs, "Stochastic" is the recommended setting.

Quantized layers

INT8Tensorwise

{
  "format": "comfy_quant",
  "block_names": ["net.blocks."],
  "rules": [
    { "policy": "keep", "match": ["blocks.0", "adaln_modulation", ".mlp.layer2"] },
    { "policy": "int8_tensorwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"] }
  ]
}
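One plausible reading of the rule list is sketched below. The matching semantics are assumptions, not confirmed by this card or taken from ComfyUI-INT8-Fast: rules are tried in order, the first rule with a pattern that is a substring of the layer name wins, only layers under a block_names prefix are eligible, and unmatched layers stay unquantized. The function resolve_policy and the example layer names are hypothetical.

```python
# Hypothetical resolver for the "comfy_quant" rules above (INT8Tensorwise config).
# Assumed semantics: ordered rules, first substring match wins, unmatched -> "keep".
CONFIG = {
    "format": "comfy_quant",
    "block_names": ["net.blocks."],
    "rules": [
        {"policy": "keep", "match": ["blocks.0", "adaln_modulation", ".mlp.layer2"]},
        {"policy": "int8_tensorwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"]},
    ],
}


def resolve_policy(layer_name: str, config: dict) -> str:
    """Return the policy a layer would receive under the assumed matching rules."""
    if not any(layer_name.startswith(prefix) for prefix in config["block_names"]):
        return "keep"  # outside the transformer blocks: left as-is
    for rule in config["rules"]:
        if any(pattern in layer_name for pattern in rule["match"]):
            return rule["policy"]
    return "keep"  # no rule matched


# Illustrative layer names; the exact module names in ANIMA are an assumption.
for name in [
    "net.blocks.0.self_attn.q_proj",
    "net.blocks.5.self_attn.q_proj",
    "net.blocks.5.mlp.layer1",
    "net.blocks.5.mlp.layer2",
    "net.blocks.5.adaln_modulation_self_attn.1",
    "net.final_layer.linear",
]:
    print(f"{name:45s} -> {resolve_policy(name, CONFIG)}")
```

Under these assumptions rule order matters: every block's .mlp.layer2 stays unquantized only because the keep rule is listed before the broader .mlp pattern.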

INT8Rowwise

{
  "format": "comfy_quant",
  "block_names": ["net.blocks."],
  "rules": [
    { "policy": "keep", "match": [
        "blocks.0.", "blocks.27.", "adaln_modulation",
        ".0.mlp", ".1.mlp", ".2.mlp", ".3.mlp"
    ]},
    { "policy": "int8_rowwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"] }
  ]
}
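The two configs differ in how INT8 scales are chosen. Assuming the usual meanings, "tensorwise" uses one scale per weight tensor and "rowwise" uses one scale per output row. The plain-Python sketch below illustrates that difference for symmetric weight quantization only. It is a generic illustration, not ComfyUI-INT8-Fast's kernel code, and it ignores activation quantization (the A8 half of W8A8). All function names are hypothetical.

```python
# Generic sketch of symmetric INT8 weight quantization (not the node's kernel code).
Matrix = list[list[float]]


def quantize(values: list[float], scale: float) -> list[int]:
    """Divide by the scale, round to nearest, and clamp to the symmetric range [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def absmax_scale(values: list[float]) -> float:
    """Scale mapping the largest magnitude to 127 (falls back to 1.0 for all-zero input)."""
    return max(abs(v) for v in values) / 127 or 1.0


def tensorwise(w: Matrix) -> tuple[list[list[int]], list[float]]:
    """Quantize with a single scale shared by every row."""
    scale = absmax_scale([v for row in w for v in row])
    return [quantize(row, scale) for row in w], [scale] * len(w)


def rowwise(w: Matrix) -> tuple[list[list[int]], list[float]]:
    """Quantize each row with its own scale."""
    scales = [absmax_scale(row) for row in w]
    return [quantize(row, s) for row, s in zip(w, scales)], scales


def row_errors(w: Matrix, q: list[list[int]], scales: list[float]) -> list[float]:
    """Per-row maximum absolute error after dequantizing (q * scale)."""
    return [
        max(abs(v - qv * s) for v, qv in zip(row, qrow))
        for row, qrow, s in zip(w, q, scales)
    ]


# Two rows whose magnitudes differ by roughly 250x.
W = [[0.5, -1.0, 0.25], [0.004, -0.003, 0.001]]

for name, fn in [("tensorwise", tensorwise), ("rowwise", rowwise)]:
    q, scales = fn(W)
    print(f"{name:10s} q={q} per-row max error={row_errors(W, q, scales)}")
```

With a shared scale, the small second row collapses to [1, 0, 0]; with its own scale it keeps its shape, at the cost of storing one scale per row.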

Model tree for Bedovyy/Anima-INT8: quantized from the base model circlestone-labs/Anima.