Rethinking Cross-Layer Information Routing in Diffusion Transformers
Paper • 2605.20708 • Published • 94
Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps