Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation Paper • 2604.18168 • Published 1 day ago • 85
OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation Paper • 2604.11804 • Published 9 days ago • 69
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation Paper • 2604.10098 • Published 11 days ago • 75
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents Paper • 2604.07430 • Published 14 days ago • 184
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression Paper • 2604.04921 • Published 16 days ago • 109
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling Paper • 2603.25746 • Published 26 days ago • 155
EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation Paper • 2603.12267 • Published Mar 12 • 13
StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation Paper • 2512.09363 • Published Dec 10, 2025 • 74
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length Paper • 2512.04677 • Published Dec 4, 2025 • 177
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies Paper • 2508.20072 • Published Aug 27, 2025 • 32