Rebellious Student: Reversing Teacher Signals for Reasoning Exploration with Self-Distilled RLVR Paper • 2605.10781 • Published 20 days ago • 17
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published Apr 27 • 118
Mind's Eye: A Benchmark of Visual Abstraction, Transformation and Composition for Multimodal LLMs Paper • 2604.16054 • Published Apr 17 • 1
BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation Paper • 2603.25732 • Published Mar 26 • 11
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? Paper • 2603.24472 • Published Mar 25 • 57
Understanding Reasoning in LLMs through Strategic Information Allocation under Uncertainty Paper • 2603.15500 • Published Mar 16 • 12
Scaling Data Difficulty: Improving Coding Models via Reinforcement Learning on Fresh and Challenging Problems Paper • 2603.07779 • Published Mar 8 • 5
Breaking Training Bottlenecks: Effective and Stable Reinforcement Learning for Coding Models Paper • 2603.07777 • Published Mar 8 • 5
Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity Paper • 2603.05168 • Published Mar 5 • 6
Scaling Agentic Capabilities, Not Context: Efficient Reinforcement Finetuning for Large Toolspaces Paper • 2603.06713 • Published Mar 5 • 16
Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use Paper • 2603.03205 • Published Mar 3 • 13
Self-Hinting Language Models Enhance Reinforcement Learning Paper • 2602.03143 • Published Feb 3 • 31
Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability Paper • 2602.02477 • Published Feb 2 • 11