Evolving Deeper LLM Thinking
• 2501.09891 • Published • 115
PaSa: An LLM Agent for Comprehensive Academic Paper Search
• 2501.10120 • Published • 55
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong
• 2501.09775 • Published • 32
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario
• 2501.10132 • Published • 22
HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution
• 2501.10045 • Published • 10
X-Dyna: Expressive Dynamic Human Image Animation
• 2501.10021 • Published • 14
GameFactory: Creating New Games with Generative Interactive Videos
• 2501.08325 • Published • 67
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos
• 2501.09781 • Published • 27
The Lessons of Developing Process Reward Models in Mathematical Reasoning
• 2501.07301 • Published • 100
Do generative video models learn physical principles from watching videos?
• 2501.09038 • Published • 34
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
• 2501.09686 • Published • 41
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
• 2501.09732 • Published • 72
FAST: Efficient Action Tokenization for Vision-Language-Action Models
• 2501.09747 • Published • 29
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
• 2501.09751 • Published • 46
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
• 2501.11425 • Published • 109
TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space
• 2501.12224 • Published • 48
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
• 2501.12368 • Published • 45
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation
• 2501.12202 • Published • 49
Reasoning Language Models: A Blueprint
• 2501.11223 • Published • 33
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks
• 2501.11733 • Published • 28
Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments
• 2501.10893 • Published • 26
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise
• 2501.08331 • Published • 20
Taming Teacher Forcing for Masked Autoregressive Video Generation
• 2501.12389 • Published • 10
The Geometry of Tokens in Internal Representations of Large Language Models
• 2501.10573 • Published • 9
Fixing Imbalanced Attention to Mitigate In-Context Hallucination of Large Vision-Language Model
• 2501.12206 • Published • 4
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
• 2501.12895 • Published • 61
Autonomy-of-Experts Models
• 2501.13074 • Published • 44
Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament
• 2501.13007 • Published • 19
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
• 2501.12570 • Published • 28
Improving Video Generation with Human Feedback
• 2501.13918 • Published • 53
Temporal Preference Optimization for Long-Form Video Understanding
• 2501.13919 • Published • 23
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
• 2501.10799 • Published • 15
Control LLM: Controlled Evolution for Intelligence Retention in LLM
• 2501.10979 • Published • 6
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding
• 2501.13200 • Published • 69
RL + Transformer = A General-Purpose Problem Solver
• 2501.14176 • Published • 28
RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques
• 2501.14492 • Published • 27
Chain-of-Retrieval Augmented Generation
• 2501.14342 • Published • 58
Paper
• 2501.14249 • Published • 77
Towards General-Purpose Model-Free Reinforcement Learning
• 2501.16142 • Published • 31
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer
• 2501.15570 • Published • 25
Visual Generation Without Guidance
• 2501.15420 • Published • 8
Paper
• 2501.14912 • Published • 5
Return of the Encoder: Maximizing Parameter Efficiency for SLMs
• 2501.16273 • Published • 5
Large Concept Models: Language Modeling in a Sentence Representation Space
• 2412.08821 • Published • 17
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
• 2501.16975 • Published • 32
Open Problems in Mechanistic Interpretability
• 2501.16496 • Published • 22
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
• 2501.17161 • Published • 125
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate
• 2501.17703 • Published • 59
Trading Inference-Time Compute for Adversarial Robustness
• 2501.18841 • Published • 4
SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders
• 2501.18052 • Published • 8
The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training
• 2501.18965 • Published • 7
CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation
• 2501.16609 • Published • 7
s1: Simple test-time scaling
• 2501.19393 • Published • 125
Reward-Guided Speculative Decoding for Efficient LLM Reasoning
• 2501.19324 • Published • 39
PixelWorld: Towards Perceiving Everything as Pixels
• 2501.19339 • Published • 17
Self-supervised Quantized Representation for Seamlessly Integrating Knowledge Graphs with Large Language Models
• 2501.18119 • Published • 25
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning
• 2411.04983 • Published • 13
Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming
• 2501.18837 • Published • 10
o3-mini vs DeepSeek-R1: Which One is Safer?
• 2501.18438 • Published • 23
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer
• 2501.18427 • Published • 25
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training
• 2501.18511 • Published • 20
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch
• 2501.18512 • Published • 29
Large Language Models Think Too Fast To Explore Effectively
• 2501.18009 • Published • 23
GuardReasoner: Towards Reasoning-based LLM Safeguards
• 2501.18492 • Published • 88
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
• 2501.18585 • Published • 61
Preference Leakage: A Contamination Problem in LLM-as-a-judge
• 2502.01534 • Published • 40
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding
• 2502.01341 • Published • 39
MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation
• 2502.01572 • Published • 21
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models
• 2502.01142 • Published • 25
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning
• 2502.01100 • Published • 21
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles
• 2502.01081 • Published • 13
PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models
• 2502.01584 • Published • 9
Improving Transformer World Models for Data-Efficient RL
• 2502.01591 • Published • 10
Improved Training Technique for Latent Consistency Models
• 2502.01441 • Published • 8
Lifelong Sequential Knowledge Editing without Model Degradation
• 2502.01636 • Published • 4
Language Models Prefer What They Know: Relative Confidence Estimation via Confidence Preferences
• 2502.01126 • Published • 4
LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information
• 2502.02095 • Published • 4
The Differences Between Direct Alignment Algorithms are a Blur
• 2502.01237 • Published • 113
Process Reinforcement through Implicit Rewards
• 2502.01456 • Published • 62
ACECODER: Acing Coder RL via Automated Test-Case Synthesis
• 2502.01718 • Published • 28
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
• 2502.02508 • Published • 22
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
• 2502.02584 • Published • 16
COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation
• 2502.02589 • Published • 9
Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification
• 2502.01839 • Published • 10
LIMO: Less is More for Reasoning
• 2502.03387 • Published • 62
Demystifying Long Chain-of-Thought Reasoning in LLMs
• 2502.03373 • Published • 58
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
• 2502.02737 • Published • 257
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
• 2502.02339 • Published • 23
On Teacher Hacking in Language Model Distillation
• 2502.02671 • Published • 18
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
• 2502.03275 • Published • 18
A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
• 2502.01618 • Published • 10
Jailbreaking with Universal Multi-Prompts
• 2502.01154 • Published • 10
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features
• 2502.04320 • Published • 36
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2
• 2502.03544 • Published • 44
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models
• 2502.03032 • Published • 60
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis
• 2502.04128 • Published • 27
BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation
• 2502.03860 • Published • 25
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment
• 2502.04328 • Published • 29
Weak-to-Strong Diffusion with Reflection
• 2502.00473 • Published • 24
ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization
• 2502.04306 • Published • 20
UltraIF: Advancing Instruction Following from the Wild
• 2502.04153 • Published • 24
PILAF: Optimal Human Preference Sampling for Reward Modeling
• 2502.04270 • Published • 12
Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization
• 2502.04295 • Published • 13
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
• 2502.05171 • Published • 154
Goku: Flow Based Video Generative Foundation Models
• 2502.04896 • Published • 107
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
• 2502.05173 • Published • 64
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models
• 2502.04404 • Published • 25
Generating Symbolic World Models via Test-time Scaling of Large Language Models
• 2502.04728 • Published • 19
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning
• 2502.04689 • Published • 8
Value-Based Deep RL Scales Predictably
• 2502.04327 • Published • 7
YINYANG-ALIGN: Benchmarking Contradictory Objectives and Proposing Multi-Objective Optimization based DPO for Text-to-Image Alignment
• 2502.03512 • Published • 5
SPARC: Subspace-Aware Prompt Adaptation for Robust Continual Learning in LLMs
• 2502.02909 • Published • 2
Competitive Programming with Large Reasoning Models
• 2502.06807 • Published • 69
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!
• 2502.07374 • Published • 40
Teaching Language Models to Critique via Reinforcement Learning
• 2502.03492 • Published • 24
Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More
• 2502.07490 • Published • 10
The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering
• 2502.03628 • Published • 12
History-Guided Video Diffusion
• 2502.06764 • Published • 12
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
• 2502.06703 • Published • 153
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
• 2502.06781 • Published • 58
Paper
• 2502.06049 • Published • 31
The Curse of Depth in Large Language Models
• 2502.05795 • Published • 40
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
• 2502.06772 • Published • 22
Distillation Scaling Laws
• 2502.08606 • Published • 47
LLM Pretraining with Continuous Concepts
• 2502.08524 • Published • 30
Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning
• 2502.06533 • Published • 17
DPO-Shift: Shifting the Distribution of Direct Preference Optimization
• 2502.07599 • Published • 15
Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing
• 2502.04411 • Published • 4
Towards Trustworthy Retrieval Augmented Generation for Large Language Models: A Survey
• 2502.06872 • Published • 8
Logical Reasoning in Large Language Models: A Survey
• 2502.09100 • Published • 24
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
• 2502.09601 • Published • 14
Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges
• 2502.08680 • Published • 11
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
• 2502.09621 • Published • 28
Large Language Diffusion Models
• 2502.09992 • Published • 127
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
• 2502.08235 • Published • 59
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
• 2502.10391 • Published • 34
Diverse Inference and Verification for Advanced Reasoning
• 2502.09955 • Published • 18
Precise Parameter Localization for Textual Generation in Diffusion Models
• 2502.09935 • Published • 12
We Can't Understand AI Using our Existing Vocabulary
• 2502.07586 • Published • 11
CRANE: Reasoning with constrained LLM generation
• 2502.09061 • Published • 21
Dyve: Thinking Fast and Slow for Dynamic Process Verification
• 2502.11157 • Published • 7
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs
• 2502.10454 • Published • 7
Diffusion Models without Classifier-free Guidance
• 2502.12154 • Published • 8
Large Language Models and Mathematical Reasoning Failures
• 2502.11574 • Published • 3
Continuous Diffusion Model for Language Modeling
• 2502.11564 • Published • 53
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
• 2502.12215 • Published • 16
Training Language Models to Reason Efficiently
• 2502.04463 • Published • 1
Efficient Reasoning with Hidden Thinking
• 2501.19201 • Published • 2
Scalable Language Models with Posterior Inference of Latent Thought Vectors
• 2502.01567 • Published • 2