Evolving Deeper LLM Thinking
• 2501.09891 • Published • 115
PaSa: An LLM Agent for Comprehensive Academic Paper Search
• 2501.10120 • Published • 55
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong
• 2501.09775 • Published • 32
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario
• 2501.10132 • Published • 22
HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution
• 2501.10045 • Published • 10
X-Dyna: Expressive Dynamic Human Image Animation
• 2501.10021 • Published • 14
GameFactory: Creating New Games with Generative Interactive Videos
• 2501.08325 • Published • 67
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos
• 2501.09781 • Published • 27
The Lessons of Developing Process Reward Models in Mathematical Reasoning
• 2501.07301 • Published • 100
Do generative video models learn physical principles from watching videos?
• 2501.09038 • Published • 34
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
• 2501.09686 • Published • 41
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
• 2501.09732 • Published • 72
FAST: Efficient Action Tokenization for Vision-Language-Action Models
• 2501.09747 • Published • 29
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
• 2501.09751 • Published • 46
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
• 2501.11425 • Published • 109
TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space
• 2501.12224 • Published • 48
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
• 2501.12368 • Published • 45
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation
• 2501.12202 • Published • 49
Reasoning Language Models: A Blueprint
• 2501.11223 • Published • 33
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks
• 2501.11733 • Published • 28
Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments
• 2501.10893 • Published • 26
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise
• 2501.08331 • Published • 20
Taming Teacher Forcing for Masked Autoregressive Video Generation
• 2501.12389 • Published • 10
The Geometry of Tokens in Internal Representations of Large Language Models
• 2501.10573 • Published • 9
Fixing Imbalanced Attention to Mitigate In-Context Hallucination of Large Vision-Language Model
• 2501.12206 • Published • 4
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
• 2501.12895 • Published • 61
Autonomy-of-Experts Models
• 2501.13074 • Published • 44
Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament
• 2501.13007 • Published • 19
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
• 2501.12570 • Published • 28
Improving Video Generation with Human Feedback
• 2501.13918 • Published • 53
Temporal Preference Optimization for Long-Form Video Understanding
• 2501.13919 • Published • 23
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
• 2501.10799 • Published • 15
Control LLM: Controlled Evolution for Intelligence Retention in LLM
• 2501.10979 • Published • 6
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding
• 2501.13200 • Published • 69
RL + Transformer = A General-Purpose Problem Solver
• 2501.14176 • Published • 28
RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques
• 2501.14492 • Published • 27
Chain-of-Retrieval Augmented Generation
• 2501.14342 • Published • 58
Paper
• 2501.14249 • Published • 77
Towards General-Purpose Model-Free Reinforcement Learning
• 2501.16142 • Published • 31
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer
• 2501.15570 • Published • 25
Visual Generation Without Guidance
• 2501.15420 • Published • 8
Paper
• 2501.14912 • Published • 5
Return of the Encoder: Maximizing Parameter Efficiency for SLMs
• 2501.16273 • Published • 5
Large Concept Models: Language Modeling in a Sentence Representation Space
• 2412.08821 • Published • 17
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
• 2501.16975 • Published • 32
Open Problems in Mechanistic Interpretability
• 2501.16496 • Published • 22
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
• 2501.17161 • Published • 125
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate
• 2501.17703 • Published • 59
Trading Inference-Time Compute for Adversarial Robustness
• 2501.18841 • Published • 4
SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders
• 2501.18052 • Published • 8
The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training
• 2501.18965 • Published • 7
CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation
• 2501.16609 • Published • 7
s1: Simple test-time scaling
• 2501.19393 • Published • 125
Reward-Guided Speculative Decoding for Efficient LLM Reasoning
• 2501.19324 • Published • 39
PixelWorld: Towards Perceiving Everything as Pixels
• 2501.19339 • Published • 17
Self-supervised Quantized Representation for Seamlessly Integrating Knowledge Graphs with Large Language Models
• 2501.18119 • Published • 25
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning
• 2411.04983 • Published • 13
Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming
• 2501.18837 • Published • 10
o3-mini vs DeepSeek-R1: Which One is Safer?
• 2501.18438 • Published • 23
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer
• 2501.18427 • Published • 25
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training
• 2501.18511 • Published • 20
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch
• 2501.18512 • Published • 29
Large Language Models Think Too Fast To Explore Effectively
• 2501.18009 • Published • 23
GuardReasoner: Towards Reasoning-based LLM Safeguards
• 2501.18492 • Published • 88
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
• 2501.18585 • Published • 61
Preference Leakage: A Contamination Problem in LLM-as-a-judge
• 2502.01534 • Published • 40
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding
• 2502.01341 • Published • 39
MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation
• 2502.01572 • Published • 21
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models
• 2502.01142 • Published • 25
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning
• 2502.01100 • Published • 21
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles
• 2502.01081 • Published • 13
PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models
• 2502.01584 • Published • 9
Improving Transformer World Models for Data-Efficient RL
• 2502.01591 • Published • 10
Improved Training Technique for Latent Consistency Models
• 2502.01441 • Published • 8
Lifelong Sequential Knowledge Editing without Model Degradation
• 2502.01636 • Published • 4
Language Models Prefer What They Know: Relative Confidence Estimation via Confidence Preferences
• 2502.01126 • Published • 4
LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information
• 2502.02095 • Published • 4
The Differences Between Direct Alignment Algorithms are a Blur
• 2502.01237 • Published • 113
Process Reinforcement through Implicit Rewards
• 2502.01456 • Published • 62
ACECODER: Acing Coder RL via Automated Test-Case Synthesis
• 2502.01718 • Published • 28
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
• 2502.02508 • Published • 22
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
• 2502.02584 • Published • 16
COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation
• 2502.02589 • Published • 9
Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification
• 2502.01839 • Published • 10
LIMO: Less is More for Reasoning
• 2502.03387 • Published • 62
Demystifying Long Chain-of-Thought Reasoning in LLMs
• 2502.03373 • Published • 58
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
• 2502.02737 • Published • 257
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
• 2502.02339 • Published • 23
On Teacher Hacking in Language Model Distillation
• 2502.02671 • Published • 18
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
• 2502.03275 • Published • 18
A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
• 2502.01618 • Published • 10
Jailbreaking with Universal Multi-Prompts
• 2502.01154 • Published • 10
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features
• 2502.04320 • Published • 36
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2
• 2502.03544 • Published • 44
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models
• 2502.03032 • Published • 60
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis
• 2502.04128 • Published • 27
BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation
• 2502.03860 • Published • 25
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment
• 2502.04328 • Published • 29
Weak-to-Strong Diffusion with Reflection
• 2502.00473 • Published • 24
ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization
• 2502.04306 • Published • 20
UltraIF: Advancing Instruction Following from the Wild
• 2502.04153 • Published • 24
PILAF: Optimal Human Preference Sampling for Reward Modeling
• 2502.04270 • Published • 12
Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization
• 2502.04295 • Published • 13
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
• 2502.05171 • Published • 154
Goku: Flow Based Video Generative Foundation Models
• 2502.04896 • Published • 107
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
• 2502.05173 • Published • 64
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models
• 2502.04404 • Published • 25
Generating Symbolic World Models via Test-time Scaling of Large Language Models
• 2502.04728 • Published • 19
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning
• 2502.04689 • Published • 8
Value-Based Deep RL Scales Predictably
• 2502.04327 • Published • 7
YINYANG-ALIGN: Benchmarking Contradictory Objectives and Proposing Multi-Objective Optimization based DPO for Text-to-Image Alignment
• 2502.03512 • Published • 5
SPARC: Subspace-Aware Prompt Adaptation for Robust Continual Learning in LLMs
• 2502.02909 • Published • 2
Competitive Programming with Large Reasoning Models
• 2502.06807 • Published • 69
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!
• 2502.07374 • Published • 40
Teaching Language Models to Critique via Reinforcement Learning
• 2502.03492 • Published • 24
Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More
• 2502.07490 • Published • 10
The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering
• 2502.03628 • Published • 12
History-Guided Video Diffusion
• 2502.06764 • Published • 12
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
• 2502.06703 • Published • 153
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
• 2502.06781 • Published • 58
Paper
• 2502.06049 • Published • 31
The Curse of Depth in Large Language Models
• 2502.05795 • Published • 40
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
• 2502.06772 • Published • 22
Distillation Scaling Laws
• 2502.08606 • Published • 47
LLM Pretraining with Continuous Concepts
• 2502.08524 • Published • 30
Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning
• 2502.06533 • Published • 17
DPO-Shift: Shifting the Distribution of Direct Preference Optimization
• 2502.07599 • Published • 15
Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing
• 2502.04411 • Published • 4
Towards Trustworthy Retrieval Augmented Generation for Large Language Models: A Survey
• 2502.06872 • Published • 8
Logical Reasoning in Large Language Models: A Survey
• 2502.09100 • Published • 24
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
• 2502.09601 • Published • 14
Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges
• 2502.08680 • Published • 11
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
• 2502.09621 • Published • 28
Large Language Diffusion Models
• 2502.09992 • Published • 127
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
• 2502.08235 • Published • 59
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
• 2502.10391 • Published • 34
Diverse Inference and Verification for Advanced Reasoning
• 2502.09955 • Published • 18
Precise Parameter Localization for Textual Generation in Diffusion Models
• 2502.09935 • Published • 12
We Can't Understand AI Using our Existing Vocabulary
• 2502.07586 • Published • 11
CRANE: Reasoning with constrained LLM generation
• 2502.09061 • Published • 21
Dyve: Thinking Fast and Slow for Dynamic Process Verification
• 2502.11157 • Published • 7
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs
• 2502.10454 • Published • 7
Diffusion Models without Classifier-free Guidance
• 2502.12154 • Published • 8
Large Language Models and Mathematical Reasoning Failures
• 2502.11574 • Published • 3
Continuous Diffusion Model for Language Modeling
• 2502.11564 • Published • 53
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
• 2502.12215 • Published • 16
Training Language Models to Reason Efficiently
• 2502.04463 • Published • 1
Efficient Reasoning with Hidden Thinking
• 2501.19201 • Published • 2
Scalable Language Models with Posterior Inference of Latent Thought Vectors
• 2502.01567 • Published • 2