NeoRL-2: Near Real-World Benchmarks for Offline Reinforcement Learning with Extended Realistic Scenarios Paper • 2503.19267 • Published Mar 25, 2025
Generalist Reward Models: Found Inside Large Language Models Paper • 2506.23235 • Published Jun 29, 2025
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning Paper • 2603.17024 • Published Mar 17 • 110
Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation Paper • 2606.17030 • Published 8 days ago • 28
Qwen-RobotManip Technical Report: Alignment Unlocks Scale for Robotic Manipulation Foundation Models Paper • 2606.17846 • Published 6 days ago
Qwen-RobotNav Technical Report: A Scalable Navigation Model Designed for an Agentic Navigation System Paper • 2606.18112 • Published 5 days ago
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2, 2025 • 190
Language Model Self-improvement by Reinforcement Learning Contemplation Paper • 2305.14483 • Published May 23, 2023 • 1
Sim2Rec: A Simulator-based Decision-making Approach to Optimize Real-World Long-term User Engagement in Sequential Recommender Systems Paper • 2305.04832 • Published May 3, 2023
Offline Reinforcement Learning with Causal Structured World Models Paper • 2206.01474 • Published Jun 3, 2022