Back to The Future: Evaluating AI Agents on Predicting Future Events Article • vinid, junlinw, zainhasan, shangzhu, coolcat21, clefourrier, jameszou • Jul 17, 2025 • 52
How to Train Your LLM Web Agent: A Statistical Diagnosis Paper • 2507.04103 • Published Jul 5, 2025 • 52
How to Train Your LLM Web Agent: A Statistical Diagnosis Article • ppEmiliano • Jul 8, 2025 • 15
DABStep: Data Agent Benchmark for Multi-step Reasoning Article • eggie5, martinigoyanes, frisokingma, andreumora, lvwerra, thomwolf, m-ric • Feb 4, 2025 • 130
WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks Paper • 2407.05291 • Published Jul 7, 2024 • 2