Agentic-ly agentic
Automated Design of Agentic Systems
Paper
• 2408.08435
On the limits of agency in agent-based models
Paper
• 2409.10568
On the Diagram of Thought
Paper
• 2409.10038
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?
Paper
• 2409.07703
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
Paper
• 2409.08264
Paper
• 2409.07429
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance
Paper
• 2409.04593
Training Language Models to Self-Correct via Reinforcement Learning
Paper
• 2409.12917
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale
Paper
• 2409.17115
MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models
Paper
• 2410.11710