SPIRAL: Symbolic LLM Planning via Grounded and Reflective Search Paper • 2512.23167 • Published Dec 29, 2025 • 1
Article OpenEnv in Practice: Evaluating Tool-Using Agents in Real-World Environments +3 Feb 12 • 31
Enterprise Agents and Benchmarks Collection Enterprise agent ecosystem featuring AssetOpsBench (industrial) and ITBench (SRE, FinOps, CISO), CUGA to accelerate AI Automation • 10 items • Updated Feb 15 • 14
Article Community Evals: Because we're done trusting black-box leaderboards over the community +5 Feb 4 • 88