SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains? Paper • 2410.03859 • Published Oct 4, 2024
VideoGameBench: Can Vision-Language Models complete popular video games? Paper • 2505.18134 • Published May 23, 2025
METAGENE-1: Metagenomic Foundation Model for Pandemic Monitoring Paper • 2501.02045 • Published Jan 3, 2025
INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning Paper • 2505.07291 • Published May 12, 2025
TOPLOC: A Locality Sensitive Hashing Scheme for Trustless Verifiable Inference Paper • 2501.16007 • Published Jan 27, 2025