Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models
Abstract
Omni-WorldBench addresses the lack of comprehensive evaluation for interactive 4D world models by introducing a benchmark that assesses temporal dynamics and causal interaction effects across diverse scenarios.
Video-based world models have emerged along two dominant paradigms: video generation and 3D reconstruction. However, existing evaluation benchmarks either focus narrowly on visual fidelity and text-video alignment for generative models, or rely on static 3D reconstruction metrics that fundamentally neglect temporal dynamics. We argue that the future of world modeling lies in 4D generation, which jointly models spatial structure and temporal evolution. In this paradigm, the core capability is interactive response: the ability to faithfully reflect how interaction actions drive state transitions across space and time. Yet no existing benchmark systematically evaluates this critical dimension. To address this gap, we propose Omni-WorldBench, a comprehensive benchmark specifically designed to evaluate the interactive response capabilities of world models in 4D settings. Omni-WorldBench comprises two key components: Omni-WorldSuite, a systematic prompt suite spanning diverse interaction levels and scene types; and Omni-Metrics, an agent-based evaluation framework that quantifies world modeling capabilities by measuring the causal impact of interaction actions on both final outcomes and intermediate state evolution trajectories. We conduct extensive evaluations of 18 representative world models across multiple paradigms. Our analysis reveals critical limitations of current world models in interactive response, providing actionable insights for future research. Omni-WorldBench will be publicly released to foster progress in interactive 4D world modeling.
Community
It's helpful and timely to evaluate the progress of world models using Omni-WorldBench.
good job
Good Job
Hope this work can help us build a better world model
Hoping to push the world model forward
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- WorldArena: A Unified Benchmark for Evaluating Perception and Functional Utility of Embodied World Models (2026)
- DreamWorld: Unified World Modeling in Video Generation (2026)
- RISE-Video: Can Video Generators Decode Implicit World Rules? (2026)
- EgoSound: Benchmarking Sound Understanding in Egocentric Videos (2026)
- MSVBench: Towards Human-Level Evaluation of Multi-Shot Video Generation (2026)
- GEBench: Benchmarking Image Generation Models as GUI Environments (2026)
- LiveWorld: Simulating Out-of-Sight Dynamics in Generative Video World Models (2026)
It would be quite interesting to integrate something like this and SWITCH (https://huggingface.co/papers/2511.17649), which covers common real-world interaction scenarios not covered here and highlights additional critical limitations of current world models.
the move to long-horizon, action-conditioned dynamics via InterStab-L, InterStab-N, InterCov, and InterOrder is the most interesting bit here, not the glossy video quality. but because those metrics hinge on upstream extraction (segmentation, optical flow, camera motion), a biased or brittle estimator could skew the AgenticScore without actually reflecting true interactive fidelity. an ablation that replaces these proxies with ground-truth or with end-to-end differentiable surrogates would help sanity-check the causality claims. i also wonder how sensitive InterOrder is to temporal sampling rate; a model could game you by smoothing trajectories while keeping causal transitions the same. btw the arxivlens breakdown at https://arxivlens.com/PaperView/Details/omni-worldbench-towards-a-comprehensive-interaction-centric-evaluation-for-world-models-5268-d3ea6cce does a nice job unpacking section 3 and the aggregator design, helpful for folks trying to reproduce.
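The sampling-rate question in the comment above can be illustrated with a toy example. This is only a sketch under stated assumptions: `order_agreement` is a hypothetical pairwise-order score (fraction of event pairs whose temporal order matches a reference), not the paper's InterOrder definition, and the onset frames are made-up numbers. It shows two things such a score would do: ignore timing distortions that preserve event order, and lose pairs when coarse temporal sampling collapses two events into the same frame.

```python
from itertools import combinations


def order_agreement(pred_onsets, ref_onsets):
    """Hypothetical score: fraction of event pairs ordered as in the reference.

    Ties (two events mapped to the same frame) count as disagreements.
    """
    pairs = list(combinations(range(len(ref_onsets)), 2))
    if not pairs:
        return 1.0
    concordant = sum(
        1
        for i, j in pairs
        if (pred_onsets[i] - pred_onsets[j]) * (ref_onsets[i] - ref_onsets[j]) > 0
    )
    return concordant / len(pairs)


def subsample(onsets, stride):
    """Map frame-indexed onsets onto a coarser frame grid (every `stride` frames)."""
    return [t // stride for t in onsets]


# Made-up onset frames for four events (illustrative only).
ref_onsets = [3, 10, 18, 40]
pred_onsets = [5, 9, 30, 31]  # timing is distorted, but the order is preserved

full_rate = order_agreement(pred_onsets, ref_onsets)
coarse_rate = order_agreement(subsample(pred_onsets, 8), subsample(ref_onsets, 8))

print(f"full rate:   {full_rate:.3f}")
print(f"coarse rate: {coarse_rate:.3f}")
```

At full rate the distorted prediction is indistinguishable from a perfectly timed one, while at a stride of 8 frames the last two predicted events fall into the same bin and one pair is lost. Whether the actual metric behaves this way depends on its real definition and on the upstream extractors.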