VMBench: A Benchmark for Perception-Aligned Video Motion Generation

Xinran Ling1*, Chen Zhu1*, Meiqi Wu1,3*, Hangyu Li1, Xiaokun Feng1,2,
Cundian Yang1, Aiming Hao1, Jiashu Zhu1, Jiahong Wu1†, Xiangxiang Chu1
(* equal contribution, † corresponding author)
1 AMAP, Alibaba Group    2 CRISE, Institute of Automation, Chinese Academy of Sciences   
3 School of Computer Science and Technology, University of Chinese Academy of Sciences   
Teaser.

Overview of VMBench. Our benchmark encompasses six principal categories of motion patterns, with each prompt structured around three core components: subject, place, and action. We propose a novel multi-dimensional video motion evaluation framework comprising five human-centric quality metrics derived from perceptual preferences. Using videos generated by popular T2V models, we conduct systematic human evaluations to validate the effectiveness of our metrics in capturing human perceptual preferences.

Abstract

Video generation has advanced rapidly, and evaluation methods have improved alongside it, yet assessing the motion of generated videos remains a major challenge. Specifically, there are two key issues: 1) current motion metrics do not fully align with human perception; 2) existing motion prompts are limited in diversity. Based on these findings, we introduce VMBench---a comprehensive Video Motion Benchmark that provides perception-aligned motion metrics and features the most diverse set of motion types to date. VMBench has several appealing properties: (1) Perception-Driven Motion Evaluation Metrics: we identify five dimensions based on how humans perceive motion in videos and develop fine-grained evaluation metrics for each, providing deeper insights into models' strengths and weaknesses in motion quality. (2) Meta-Guided Motion Prompt Generation: a structured method that extracts meta-information, generates diverse motion prompts with LLMs, and refines them through human-AI validation, resulting in a multi-level prompt library covering six key dynamic scene dimensions. (3) Human-Aligned Validation Mechanism: we provide human preference annotations to validate our benchmark, with our metrics achieving an average 35.3% improvement in Spearman's correlation over baseline methods. To the best of our knowledge, this is the first time that the motion quality of generated videos has been evaluated from the perspective of alignment with human perception.
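As a concrete illustration of the validation mechanism, the short Python sketch below computes Spearman's rank correlation between an automatic metric's per-video scores and human preference ratings, which is the statistic reported above. The score lists here are hypothetical placeholders, not VMBench data.

# Minimal sketch: correlating an automatic motion metric with human
# preference annotations via Spearman's rank correlation.
# The score lists are hypothetical placeholders, not VMBench data.
from scipy.stats import spearmanr

# Per-video scores from one automatic motion metric.
metric_scores = [0.82, 0.47, 0.91, 0.33, 0.68]
# Mean human preference ratings for the same five videos.
human_ratings = [4.5, 2.8, 4.9, 2.1, 3.7]

rho, p_value = spearmanr(metric_scores, human_ratings)
print(f"Spearman's rho = {rho:.3f} (p = {p_value:.3f})")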

Perception-Driven Motion Evaluation Metrics (PMM)

Framework of our Perception-Driven Motion Metrics (PMM). PMM comprises five evaluation metrics: Commonsense Adherence Score (CAS), Motion Smoothness Score (MSS), Object Integrity Score (OIS), Perceptible Amplitude Score (PAS), and Temporal Coherence Score (TCS). (a-e): Computational flowcharts for each metric. The scores produced by PMM show variation trends consistent with human assessments, indicating strong alignment with human perception.
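For a concrete picture of how the five dimensions can feed a single summary number, here is a minimal Python sketch that averages per-dimension scores with equal weights. The equal weighting and the example scores are illustrative assumptions, not the official PMM aggregation.

# Hypothetical sketch: combining the five PMM dimensions into one score.
# Equal weighting is an illustrative assumption, not the official scheme.
from statistics import mean

def pmm_summary(scores: dict[str, float]) -> float:
    """Average the five per-dimension scores (each assumed to lie in [0, 1])."""
    dims = ("CAS", "MSS", "OIS", "PAS", "TCS")
    return mean(scores[d] for d in dims)

video_scores = {"CAS": 0.71, "MSS": 0.88, "OIS": 0.64, "PAS": 0.59, "TCS": 0.80}
print(f"PMM summary: {pmm_summary(video_scores):.3f}")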

Diagram of Human Perception Flow

Our metrics framework for evaluating video motion is inspired by the mechanisms of human motion perception in videos. (a) Human perception of motion in videos primarily encompasses two dimensions: Comprehensive Analysis of Motion and Capture of Motion Details. (b) Our proposed metrics framework: MSS and CAS correspond to Comprehensive Analysis of Motion, while OIS, PAS, and TCS correspond to Capture of Motion Details.

Meta-Guided Motion Prompt Generation (MMPG)

Framework of our Meta-Guided Motion Prompt Generation (MMPG). MMPG consists of three stages: (a) Meta-Information Extraction: extracting Subjects, Places, and Actions from datasets such as VidProM [30], DiDeMo [35], MSR-VTT [34], WebVid [33], Places365 [31], and Kinetics-700 [32]. (b) Self-Refining Prompt Generation: generating and iteratively refining prompts based on the extracted information. (c) Human-LLM Joint Validation: validating the prompts through a collaborative process between human annotators and DeepSeek-R1 to ensure their rationality.
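To make the pipeline concrete, the toy sketch below composes a draft motion prompt from a (subject, place, action) triple, mirroring stages (a) and (b). The word lists and template are invented for illustration; in MMPG the triples are extracted from the cited datasets, and an LLM refines each draft before Human-LLM joint validation.

# Toy sketch: composing a draft motion prompt from meta-information.
# Word lists and the template are hypothetical; MMPG extracts triples
# from the cited datasets and refines drafts with an LLM.
import random

subjects = ["a tourist", "three books", "a golden retriever"]
places = ["an outdoor swimming pool", "a soccer field", "a busy street"]
actions = ["splashes water energetically", "tumbles through the air", "sprints ahead"]

def draft_prompt() -> str:
    subject = random.choice(subjects)
    place = random.choice(places)
    action = random.choice(actions)
    return f"{subject.capitalize()} {action} in {place}."

random.seed(0)
print(draft_prompt())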

VMBench Meta-Guided Motion Prompt Statistics

Statistical analysis of motion prompts in VMBench. (a-h): Multi-perspective statistics of the prompt set. These analyses demonstrate VMBench's comprehensive evaluation scope, encompassing motion dynamics, information diversity, and real-world commonsense adherence.

VMBench Generation Results of Open-Source Models

Prompt: A tourist joyfully splashes water in an outdoor swimming pool, their arms and legs moving energetically as they playfully splash around.

Prompt: Three books are thrown into the air, their pages fluttering as they soar over the soccer field, landing in a scattered pattern.

VMBench Evaluation Results of Video Generative Models

We visualize the evaluation results of the six most recent video generation models across the Perception-Driven Motion Metrics (PMM) dimensions.
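For reference, a radar chart of the kind used for this visualization can be reproduced with a few lines of matplotlib; the model scores below are placeholders rather than VMBench results.

# Minimal radar-chart sketch for per-dimension PMM scores.
# The scores are placeholders, not VMBench results.
import numpy as np
import matplotlib.pyplot as plt

dims = ["CAS", "MSS", "OIS", "PAS", "TCS"]
scores = [0.72, 0.85, 0.61, 0.58, 0.79]  # hypothetical model scores

angles = np.linspace(0, 2 * np.pi, len(dims), endpoint=False).tolist()
angles += angles[:1]            # close the polygon
values = scores + scores[:1]

ax = plt.subplot(polar=True)
ax.plot(angles, values, linewidth=2)
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(dims)
ax.set_ylim(0, 1)
plt.title("Per-dimension PMM scores (placeholder data)")
plt.show()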

BibTeX

If you find our work useful, please consider citing our paper:

@misc{ling2025vmbenchbenchmarkperceptionalignedvideo,
      title={VMBench: A Benchmark for Perception-Aligned Video Motion Generation},
      author={Xinran Ling and Chen Zhu and Meiqi Wu and Hangyu Li and Xiaokun Feng and Cundian Yang and Aiming Hao and Jiashu Zhu and Jiahong Wu and Xiangxiang Chu},
      year={2025},
      eprint={2503.10076},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.10076},
}