Leveraging the priors of 2D diffusion models for 3D editing has emerged as a promising paradigm. However, edited results often lack multi-view consistency, and the extreme scarcity of paired, 3D-consistent editing data makes supervised fine-tuning (SFT) impractical.
In this paper, we observe that, while generating multi-view consistent 3D content is highly challenging, verifying 3D consistency is tractable, naturally positioning reinforcement learning (RL) as a feasible solution. Motivated by this, we propose RL3DEdit, a single-pass framework driven by RL optimization with novel rewards derived from the 3D foundation model, VGGT.
Specifically, we leverage VGGT's robust priors learned from massive real-world data, feed the edited images into it, and utilize the output confidence maps and pose estimation errors as reward signals, effectively anchoring the 2D editing priors onto a 3D-consistent manifold via RL. Extensive experiments demonstrate that RL3DEdit achieves stable multi-view consistency and outperforms state-of-the-art methods in editing quality with high efficiency.
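The reward described above can be sketched in a few lines. Note that `run_vggt` below is a hypothetical stand-in for a VGGT forward pass (the real model's interface differs), and the weighting `lam` is an assumed hyperparameter, not a value taken from the paper; this is a minimal illustration of combining a confidence term with a pose-error penalty, not the authors' implementation.

```python
import numpy as np

def run_vggt(views: np.ndarray):
    """Hypothetical stand-in for a VGGT forward pass.

    Returns a per-pixel confidence map for each view and an
    estimated camera pose (a 3-vector placeholder per view).
    A real system would call the actual VGGT model here.
    """
    n, h, w, _ = views.shape
    rng = np.random.default_rng(0)           # deterministic placeholder outputs
    conf = rng.uniform(0.5, 1.0, size=(n, h, w))
    poses = rng.normal(size=(n, 3))
    return conf, poses

def consistency_reward(edited_views: np.ndarray,
                       ref_poses: np.ndarray,
                       lam: float = 0.1) -> float:
    """Reward = mean VGGT confidence minus a weighted pose error.

    High confidence suggests VGGT found the edited views geometrically
    coherent; a large pose-estimation error relative to the known
    source-view poses signals a 3D-inconsistent edit.
    """
    conf, est_poses = run_vggt(edited_views)
    conf_term = conf.mean()
    pose_err = np.linalg.norm(est_poses - ref_poses, axis=-1).mean()
    return float(conf_term - lam * pose_err)
```

In an RL loop, this scalar would score each batch of edited views, so policy-gradient updates push the 2D editing model toward the 3D-consistent manifold without any paired supervision.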
Note: Due to GitHub file size limits, the video is provided in 720p HEVC format. For the 1080p version, please contact the authors.
@article{wang2026geometry,
  title={Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing},
  author={Wang, Jiyuan and Lin, Chunyu and Sun, Lei and Cao, Zhi and Yin, Yuyang and Nie, Lang and Yuan, Zhenlong and Chu, Xiangxiang and Wei, Yunchao and Liao, Kang and others},
  journal={arXiv preprint arXiv:2603.03143},
  year={2026}
}