Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization

Abstract

The image geolocalization task aims to predict where on Earth an image was taken using visual clues. Existing large vision-language model (LVLM) approaches leverage world knowledge, chain-of-thought reasoning, and agentic capabilities, but overlook a common strategy used by humans: using maps. In this work, we first equip the model with the Thinking with Map ability and formulate it as an agent-in-the-map loop. We develop a two-stage optimization scheme for it, consisting of agentic reinforcement learning (RL) followed by parallel test-time scaling (TTS). RL strengthens the agentic capability of the model to improve sampling efficiency, and parallel TTS enables the model to explore multiple candidate paths before making the final prediction, which is crucial for geolocalization. To evaluate our method on up-to-date and in-the-wild images, we further present MAPBench, a comprehensive geolocalization training and evaluation benchmark composed entirely of real-world images. Experimental results show that our method outperforms existing open- and closed-source models on most metrics, specifically improving Acc@500m from 8.0% to 22.1% compared to Gemini-3-Pro with Google Search/Map grounded mode.

Thinking with Map Example

Illustration of a complete Thinking with Map process.

Through agentic reinforcement learning and parallel test-time scaling, our method, built on Qwen3-VL-30B-A3B, outperforms closed-source models on most metrics.

Methodology

(a) The Thinking with Map process consists of an agent-in-the-map loop, during which the agent implicitly maintains a candidate pool of hypotheses. (b) Agentic reinforcement learning. (c) Parallel test-time scaling with a verifier pipeline.
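To make the parallel test-time scaling step in (c) more concrete, here is a minimal sketch under stated assumptions. It assumes each rollout of the agent-in-the-map loop returns a single (latitude, longitude) guess and that the verifier assigns each guess a scalar score. The names `parallel_tts`, `run_agent`, `verify`, and `num_paths` are hypothetical and are not taken from the paper; the actual verifier pipeline may combine candidates differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def parallel_tts(
    run_agent: Callable[[int], Coord],   # hypothetical: one agent rollout per seed
    verify: Callable[[Coord], float],    # hypothetical: higher score = more plausible
    num_paths: int = 4,
) -> Coord:
    """Run num_paths independent rollouts and keep the best-scored candidate."""
    if num_paths < 1:
        raise ValueError("num_paths must be at least 1")
    # Each rollout explores its own candidate path; they do not share state.
    with ThreadPoolExecutor(max_workers=num_paths) as pool:
        candidates: List[Coord] = list(pool.map(run_agent, range(num_paths)))
    # Assumed selection rule: return the candidate the verifier scores highest.
    return max(candidates, key=verify)


# Toy stand-ins for the agent and the verifier (illustration only).
toy_candidates: List[Coord] = [(31.23, 121.47), (39.90, 116.40), (30.27, 120.15)]
toy_scores = {
    toy_candidates[0]: 0.2,
    toy_candidates[1]: 0.9,
    toy_candidates[2]: 0.5,
}


def toy_agent(seed: int) -> Coord:
    return toy_candidates[seed % len(toy_candidates)]


def toy_verify(candidate: Coord) -> float:
    return toy_scores[candidate]


best = parallel_tts(toy_agent, toy_verify, num_paths=3)
```

In this toy run, `best` is the second candidate because the stub verifier scores it highest. In a real system, `run_agent` would wrap a full map-tool rollout and `verify` would be a model-based check.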

Benchmark

We propose MAPBench, an up-to-date geolocalization benchmark with broad coverage across China. The dataset is categorized into two difficulty levels through a voting procedure involving GPT-o3, GPT-5 and Qwen3-VL-235B-A22B.
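The exact voting rule is not described here, so the following is only an illustrative sketch of how a three-model vote could split images into two difficulty levels. It assumes each model casts a boolean vote (for example, whether it localized the image well enough) and that a majority decides. The function name `difficulty_by_vote`, the labels "easy" and "hard", and the `min_solved` threshold are all hypothetical.

```python
from typing import Sequence


def difficulty_by_vote(solved_votes: Sequence[bool], min_solved: int = 2) -> str:
    """Label an image from per-model votes (assumed majority rule).

    solved_votes holds one boolean per voting model; True means that model
    handled the image successfully under whatever criterion is used.
    """
    if not solved_votes:
        raise ValueError("at least one vote is required")
    # Hypothetical rule: enough successful models means the image is easy.
    return "easy" if sum(solved_votes) >= min_solved else "hard"


# Three voters, matching the three models named above.
label_a = difficulty_by_vote([True, True, False])
label_b = difficulty_by_vote([False, True, False])
```

With three voters and `min_solved=2`, this reduces to a simple majority vote; a stricter split could require unanimous success before labeling an image as easy.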

Experiments

Comparison on MAPBench. Results are reported as accuracy at six levels of granularity (Acc@Dis). Bold indicates the best result.
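As a rough illustration of how an Acc@Dis metric can be computed, the sketch below measures the great-circle (haversine) distance between each predicted and ground-truth coordinate and reports the fraction of predictions within each distance threshold. This is an assumption about the metric's mechanics, not the authors' evaluation code. Only the 500 m threshold appears in the text above; the second threshold in the example and the names `haversine_m` and `acc_at_dis` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(a: Coord, b: Coord) -> float:
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def acc_at_dis(
    preds: Sequence[Coord],
    truths: Sequence[Coord],
    thresholds_m: Sequence[float],
) -> Dict[float, float]:
    """Fraction of predictions within each distance threshold (in meters)."""
    if len(preds) != len(truths) or not preds:
        raise ValueError("preds and truths must be non-empty and equal length")
    dists: List[float] = [haversine_m(p, t) for p, t in zip(preds, truths)]
    return {thr: sum(d <= thr for d in dists) / len(dists) for thr in thresholds_m}


# Toy data: one exact hit and one miss by one degree of latitude (about 111 km).
toy_preds = [(0.0, 0.0), (1.0, 0.0)]
toy_truths = [(0.0, 0.0), (0.0, 0.0)]
toy_acc = acc_at_dis(toy_preds, toy_truths, thresholds_m=[500.0, 200_000.0])
```

On the toy data, half of the predictions fall within 500 m and all of them fall within 200 km.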

BibTeX

@article{ji2026thinking,
  title={Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization},
  author={Ji, Yuxiang and Wang, Yong and Ma, Ziyu and Hu, Yiming and Huang, Hailang and Hu, Xuecai and Chen, Guanhua and Wu, Liaoni and Chu, Xiangxiang},
  journal={arXiv preprint arXiv:2601.05432},
  year={2026}
}
