Thinking with Map:

Reinforced Parallel Map-Augmented Agent for Geolocalization


Yuxiang Ji1,2* Yong Wang2† Ziyu Ma2 Yiming Hu2 Hailang Huang2 Xuecai Hu2 Guanhua Chen3 Liaoni Wu1 Xiangxiang Chu2

1Xiamen University 2AMAP, Alibaba Group 3Southern University of Science and Technology
*Work done during internship at AMAP, Alibaba Group

†Project lead


Abstract

Image geolocalization aims to predict where on Earth an image was taken from its visual clues. Existing large vision-language model (LVLM) approaches leverage world knowledge, chain-of-thought reasoning, and agentic capabilities, but overlook a strategy humans commonly use: consulting maps. In this work, we first equip the model with a Thinking with Map ability and formulate it as an agent-in-the-map loop. We then develop a two-stage optimization scheme for it, consisting of agentic reinforcement learning (RL) followed by parallel test-time scaling (TTS). RL strengthens the model's agentic capability and improves its sampling efficiency, while parallel TTS lets the model explore multiple candidate paths before making the final prediction, which is crucial for geolocalization. To evaluate our method on up-to-date, in-the-wild images, we further present MAPBench, a comprehensive geolocalization training and evaluation benchmark composed entirely of real-world images. Experimental results show that our method outperforms existing open- and closed-source models on most metrics; in particular, it improves Acc@500m from 8.0% to 22.1% compared to Gemini-3-Pro with Google Search/Map grounded mode.

Thinking with Map Example

The illustration of a complete Thinking with Map process.

Through agentic reinforcement learning and parallelized test-time scaling, our method based on Qwen3-VL-30B-A3B achieves the best performance on most metrics over closed-source models.

Methodology

(a) The Thinking with Map process, which consists of an agent-in-the-map loop. During the loop, the agent implicitly maintains a candidate pool of hypotheses. (b) Agentic reinforcement learning. (c) Parallel test-time scaling with a verifier pipeline.
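
As a rough illustration of panel (c), the sketch below runs several independent agent rollouts in parallel and passes their candidate locations to a verifier, which selects the final answer. It is a minimal sketch under stated assumptions, not the authors' implementation: `run_agent` and `verify` are hypothetical callables standing in for the map-augmented agent and the verifier pipeline, neither of which is specified on this page.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Candidate = Tuple[float, float]  # (latitude, longitude) in degrees


def parallel_tts(
    image: object,
    run_agent: Callable[[object, int], Candidate],          # hypothetical: one agent rollout
    verify: Callable[[object, List[Candidate]], Candidate],  # hypothetical: picks the final answer
    n_samples: int = 4,
) -> Candidate:
    """Sample n_samples rollouts in parallel, then let the verifier choose one."""
    with ThreadPoolExecutor(max_workers=n_samples) as pool:
        # Each rollout gets its own seed so the sampled paths can differ.
        candidates = list(
            pool.map(lambda seed: run_agent(image, seed), range(n_samples))
        )
    # The verifier sees every candidate at once and returns the final prediction.
    return verify(image, candidates)


# Toy stand-ins (assumptions, for demonstration only).
def toy_agent(image: object, seed: int) -> Candidate:
    return (float(seed), 0.0)


def toy_verifier(image: object, candidates: List[Candidate]) -> Candidate:
    return max(candidates)


final_prediction = parallel_tts("photo.jpg", toy_agent, toy_verifier, n_samples=4)
```

Threads are used here only because a rollout would typically be dominated by network-bound model and map calls; any other parallel executor would serve the same purpose.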

Benchmark

We propose MAPBench, an up-to-date geolocalization benchmark with broad coverage across China. The dataset is categorized into two difficulty levels through a voting procedure involving GPT-o3, GPT-5 and Qwen3-VL-235B-A22B.

Experiments

Comparison on MAPBench. Results are reported as accuracy at six levels of granularity (Acc@Dis). Bold indicates the best result.
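
For readers unfamiliar with distance-thresholded accuracy, the sketch below shows one common way to compute a metric of the Acc@Dis form: the fraction of predictions whose great-circle distance to the ground truth falls within a threshold. It is a minimal sketch that assumes the haversine formula and a mean Earth radius; the thresholds in the example are illustrative only, since apart from Acc@500m this page does not list the six levels used.

```python
import math
from typing import Dict, Iterable, List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumption: spherical Earth)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accuracy_at(
    preds: List[Coord], truths: List[Coord], thresholds_km: Iterable[float]
) -> Dict[float, float]:
    """Fraction of predictions within each distance threshold of the ground truth."""
    dists = [haversine_km(*p, *t) for p, t in zip(preds, truths)]
    return {thr: sum(d <= thr for d in dists) / len(dists) for thr in thresholds_km}


# Illustrative data: one exact hit, and one prediction one degree of longitude off
# at the equator (roughly 111 km away).
scores = accuracy_at(
    preds=[(0.0, 0.0), (0.0, 1.0)],
    truths=[(0.0, 0.0), (0.0, 0.0)],
    thresholds_km=[0.5, 200.0],  # 0.5 km corresponds to Acc@500m; 200 km is illustrative
)
```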

BibTeX

@article{ji2026thinkingwithmap,
  title={Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization}, 
  author={Yuxiang Ji and Yong Wang and Ziyu Ma and Yiming Hu and Hailang Huang and Xuecai Hu and Guanhua Chen and Liaoni Wu and Xiangxiang Chu},
  journal={arXiv preprint arXiv:2601.05432},
  year={2026}
}