Paper Abstract
Vision-language model (VLM) agents increasingly rely on memory-augmented reinforcement learning to reuse experience across long-horizon tasks, yet most existing frameworks store memory as text and depend on proprietary teacher models to summarize or refine it. This design is poorly matched to spatial decision making: geometric priors are compressed into lossy language, and sparse interaction is often supervised through delayed textual feedback rather than dense visually grounded signals.
AtlasVA is a teacher-free visual skill memory framework that organizes memory into three complementary layers: spatial heatmaps, visual exemplars, and symbolic text skills. It evolves danger and affinity atlases directly from trajectory statistics and lightweight grid heuristics, then reuses these self-evolving atlases as potential-based shaping rewards for reinforcement learning.
Core Ideas
Danger and affinity maps preserve local hazards, promising regions, and topological structure in the visual modality instead of flattening layouts into text.
Representative success and failure screenshots provide concrete visual references that help the policy avoid repeated mistakes across episodes.
Trajectory statistics and lightweight grid heuristics update the atlases with EMA blending, removing the need for external LLM teachers.
The evolved atlases become potential functions that reward motion toward high-affinity regions and penalize historically risky coordinates.
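The last two ideas can be illustrated with a minimal sketch. This page does not give the exact update rule or potential, so everything specific here is an assumption made for illustration: the blending rate `alpha`, the hypothetical helper `terminal_mask` (which marks only the final cell of a trajectory as the per-episode statistic), the weight `danger_weight`, and the choice of potential Φ(s) = affinity(s) − danger_weight · danger(s). The shaping term uses the standard potential-based form F(s, s') = γΦ(s') − Φ(s).

```python
def make_atlas(rows, cols):
    """Create an empty grid atlas (all zeros)."""
    return [[0.0] * cols for _ in range(rows)]


def ema_update(atlas, observation, alpha=0.1):
    """Blend a new per-cell observation into the atlas with an EMA."""
    return [
        [(1.0 - alpha) * a + alpha * o for a, o in zip(atlas_row, obs_row)]
        for atlas_row, obs_row in zip(atlas, observation)
    ]


def terminal_mask(trajectory, rows, cols):
    """Hypothetical trajectory statistic: 1.0 at the final cell, 0.0 elsewhere."""
    mask = make_atlas(rows, cols)
    r, c = trajectory[-1]
    mask[r][c] = 1.0
    return mask


def potential(danger, affinity, cell, danger_weight=1.0):
    """Assumed potential: high in high-affinity cells, low in risky ones."""
    r, c = cell
    return affinity[r][c] - danger_weight * danger[r][c]


def shaping_reward(danger, affinity, cell, next_cell, gamma=0.99):
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s)."""
    return (gamma * potential(danger, affinity, next_cell)
            - potential(danger, affinity, cell))


rows, cols = 3, 3
danger, affinity = make_atlas(rows, cols), make_atlas(rows, cols)

success = [(0, 0), (0, 1), (0, 2)]  # episode that reached the goal at (0, 2)
failure = [(0, 0), (1, 0)]          # episode that ended in a hazard at (1, 0)

# Evolve both atlases from the two episodes, with no external teacher involved.
affinity = ema_update(affinity, terminal_mask(success, rows, cols), alpha=0.5)
danger = ema_update(danger, terminal_mask(failure, rows, cols), alpha=0.5)

print(shaping_reward(danger, affinity, (0, 1), (0, 2), gamma=1.0))  # 0.5
print(shaping_reward(danger, affinity, (0, 0), (1, 0), gamma=1.0))  # -0.5
```

In this toy grid, stepping into the cell where a past episode succeeded earns a positive shaping bonus, while stepping into the cell where a past episode failed earns a penalty. Because the bonus is a difference of potentials, it gives dense per-step feedback without altering which policies are optimal for the underlying task reward.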
Architecture
Task Demos
Representative agent rollouts across grid puzzles, navigation, and manipulation tasks.
Reported Experiments
Citation
@misc{wang2026atlasvaselfevolvingvisualskill,
title={AtlasVA: Self-Evolving Visual Skill Memory for Teacher-Free VLM Agents},
author={Pan Wang and Yihao Hu and Xiujin Liu and Jingchu Yang and Hang Wang and Zhihao Wen},
year={2026},
eprint={2605.17933},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2605.17933},
}