Abstract: This study aims to build a system that bridges the gap between robotics and environmental understanding by integrating multiple foundation models. While current vision-language models (VLMs) ...