VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents - Explained Simply | ArXiv Explained