Green-VLA: Staged Vision-Language-Action Model for Generalist Robots - Explained Simply | ArXiv Explained