EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control - Explained Simply | ArXiv Explained