Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision - Explained Simply | ArXiv Explained