SONAR: Sentence-Level Multimodal and Language-Agnostic Representations - Explained Simply | ArXiv Explained