Towards deployment-centric multimodal AI beyond vision and language - Explained Simply | ArXiv Explained