ViExam: Are Vision Language Models Better than Humans on Vietnamese Multimodal Exam Questions? - Explained Simply