MOVA: Towards Scalable and Synchronized Video-Audio Generation - Explained Simply | ArXiv Explained