Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model - Explained Simply | ArXiv Explained