UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing - Explained Simply | ArXiv Explained