Diffusion Transformers with Representation Autoencoders - Explained Simply | ArXiv Explained