Latent Adversarial Regularization for Offline Preference Optimization - Explained Simply | ArXiv Explained