V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning - Explained Simply | ArXiv Explained