Sparse Video Generation Propels Real-World Beyond-the-View Vision-Language Navigation - Explained Simply