Dynamic Long Context Reasoning over Compressed Memory via End-to-End Reinforcement Learning - Explained Simply | ArXiv Explained