Agentic Entropy-Balanced Policy Optimization - Explained Simply | ArXiv Explained