Group Sequence Policy Optimization - Explained Simply | ArXiv Explained