Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following - Explained Simply | ArXiv Explained