Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning - Explained Simply