On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification - Explained Simply
