Sparse Reward Subsystem in Large Language Models - Explained Simply | ArXiv Explained