Training Compute-Optimal Large Language Models - Explained Simply | ArXiv Explained