Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training - Explained Simply | ArXiv Explained