SLA2: Sparse-Linear Attention with Learnable Routing and QAT - Explained Simply | ArXiv Explained