Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss - Explained Simply