Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts - Explained Simply