From Quadratic to Linear: A Survey of Subquadratic Sparse Attention
Why standard attention breaks at 128K tokens, how four families of efficient attention tried and partially failed to fix it, and how content-dependent sparse routing achieves linear scaling without sacrificing retrieval accuracy.