From Quadratic to Linear: A Survey of Subquadratic Sparse Attention
Why standard attention breaks at 128K tokens, how four families of efficient attention tried and partially failed to fix it, and how content-dependent sparse routing achieves linear scaling without sacrificing retrieval accuracy.