Tags › #transformers (2 posts)
-
From Quadratic to Linear: A Survey of Subquadratic Sparse Attention
Why standard attention breaks down at 128K tokens, how four families of efficient attention tried, and only partially managed, to fix it, and how content-dependent sparse routing achieves linear scaling without sacrificing retrieval accuracy.
-
The Attention Bottleneck: How Modern LLMs Solved a Problem That Nearly Broke the Transformer
From vanilla multi-head attention to Flash Attention 3 — the engineering bottlenecks that drove every major attention variant and the math behind each fix.