Archive

2026

May 19 Picking the Wrong Agent Topology Is Your Most Expensive Mistake
May 15 From Quadratic to Linear: A Survey of Subquadratic Sparse Attention
Apr 25 The Agent Harness Pattern: What Poker Taught Me About Multi-Agent Systems
Apr 5 Building a Self-Improving Personal Knowledge Base Powered by LLM
Apr 2 Gemma 4 Explained: How One Model Family Spans Phones and Frontier-Class Reasoning
Apr 2 TurboQuant Explained: How Google Compresses KV Caches to 3 Bits Without Losing the Plot
Apr 1 Diffusion Language Models: How They Work, How They Compare to Autoregressive LLMs, and Where They're Going
Mar 31 Multi-Agent Patterns: Swarm, Teammates, and the Coordinator
Mar 31 Designing for Extensibility: How Claude Code's Plugin and Skill System Works
Mar 31 Security Without a Sandbox: How Claude Code Decides What It's Allowed to Do
Mar 31 The Tool Use Loop: How Claude Code Executes Code, Edits Files, and Talks Back
Mar 31 Demystifying Claude Code: Inside the Architecture of a CLI Code Agent
Mar 29 Two Bets on Generative Recommendation: Semantic IDs vs. Fine-Tuned LLMs
Mar 28 The Attention Bottleneck: How Modern LLMs Solved a Problem That Nearly Broke the Transformer
Mar 25 The Harness Is the Moat: Why Autonomous AI Agents Live or Die by Their Architecture
Mar 23 Generative Recommendation in Production: HSTU, OneRec, and What Every Major Platform Is Building
Mar 23 From Vibe Coding to Harness Engineering: How to Actually Ship AI-Assisted Software
Mar 1 Why LLM Inference Costs Will Keep Falling