Archive
2026
- Picking the Wrong Agent Topology Is Your Most Expensive Mistake
- From Quadratic to Linear: A Survey of Subquadratic Sparse Attention
- The Agent Harness Pattern: What Poker Taught Me About Multi-Agent Systems
- Building a Self-Improving Personal Knowledge Base Powered by LLM
- Gemma 4 Explained: How One Model Family Spans Phones and Frontier-Class Reasoning
- TurboQuant Explained: How Google Compresses KV Caches to 3 Bits Without Losing the Plot
- Diffusion Language Models: How They Work, How They Compare to Autoregressive LLMs, and Where They're Going
- Multi-Agent Patterns: Swarm, Teammates, and the Coordinator
- Designing for Extensibility: How Claude Code's Plugin and Skill System Works
- Security Without a Sandbox: How Claude Code Decides What It's Allowed to Do
- The Tool Use Loop: How Claude Code Executes Code, Edits Files, and Talks Back
- Demystifying Claude Code: Inside the Architecture of a CLI Code Agent
- Two Bets on Generative Recommendation: Semantic IDs vs. Fine-Tuned LLMs
- The Attention Bottleneck: How Modern LLMs Solved a Problem That Nearly Broke the Transformer
- The Harness Is the Moat: Why Autonomous AI Agents Live or Die by Their Architecture
- Generative Recommendation in Production: HSTU, OneRec, and What Every Major Platform Is Building
- From Vibe Coding to Harness Engineering: How to Actually Ship AI-Assisted Software
- Why LLM Inference Costs Will Keep Falling