#LLM (5 posts)
-
Two Bets on Generative Recommendation: Semantic IDs vs. Fine-Tuned LLMs
A head-to-head comparison of the two paradigms remaking recommendation — semantic ID autoregressive models and fine-tuned LLMs — with trade-off analysis and a look at how they're converging.
-
The Attention Bottleneck: How Modern LLMs Solved a Problem That Nearly Broke the Transformer
From vanilla multi-head attention to FlashAttention-3: the engineering bottlenecks that drove every major attention variant, and the math behind each fix.
-
The Harness Is the Moat: Why Autonomous AI Agents Live or Die by Their Architecture
Model quality is commoditising. The durable competitive advantage in 2026 is harness architecture — the deterministic enclosures that make probabilistic agents reliable. A deep analysis of the four architectural primitives every production harness must implement, and how Autoresearch, Ralph Loop, Superpowers, and GSD each solve them differently.
-
From Vibe Coding to Harness Engineering: How to Actually Ship AI-Assisted Software
Vibe coding gets you a working prototype in 10 minutes. Harness engineering is how you ship it to production. Here's the difference, why it matters, and how to make the transition.
-
Why LLM Inference Costs Will Keep Falling
An analysis of hardware trends, algorithmic improvements, and market forces driving down the cost of running large language models.