Tag: #LLM (9 posts)
-
The Agent Harness Pattern: What Poker Taught Me About Multi-Agent Systems
How a Texas Hold'em simulator became a blueprint for any domain where autonomous agents compete, negotiate, and adapt — turn by turn.
-
Building a Self-Improving Personal Knowledge Base Powered by an LLM
Inspired by Andrej Karpathy's post on LLM knowledge bases, I built a system where Claude Code skills manage a personal wiki end-to-end — ingesting raw content, compiling concept articles, synthesizing connections, and answering questions. You never touch the wiki. The LLM owns it.
-
Gemma 4 Explained: How One Model Family Spans Phones and Frontier-Class Reasoning
A technical deep-dive into Gemma 4's four core ideas — MatFormer elastic inference, hybrid attention with p-RoPE, parallel dense+MoE FFN, and native agentic tooling — with the Gemma 1–3 lineage as context.
-
The Tool Use Loop: How Claude Code Executes Code, Edits Files, and Talks Back
A tool call is a structured JSON request from the LLM to run a named function. Here's exactly how Claude Code handles the full lifecycle — from API call to file edit to loop continuation.
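The "structured JSON request" described above can be sketched as follows. This is a minimal illustration of the general pattern, not Claude Code's actual wire format: the tool name, argument fields, and dispatch helper here are all hypothetical.

```python
# Illustrative shape of a tool call: the model emits a named function plus
# JSON arguments; the harness looks the function up, runs it, and feeds the
# result back into the loop. Field names are assumptions, not Claude Code's.
tool_call = {
    "name": "edit_file",
    "arguments": {"path": "src/main.py", "old": "x = 1", "new": "x = 2"},
}

def dispatch(call, registry):
    """Look up the named tool in the registry and invoke it with its arguments."""
    return registry[call["name"]](**call["arguments"])

# A toy tool registry; a real harness would validate arguments against a schema.
registry = {"edit_file": lambda path, old, new: f"replaced in {path}"}
result = dispatch(tool_call, registry)
```

The result string would then be appended to the conversation as a tool-result message, which is what lets the loop continue.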
-
Two Bets on Generative Recommendation: Semantic IDs vs. Fine-Tuned LLMs
A head-to-head comparison of the two paradigms remaking recommendation — semantic ID autoregressive models and fine-tuned LLMs — with trade-off analysis and a look at how they're converging.
-
The Attention Bottleneck: How Modern LLMs Solved a Problem That Nearly Broke the Transformer
From vanilla multi-head attention to Flash Attention 3 — the engineering bottlenecks that drove every major attention variant and the math behind each fix.
-
The Harness Is the Moat: Why Autonomous AI Agents Live or Die by Their Architecture
Model quality is commoditising. The durable competitive advantage in 2026 is harness architecture — the deterministic enclosures that make probabilistic agents reliable. A deep analysis of the four architectural primitives every production harness must implement, and how Autoresearch, Ralph Loop, Superpowers, and GSD each solve them differently.
-
From Vibe Coding to Harness Engineering: How to Actually Ship AI-Assisted Software
Vibe coding gets you a working prototype in 10 minutes. Harness engineering is how you ship it to production. Here's the difference, why it matters, and how to make the transition.
-
Why LLM Inference Costs Will Keep Falling
An analysis of hardware trends, algorithmic improvements, and market forces driving down the cost of running large language models.