Louis Wang
ML engineer at Netflix, previously at Snap. I build reasoning recommender systems and AI agents — from generative retrieval and semantic IDs to autonomous agents, multi-agent systems, and LLM-powered applications.
Recent posts
- The Agent Harness Pattern: What Poker Taught Me About Multi-Agent Systems
  How a Texas Hold'em simulator became a blueprint for any domain where autonomous agents compete, negotiate, and adapt — turn by turn.
- Building a Self-Improving Personal Knowledge Base Powered by an LLM
  Inspired by Andrej Karpathy's post on LLM knowledge bases, I built a system where Claude Code skills manage a personal wiki end-to-end — ingesting raw content, compiling concept articles, synthesizing connections, and answering questions. You never touch the wiki. The LLM owns it.
- Gemma 4 Explained: How One Model Family Spans Phones and Frontier-Class Reasoning
  A technical deep-dive into Gemma 4's four core ideas — MatFormer elastic inference, hybrid attention with p-RoPE, parallel dense+MoE FFN, and native agentic tooling — with the Gemma 1–3 lineage as context.
- TurboQuant Explained: How Google Compresses KV Caches to 3 Bits Without Losing the Plot
  A technical breakdown of Google Research's TurboQuant stack: why KV-cache quantization is really an inner-product estimation problem, how PolarQuant removes normalization overhead, and where QJL fits into the final system.
- Diffusion Language Models: How They Work, How They Compare to Autoregressive LLMs, and Where They're Going
  A technical deep-dive into continuous and masked diffusion LLMs — full derivations, key models (LLaDA, Dream, Mercury), a head-to-head comparison with autoregressive LLMs, and an honest look at whether dLLMs can eventually replace AR models.