Louis Wang
ML engineer at Netflix, previously at Snap. I build reasoning recommender systems and AI agents — from generative retrieval and semantic IDs to autonomous agents, multi-agent systems, and LLM-powered applications.
Recent posts
- The Agent Harness Pattern: What Poker Taught Me About Multi-Agent Systems
  How a Texas Hold'em simulator became a blueprint for any domain where autonomous agents compete, negotiate, and adapt — turn by turn.
- Building a Self-Improving Personal Knowledge Base Powered by an LLM
  Inspired by Andrej Karpathy's post on LLM knowledge bases, I built a system where Claude Code skills manage a personal wiki end-to-end — ingesting raw content, compiling concept articles, synthesizing connections, and answering questions. You never touch the wiki. The LLM owns it.
- Gemma 4 Explained: How One Model Family Spans Phones and Frontier-Class Reasoning
  A technical deep-dive into Gemma 4's four core ideas — MatFormer elastic inference, hybrid attention with p-RoPE, parallel dense+MoE FFN, and native agentic tooling — with the Gemma 1–3 lineage as context.
- TurboQuant Explained: How Google Compresses KV Caches to 3 Bits Without Losing the Plot
  A technical breakdown of Google Research's TurboQuant stack: why KV-cache quantization is really an inner-product estimation problem, how PolarQuant removes normalization overhead, and where QJL fits into the final system.
- Diffusion Language Models: How They Work, How They Compare to Autoregressive LLMs, and Where They're Going
  A technical deep-dive into continuous and masked diffusion LLMs — full derivations, key models (LLaDA, Dream, Mercury), a head-to-head comparison with autoregressive LLMs, and an honest look at whether dLLMs can eventually replace AR models.