GLM 5.2: long-horizon coding at a million tokens
2026-06-23
Z.ai's GLM 5.2 is a 744B/40B-active open-weights MoE with a real 1M-token context, built for long-horizon agentic coding. How IndexShare makes that context cheap, what changed in training, and where it lands against the frontier, with the benchmarks.
7 min · llm · glm · long-context · agentic-coding · explainer
Sakana Fugu: a multi-agent system as a model
2026-06-23
Sakana AI turned LLM orchestration into a single model. A walk through the two ICLR 2026 papers behind Fugu (TRINITY, an evolved coordinator with under 20K parameters, and the Conductor, a 7B orchestrator trained with reinforcement learning) and how routing a pool of frontier models beats any one of them.
12 min · llm · multi-agent · orchestration · reinforcement-learning · explainer
Mixture of Experts, from scratch
2026-06-10
Why MoE lets a model carry billions of parameters but pay compute for only a slice of them per token. Built up from one MLP, a router, and a sparse forward pass, with the gating, dispatch, and load balancing made visible.
18 min · deep-learning · transformers · mixture-of-experts · explainer
Coroutines in C, intuitively
2026-06-09
How to pause a function in the middle and resume it later, using nothing but a switch statement and __LINE__. An intuitive tour of Simon Tatham's classic trick, with a step-through animation.
7 min · c · coroutines · systems · explainer
How self-attention works in transformers
2026-06-02
A from-scratch explainer of scaled dot-product attention: queries, keys, values, the softmax, and why the √d scaling matters.
3 min · transformers · deep-learning · explainer