World Models, Playable

EP 01

Imagination — rehearsing futures that never happen

This agent (●) wants the goal (★). Before moving a muscle, it runs candidate futures through its internal model — the faint ghost paths are literally its imagination. Click the grid to add/remove walls and watch it re-dream instantly. Then let it act.

Ghost trails = imagined rollouts (brighter = judged better by the model). Solid trail = the one action sequence that actually gets executed.

The point: the agent “experienced” dozens of futures and paid for none of them. Imagination is cheap; reality is expensive. That asymmetry is the entire business case for world models.
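
To make "rehearsing futures" concrete, here is a minimal sketch of one way to do it: sample random action sequences, roll each one through the internal model, score the outcome, and keep the best. The grid encoding, the function names (model_step, imagine, score, plan) and the distance-based score are all illustrative assumptions, not the demo's actual code.

```python
import random

# Assumed encoding: 0 = free cell, 1 = wall. Moves are (row, col) offsets.
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}

def model_step(grid, pos, action):
    """Internal model: predict the next cell. Walls and edges block movement."""
    dr, dc = MOVES[action]
    nr, nc = pos[0] + dr, pos[1] + dc
    if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
        return (nr, nc)
    return pos

def imagine(grid, start, actions):
    """Roll an action sequence through the model. Nothing moves for real."""
    path = [start]
    for action in actions:
        path.append(model_step(grid, path[-1], action))
    return path

def score(path, goal):
    """Higher is better: negative Manhattan distance from the last imagined cell."""
    r, c = path[-1]
    return -(abs(r - goal[0]) + abs(c - goal[1]))

def plan(grid, start, goal, n_rollouts=200, horizon=8, seed=0):
    """Random shooting: imagine many futures, return the best-scoring one."""
    rng = random.Random(seed)
    best_actions, best_score = None, float("-inf")
    for _ in range(n_rollouts):
        actions = [rng.choice("UDLR") for _ in range(horizon)]
        s = score(imagine(grid, start, actions), goal)
        if s > best_score:
            best_actions, best_score = actions, s
    return best_actions, best_score
```

Every call to imagine is a ghost trail; only the sequence returned by plan would ever be executed. Random shooting is the crudest possible planner, chosen here because it fits in a few lines.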

EP 02

Surprise — when reality disagrees with the dream

The agent's model predicts this ball's flight — the dotted line. Now sabotage it: switch on a hidden wind the model doesn't know about. Prediction and reality split apart, and the gap between them — prediction error — is the red meter. That error signal is precisely what the model learns from.

model believes: no wind

After a windy flight, click “Update model” — the model absorbs the error, and its next prediction accounts for wind.

Surprise is the teacher. A world model isn't trained by being told the truth — it's trained by being wrong, measuring exactly how wrong, and adjusting. Babies do this with gravity; robots do it with gradient descent.
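
As a rough illustration of "being wrong, measuring how wrong, and adjusting", the sketch below fits a single wind parameter by gradient descent on squared prediction error. The motion formula (constant horizontal wind acceleration), the sample times, the learning rate and every name in the code are assumptions made for this example; the demo's own update rule may differ.

```python
def predict_x(t, vx, wind):
    """Model's predicted horizontal position at time t under a constant wind acceleration."""
    return vx * t + 0.5 * wind * t * t

def prediction_error(times, observed, vx, wind):
    """Mean squared gap between prediction and observation (the 'red meter')."""
    gaps = [(predict_x(t, vx, wind) - x) ** 2 for t, x in zip(times, observed)]
    return sum(gaps) / len(gaps)

def update_model(times, observed, vx, wind, lr=0.05, steps=200):
    """Gradient descent on the mean squared error with respect to the wind parameter."""
    for _ in range(steps):
        # d/d(wind) of mean((pred - obs)^2) is mean((pred - obs) * t^2)
        grad = sum((predict_x(t, vx, wind) - x) * t * t
                   for t, x in zip(times, observed)) / len(times)
        wind -= lr * grad
    return wind

# Assumed scenario: the real world has wind 1.5, the model starts out believing 0.
TIMES = [0.5, 1.0, 1.5, 2.0]
VX = 3.0
TRUE_WIND = 1.5
OBSERVED = [predict_x(t, VX, TRUE_WIND) for t in TIMES]

error_before = prediction_error(TIMES, OBSERVED, VX, wind=0.0)
learned_wind = update_model(TIMES, OBSERVED, VX, wind=0.0)
error_after = prediction_error(TIMES, OBSERVED, VX, learned_wind)
```

The model is never handed the true wind value. It only sees the gap between its dotted-line prediction and the observed flight, and that gap is enough to pull its belief toward reality.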

EP 03

The race — trial-and-error vs. thinking ahead

Two agents, identical maze, same goal. Gray is model-free: it only learns by bumping into things, step after costly step. Red carries a world model: it plans the route internally first, then walks it. Count the steps.

The model-free agent uses random exploration with wall memory (a crude Q-learner's childhood). The planner runs breadth-first search inside its model, then executes the resulting path.
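
The two strategies can be sketched side by side. This is a simplified stand-in under stated assumptions: the maze layout, the function names and the choice to count every attempted move (bumps included) as a step are all invented for illustration.

```python
import random
from collections import deque

MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def is_free(grid, cell):
    """True if the cell is inside the grid and not a wall (0 = free, 1 = wall)."""
    r, c = cell
    return 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0

def bfs_plan(grid, start, goal):
    """Planner: breadth-first search inside the model. Returns a shortest path or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        for dr, dc in MOVES:
            nxt = (cell[0] + dr, cell[1] + dc)
            if is_free(grid, nxt) and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None

def random_explorer_steps(grid, start, goal, seed=0, max_steps=100_000):
    """Model-free stand-in: random moves, remembering cells it has bumped into."""
    rng = random.Random(seed)
    known_walls = set()
    pos, steps = start, 0
    while pos != goal and steps < max_steps:
        options = [(pos[0] + dr, pos[1] + dc) for dr, dc in MOVES]
        options = [cell for cell in options if cell not in known_walls]
        if not options:
            break  # boxed in on all sides
        nxt = rng.choice(options)
        steps += 1  # every attempt costs a real step, even a bump
        if is_free(grid, nxt):
            pos = nxt
        else:
            known_walls.add(nxt)
    return steps

MAZE = [
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
]
planned_path = bfs_plan(MAZE, (0, 0), (4, 4))
planner_steps = len(planned_path) - 1
explorer_steps = random_explorer_steps(MAZE, (0, 0), (4, 4))
```

The planner's search touches many cells too, but all of that happens inside the model. Only planner_steps are paid for in the real maze, while the explorer pays for every wrong turn.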

Why this matters now: LLM agents that “think step by step” are inching toward this — simulating consequences in text before acting. The bet behind world-model research (LeCun, DeepMind, and others) is that real planning needs a real internal simulator, not just next-word reflexes.

EP 04

The stale map — a perfect plan for a world that changed

A world model is only as good as its last update. Here the agent plans a flawless route — but the world has quietly changed (the semi-transparent wall is real; the agent's map doesn't have it). Watch it march confidently into the wall, get surprised, patch its map, and replan.

✗ bad case | ✓ good case

The lesson robots learn the hard way: imagination is only as trustworthy as the map behind it. Good agents treat every plan as a hypothesis and every collision as free training data. Bad agents keep replaying a beautiful plan for a world that no longer exists.
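
The plan, collide, patch, replan cycle can be written as a short loop. The sketch below assumes a grid where 1 marks a wall, a belief map that may be missing walls, and hypothetical function names; it shows the idea rather than the demo's implementation.

```python
from collections import deque

MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def bfs(grid, start, goal):
    """Shortest path on the given map (0 = free, 1 = wall), or None if unreachable."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        for dr, dc in MOVES:
            r, c = cell[0] + dr, cell[1] + dc
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            if inside and grid[r][c] == 0 and (r, c) not in parent:
                parent[(r, c)] = cell
                queue.append((r, c))
    return None

def act_with_replanning(world, belief, start, goal):
    """Plan on the belief map, walk in the real world, patch the map on each collision."""
    belief = [row[:] for row in belief]  # work on a copy of the agent's map
    pos, collisions, walked = start, 0, [start]
    while pos != goal:
        path = bfs(belief, pos, goal)  # every plan is a hypothesis
        if path is None:
            return walked, collisions, False
        for nxt in path[1:]:
            if world[nxt[0]][nxt[1]] == 1:
                belief[nxt[0]][nxt[1]] = 1  # surprise: record the wall
                collisions += 1
                break  # abandon the stale plan and replan
            pos = nxt
            walked.append(pos)
    return walked, collisions, True

# Assumed scenario: the agent's map is empty, but a wall has appeared at (0, 1).
STALE_MAP = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
REAL_WORLD = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
walked, collisions, reached = act_with_replanning(REAL_WORLD, STALE_MAP, (0, 0), (0, 2))
```

A "bad" agent in this framing is the same loop without the patch line: it would plan the identical route through (0, 1) forever.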

EP 05

The horizon problem — tiny errors compound

This agent's physics model is almost perfect — gravity is off by just 6%. Drag the horizon slider: predicting a few steps ahead, the error is invisible; predicting far ahead, the dream and reality end up in different places entirely.

✓ good case | ✗ bad case

Prediction horizon: 25 steps

Solid = reality. Dashed = the model's imagination with a 6% gravity error.
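
A small numerical sketch shows how the gap grows with the horizon. It assumes the model overestimates gravity by 6% (the text only says "off by 6%"), a simple fixed-step integrator, and made-up values for the time step and launch speed.

```python
def simulate_height(gravity, steps, dt=0.05, y0=0.0, vy0=20.0):
    """Step a ball's height forward with fixed time steps. Returns all heights."""
    y, vy = y0, vy0
    heights = [y]
    for _ in range(steps):
        vy -= gravity * dt  # update velocity first, then position
        y += vy * dt
        heights.append(y)
    return heights

def gap_at_horizon(horizon, g_true=9.81, rel_error=0.06, dt=0.05):
    """Distance between reality and the model's dream after `horizon` steps."""
    real = simulate_height(g_true, horizon, dt)
    dream = simulate_height(g_true * (1 + rel_error), horizon, dt)
    return abs(real[-1] - dream[-1])

# Gap for a short, a medium and a long horizon (in the simulation's length units).
gaps = {h: gap_at_horizon(h) for h in (5, 25, 100)}
```

With this integrator the gap after n steps is 0.06 · g · dt² · n(n+1)/2, so it grows roughly with the square of the horizon: a horizon four times longer gives an error about sixteen times larger.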

Why real systems replan constantly: self-driving stacks and MPC controllers imagine only a short window ahead, act, then re-imagine — because your dream of step 100 inherits every error from steps 1–99. Short dreams, frequent updates.
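
The effect of "short dreams, frequent updates" can be sketched by re-anchoring the model's prediction to the observed state every few steps. This is not a full MPC controller (there is no control input or cost being optimized); it only isolates the re-synchronization step. The names, the overestimated gravity and the numeric values are assumptions for illustration.

```python
def max_gap(total_steps, replan_every, g_true=9.81, rel_error=0.06, dt=0.05, vy0=20.0):
    """Worst gap between reality and the dream when the dream is reset
    to the observed state every `replan_every` steps."""
    g_model = g_true * (1 + rel_error)
    y_real, vy_real = 0.0, vy0
    y_dream, vy_dream = 0.0, vy0
    worst = 0.0
    for step in range(1, total_steps + 1):
        # Reality advances with the true gravity.
        vy_real -= g_true * dt
        y_real += vy_real * dt
        # The dream advances with the slightly wrong gravity.
        vy_dream -= g_model * dt
        y_dream += vy_dream * dt
        worst = max(worst, abs(y_real - y_dream))
        if step % replan_every == 0:
            # Observe the world and restart the dream from what was seen.
            y_dream, vy_dream = y_real, vy_real
    return worst

never_replans = max_gap(total_steps=100, replan_every=100)
replans_often = max_gap(total_steps=100, replan_every=5)
```

The model's gravity is equally wrong in both runs. The only difference is how long each dream is allowed to run before reality corrects it, and that alone keeps the worst-case error small.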