Diffusion Models, Playable

EP 01

The forward process — drown a picture in noise

Training starts with destruction. Take a clean image, add a little Gaussian noise, then a little more, until nothing is left but static. Drag the slider — you are the forward process. Every intermediate frame becomes a training example: “given this mess, what did the cleaner version look like?”

Noise step t · t = 0 (clean)

Left: your image at step t. Right: what the model must learn — the noise that was added.
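For readers who want the slider in code, here is a minimal sketch. It assumes a DDPM-style linear noise schedule; the step count `T`, the schedule values, and the names `signal_kept` and `add_noise` are all illustrative choices, not taken from the demo. It uses the standard shortcut of jumping straight to step t instead of adding noise one step at a time.

```python
import math
import random

T = 100  # total noise steps (assumed for illustration)
# Assumed linear schedule: each step mixes in a slightly larger dose of noise.
BETAS = [1e-4 + (0.1 - 1e-4) * t / (T - 1) for t in range(T)]


def signal_kept(t):
    """Fraction of the clean image's variance that survives t noise steps."""
    kept = 1.0
    for beta in BETAS[:t]:
        kept *= 1.0 - beta
    return kept


def add_noise(x0, t, rng):
    """Jump straight to step t: x_t = sqrt(kept) * x0 + sqrt(1 - kept) * eps."""
    kept = signal_kept(t)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(kept) * x + math.sqrt(1.0 - kept) * e for x, e in zip(x0, eps)]
    return xt, eps  # left panel (noisy image), right panel (noise to predict)


rng = random.Random(0)
image = [0.0, 0.5, 1.0, 0.5]  # a tiny 4-pixel "image"
for t in (0, 50, 100):
    noisy, noise = add_noise(image, t, rng)
    print(t, round(signal_kept(t), 4), [round(v, 2) for v in noisy])
```

At t = 0 the image comes back untouched; by t = T under this assumed schedule less than 1% of the original signal variance is left, which is the "nothing but static" end of the slider.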

The trick: destroying is easy and perfectly known. So the model never has to “learn to paint”. It only learns to answer “what noise was added to this image?”, a much easier question, asked millions of times.
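Because we added the noise ourselves, every guess can be graded exactly. The sketch below shows one such graded question. The value `kept = 0.5`, the helper names, and the two stand-in "models" (one lazy, one that cheats by peeking at the clean image) are assumptions for illustration only; a real model is a neural network trained to shrink this same score.

```python
import math
import random


def make_example(x0, kept, rng):
    """Noise a clean sample; return (noisy input, the noise the model must predict)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(kept) * x + math.sqrt(1.0 - kept) * e for x, e in zip(x0, eps)]
    return xt, eps


def mse(pred, target):
    """Mean squared error: a common way to score a noise guess."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


rng = random.Random(0)
clean = [rng.uniform(-1.0, 1.0) for _ in range(1000)]  # stand-in for image pixels
kept = 0.5  # assumed: half of the signal variance survives at this step
noisy, true_noise = make_example(clean, kept, rng)

# A lazy "model" that always guesses zero noise lands near the noise variance, which is 1.
lazy_loss = mse([0.0] * len(noisy), true_noise)

# An oracle that peeks at the clean image recovers the noise up to rounding error.
oracle_guess = [
    (v - math.sqrt(kept) * x) / math.sqrt(1.0 - kept) for v, x in zip(noisy, clean)
]
oracle_loss = mse(oracle_guess, true_noise)
print(lazy_loss, oracle_loss)
```

Training is the business of moving a model from the lazy end of that range toward the oracle end, without ever letting it peek.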

EP 02

The reverse process — order out of static

Generation runs the film backwards. Start from pure random points and repeatedly remove a little predicted noise. Watch 1,200 particles that begin as formless static get nudged, step by step, into a shape. Every step is small; the miracle is the accumulation.

step 0 / 60

Conceptual demo: the “denoiser” here knows the target distribution directly; a real model learns it from data. The choreography — noise → small steps → structure — is exactly the same.

Key intuition: no single step creates the image. Each step only makes the cloud slightly less improbable. Sixty tiny corrections later, improbability has nowhere left to hide.
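The choreography can be sketched in a few lines. Like the demo, this toy "denoiser" cheats: it knows the target (a unit circle, chosen here purely for illustration) instead of learning it from data. The `pull` strength, the shrinking `jitter`, and all function names are assumptions, not the demo's actual code.

```python
import math
import random


def nearest_on_circle(x, y, radius=1.0):
    """Closest point on the target circle (the toy denoiser's 'knowledge')."""
    r = math.hypot(x, y) or 1e-9  # avoid dividing by zero at the exact center
    return radius * x / r, radius * y / r


def denoise(points, steps, rng, pull=0.15):
    """Apply many small corrections, with a little fresh noise that fades out."""
    for s in range(steps):
        jitter = 0.1 * (1.0 - (s + 1) / steps)  # shrinks to zero on the final step
        moved = []
        for x, y in points:
            tx, ty = nearest_on_circle(x, y)
            moved.append((
                x + pull * (tx - x) + jitter * rng.gauss(0.0, 1.0),
                y + pull * (ty - y) + jitter * rng.gauss(0.0, 1.0),
            ))
        points = moved
    return points


def avg_distance(points):
    """Mean distance of the cloud from the target circle."""
    return sum(abs(math.hypot(x, y) - 1.0) for x, y in points) / len(points)


rng = random.Random(7)
cloud = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(1200)]
before = avg_distance(cloud)
after = avg_distance(denoise(cloud, 60, rng))
print(before, after)
```

Each pass moves a particle only 15% of the way toward the shape, yet after 60 passes the cloud sits almost on the circle.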

EP 03

The prompt is a steering wheel

Same starting noise, different destinations. A text prompt doesn't select a stored picture — it tilts every denoising step toward regions that match the description. Pick a “prompt” below and generate from the identical noise seed. The static is the same; the pull is different.

seed #7 · pick a prompt

Why “same seed, same vibe”: artists reuse seeds because the seed fixes the noise, and the noise fixes the composition's skeleton. The prompt then decides what that skeleton becomes. You just did the same thing.
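Here is a toy version of that experiment. The two "prompts" are hypothetical stand-ins: each is just a function that says where a particle should drift (a ring or a horizontal bar). A real text prompt steers a learned network instead, but the seed logic is the same: one seed, one batch of static, several destinations.

```python
import math
import random


def toward_ring(x, y):
    """Nearest point on the unit circle."""
    r = math.hypot(x, y) or 1e-9
    return x / r, y / r


def toward_bar(x, y):
    """Nearest point on a horizontal bar from (-1, 0) to (1, 0)."""
    return max(-1.0, min(1.0, x)), 0.0


PROMPTS = {"ring": toward_ring, "bar": toward_bar}  # hypothetical stand-ins for text prompts


def starting_noise(seed, n=200):
    rng = random.Random(seed)  # the seed fully determines the static
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]


def generate(seed, prompt, steps=60, pull=0.15):
    """Same static for a given seed; the prompt only changes the pull."""
    target = PROMPTS[prompt]
    points = starting_noise(seed)
    for _ in range(steps):
        moved = []
        for x, y in points:
            tx, ty = target(x, y)
            moved.append((x + pull * (tx - x), y + pull * (ty - y)))
        points = moved
    return points


ring = generate(7, "ring")
bar = generate(7, "bar")
print(ring[0], bar[0])  # the same starting particle, two different destinations
```

In this toy, a particle that starts on the right side of the static ends up on the right side of both shapes, which is a small-scale version of the seed fixing the skeleton.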

EP 04

Steps vs. quality — why fast generation looks like mud

Every denoising step is a small correction. Give the process 60 steps and structure fully crystallizes; give it 3 and it never escapes the noise. Try each setting — the "avg distance from target" number is measured live from the particles.

✓ good case · ✗ bad case

Denoising steps

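The real-world tradeoff: more steps usually means a better image, with diminishing returns, but every step is another full pass through the network, so generation gets slower and pricier. The entire research race around "fast samplers" and "distilled one-step models" is an attempt to cheat the exact curve you just felt.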
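The curve can be reproduced with a toy measurement. This sketch assumes, as a simplification, that every step has the same small fixed size (`pull = 0.15`) no matter how many steps you allow, so a 3-step run simply stops early. Real samplers take larger steps when given a smaller budget and lose quality through accumulated error instead. The target circle and the function name are illustrative.

```python
import math
import random


def avg_distance_after(steps, seed=7, n=1200, pull=0.15):
    """Run `steps` small corrections toward a unit circle; return mean distance from it."""
    rng = random.Random(seed)
    points = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    for _ in range(steps):
        moved = []
        for x, y in points:
            r = math.hypot(x, y) or 1e-9
            tx, ty = x / r, y / r  # nearest point on the target circle
            moved.append((x + pull * (tx - x), y + pull * (ty - y)))
        points = moved
    return sum(abs(math.hypot(x, y) - 1.0) for x, y in points) / n


for steps in (3, 10, 30, 60):
    print(steps, round(avg_distance_after(steps), 4))
```

The printed distance falls with every extra batch of steps, and most of the improvement arrives early, which is why shaving steps is tempting and why shaving too many leaves mud.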

EP 05

Guidance — steering vs. over-steering

How hard should each step listen to the prompt? That knob is called guidance (classifier-free guidance, or CFG, in image tools). One click renders three universes side by side, from the same seed: too weak, just right, and cranked to the max.

✗ bad case · ✓ good case · ✗ bad case

Recognize this in the wild: low guidance = dreamy blobs that ignore your prompt; extreme guidance = the fried, oversaturated, weirdly-identical look. When artists tune "CFG 7 vs CFG 20", they are riding exactly this slider.
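Under the hood, the slider is one line of arithmetic. The sketch below uses the common convention in which scale 1 means "just use the prompt-aware guess" (some write-ups shift the scale by one). The noise values are made-up numbers and `guided_noise` is an illustrative name; in a real tool both guesses come from the same network, run once with the prompt and once without.

```python
def guided_noise(eps_uncond, eps_cond, scale):
    """Blend the prompt-free and prompt-aware noise guesses.

    scale = 0 ignores the prompt, scale = 1 is the plain prompt-aware guess,
    and scale > 1 extrapolates past it, exaggerating whatever the prompt changed.
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


eps_uncond = [0.0, 1.0, -0.5]  # made-up numbers: the guess with an empty prompt
eps_cond = [0.2, 1.0, -1.0]    # made-up numbers: the guess with the prompt
for scale in (0.0, 1.0, 7.0, 20.0):
    print(scale, guided_noise(eps_uncond, eps_cond, scale))
```

Wherever the two guesses agree, the scale changes nothing. Wherever they differ, the gap is multiplied, so a very large scale pushes those values far beyond anything the model predicted on its own, which is one way to picture the fried look.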