The forward process — drown a picture in noise
Training starts with destruction. Take a clean image, add a little Gaussian noise, then a little more, until nothing is left but static. Drag the slider — you are the forward process. Every intermediate frame becomes a training example: “given this mess, what did the cleaner version look like?”
Left: your image at step t. Right: what the model must learn — the noise that was added.
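The slider can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the common closed-form noising rule x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with a linear noise schedule. The schedule values and the names `make_alpha_bar` and `add_noise` are assumptions made for this example, not something the demo above specifies.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction for each step (assumed linear beta schedule)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Jump straight to step t: blend the clean image with Gaussian noise."""
    eps = rng.standard_normal(x0.shape)  # the noise the model must learn to predict
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for a tiny clean image

x_early, eps_early = add_noise(x0, 10, alpha_bar, rng)   # still mostly picture
x_late, eps_late = add_noise(x0, 999, alpha_bar, rng)    # almost pure static
```

Each `(x_t, eps)` pair is one training example: the noisy frame is the input and the added noise is the answer.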
The reverse process — order out of static
Generation runs the film backwards. Start from pure random points and repeatedly remove a little predicted noise. Watch 1,200 particles that begin as formless static get nudged, step by step, into a shape. Every step is small; the miracle is the accumulation.
Conceptual demo: the “denoiser” here knows the target distribution directly; a real model learns it from data. The choreography — noise → small steps → structure — is exactly the same.
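The same choreography fits in a short toy script. Like the demo, this sketch cheats: its “denoiser” already knows the target (a ring of radius 1, chosen here purely for illustration) instead of learning it from data. The function names, step size, and noise scale are all assumptions for this example.

```python
import numpy as np

def nearest_on_ring(points, radius=1.0):
    """Toy 'denoiser': the closest point on the target ring for each particle."""
    norms = np.maximum(np.linalg.norm(points, axis=1, keepdims=True), 1e-12)
    return points / norms * radius

def reverse_process(points, num_steps=60, step_size=0.1, noise_scale=0.05, rng=None):
    """Repeatedly nudge particles toward the target, with a fading bit of jitter."""
    if rng is None:
        rng = np.random.default_rng()
    x = points.copy()
    for i in range(num_steps):
        x = x + step_size * (nearest_on_ring(x) - x)   # remove a little "noise"
        remaining = 1.0 - (i + 1) / num_steps          # jitter shrinks to zero
        x = x + noise_scale * remaining * rng.standard_normal(x.shape)
    return x

def avg_distance_from_ring(points, radius=1.0):
    """Mean gap between each particle's radius and the ring's radius."""
    return float(np.mean(np.abs(np.linalg.norm(points, axis=1) - radius)))

rng = np.random.default_rng(0)
static = rng.standard_normal((1200, 2)) * 2.0   # 1,200 particles of formless static
shaped = reverse_process(static, rng=rng)
```

No single step moves a particle far. Comparing `avg_distance_from_ring(static)` with `avg_distance_from_ring(shaped)` shows what the accumulation buys.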
The prompt is a steering wheel
Same starting noise, different destinations. A text prompt doesn't select a stored picture — it tilts every denoising step toward regions that match the description. Pick a “prompt” below and generate from the identical noise seed. The static is the same; the pull is different.
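Here is a toy version of that idea, under loud assumptions: the two “prompts” are just hand-written pull functions (toward a ring or toward a horizontal line), whereas a real model would learn the pull from text and image data. The names `PROMPTS` and `generate` are hypothetical.

```python
import numpy as np

def pull_to_ring(x):
    """Target for the 'ring' prompt: nearest point on the unit circle."""
    norms = np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)
    return x / norms

def pull_to_line(x):
    """Target for the 'line' prompt: nearest point on the horizontal axis."""
    target = x.copy()
    target[:, 1] = 0.0
    return target

# Hypothetical prompt table: each prompt only changes the direction of the pull.
PROMPTS = {"ring": pull_to_ring, "line": pull_to_line}

def generate(prompt, seed, num_points=1200, num_steps=60, step_size=0.1):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((num_points, 2))   # identical static for a given seed
    start = x.copy()
    pull = PROMPTS[prompt]
    for _ in range(num_steps):
        x = x + step_size * (pull(x) - x)      # every step tilts toward the prompt
    return start, x

start_a, ring = generate("ring", seed=42)
start_b, line = generate("line", seed=42)
```

Both runs begin from exactly the same particles. Nothing is looked up or retrieved; only the direction of each small step differs.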
Steps vs. quality — why fast generation looks like mud
Every denoising step is a small correction. Give the process 60 steps and structure fully crystallizes; give it 3 and it never escapes the noise. Try each setting: the “avg distance from target” number is measured live from the particles.
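The effect is easy to reproduce in a toy setting. The sketch below assumes a fixed step size of 0.1 and a ring-shaped target; both are choices made for this example, and real samplers schedule their steps differently.

```python
import numpy as np

def avg_distance_from_ring(x, radius=1.0):
    """Mean gap between each particle's radius and the target ring."""
    return float(np.mean(np.abs(np.linalg.norm(x, axis=1) - radius)))

def denoise(x, num_steps, step_size=0.1, radius=1.0):
    """Apply num_steps small corrections toward the ring (toy denoiser)."""
    x = x.copy()
    for _ in range(num_steps):
        norms = np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)
        x = x + step_size * (x / norms * radius - x)
    return x

rng = np.random.default_rng(0)
static = rng.standard_normal((1200, 2)) * 2.0
start_distance = avg_distance_from_ring(static)

# Same static, different step budgets.
results = {n: avg_distance_from_ring(denoise(static, n)) for n in (3, 10, 60)}
for n, dist in results.items():
    print(f"{n:>2} steps: avg distance from target = {dist:.4f}")
```

In this toy, each step closes 10% of the remaining gap, so about 73% of the original distance is left after 3 steps and well under 1% after 60.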
Guidance — steering vs. over-steering
How hard should each step listen to the prompt? That knob is called guidance (the classifier-free guidance scale, usually labeled CFG in image tools). One click renders three universes side by side, from the same seed: too weak, just right, and cranked to the max.
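The knob itself is a one-line formula: take the prediction made without the prompt and push it along the difference between the prompted and unprompted predictions, scaled by the guidance weight. The sketch below applies that standard combination to a toy setup. The two “predictions” are hand-written assumptions for illustration: without a prompt, particles drift toward the origin; with the prompt, they drift toward a ring of radius 1.

```python
import numpy as np

def ring_target(x, radius=1.0):
    norms = np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)
    return x / norms * radius

def guided_direction(x, guidance):
    """Classifier-free guidance mix: uncond + w * (cond - uncond)."""
    uncond = -x                   # toy unprompted pull: toward the origin
    cond = ring_target(x) - x     # toy prompted pull: toward the ring
    return uncond + guidance * (cond - uncond)

def generate(guidance, seed=0, num_points=1200, num_steps=60, step_size=0.1):
    rng = np.random.default_rng(seed)          # same seed, same starting static
    x = rng.standard_normal((num_points, 2))
    for _ in range(num_steps):
        x = x + step_size * guided_direction(x, guidance)
    return x

def avg_distance_from_ring(x, radius=1.0):
    return float(np.mean(np.abs(np.linalg.norm(x, axis=1) - radius)))

# Too weak, just right, cranked to the max.
distances = {w: avg_distance_from_ring(generate(w)) for w in (0.0, 1.0, 8.0)}
```

With a weight of 0 the prompt is ignored and the particles collapse to a blob; with 1 they settle on the ring; with 8 they overshoot it and land on a far larger circle. Real models over-steer in different ways, but the mixing rule is the same.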