Diffusion Models: Difference between revisions

DALL-E 2 adds a ''prior'' model which first converts a text embedding to a CLIP image embedding.
Then the diffusion ''decoder'' generates an image based on the image embedding.
==Guided Diffusion==
Guidance is a method used to push the diffusion process towards the input condition, e.g. the text input.<br>
There are two types of guidance: classifier guidance and classifier-free guidance.<br>
See [https://benanne.github.io/2022/05/26/guidance.html https://benanne.github.io/2022/05/26/guidance.html].
Classifier guidance uses the gradient of an image classifier (e.g. CLIP) on the noisy image to steer each denoising step towards the desired class.<br>
Classifier-free guidance<ref name="ho2021classifierfree" /> runs the diffusion model twice per step to predict the noise with and without the class input, then extrapolates away from the unconditional prediction in the direction of the conditional one.
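
The extrapolation step of classifier-free guidance can be sketched in a few lines. This is a minimal illustration, not any particular library's implementation: the names <code>eps_cond</code>, <code>eps_uncond</code> and <code>guidance_scale</code> are assumptions made for this example, and the two noise predictions are stand-in arrays rather than outputs of a real diffusion model. It uses the convention where a scale of 1 recovers the conditional prediction; some papers write the same formula with a weight shifted by one.

```python
import numpy as np


def classifier_free_guidance(eps_cond, eps_uncond, guidance_scale):
    """Combine two noise predictions for one denoising step.

    eps_cond:       noise predicted with the class/text input (hypothetical name)
    eps_uncond:     noise predicted without the class/text input (hypothetical name)
    guidance_scale: 0 gives the unconditional prediction, 1 gives the
                    conditional one, values above 1 extrapolate past it
    """
    eps_cond = np.asarray(eps_cond, dtype=float)
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    # Start at the unconditional prediction and move along the direction
    # that the condition adds, scaled by guidance_scale.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


# Toy stand-ins for the two model outputs at a single step.
eps_with_class = np.array([1.0, 2.0])
eps_without_class = np.array([0.0, 1.0])
guided = classifier_free_guidance(eps_with_class, eps_without_class, 3.0)
```

The guided prediction then replaces the plain noise prediction in the sampler update. Larger scales typically follow the condition more closely at the cost of sample diversity.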


==Inversion==