Diffusion Models: Difference between revisions
| (3 intermediate revisions by the same user not shown) | |||
| Line 38: | Line 38: | ||
DALL-E 2 adds a ''prior'' model which first converts a text embedding to a CLIP image embedding.
Then the diffusion ''decoder'' generates an image based on the image embedding.
==Guided Diffusion==
Guidance is a method for steering the diffusion sampling process towards the input condition, e.g. a text prompt or class label.<br>
There are two types of guidance: classifier guidance and classifier-free guidance.<br>
See [https://benanne.github.io/2022/05/26/guidance.html https://benanne.github.io/2022/05/26/guidance.html].<br>
Classifier guidance uses a separate classifier (e.g. CLIP) applied to the noisy images: the gradient of the classifier's score for the desired class is used to update each noisy image towards that class.<br>
Classifier-free guidance<ref name="ho2021classifierfree"/> evaluates the diffusion model to predict the noise both with and without the class input, then extrapolates away from the unconditional prediction (the one made without the class input) in the direction of the conditional prediction.
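
The extrapolation step of classifier-free guidance can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function name <code>classifier_free_guidance</code>, the parameter <code>guidance_scale</code>, and the toy arrays standing in for model outputs are assumptions made for the example. It uses the common "guidance scale" convention, where a scale of 1 returns the conditional prediction; a scale of <code>s</code> corresponds to <code>w = s - 1</code> in the notation of Ho and Salimans.

```python
import numpy as np


def classifier_free_guidance(eps_cond, eps_uncond, guidance_scale):
    """Combine conditional and unconditional noise predictions.

    guidance_scale = 0 gives the unconditional prediction,
    guidance_scale = 1 gives the conditional prediction,
    guidance_scale > 1 extrapolates past the conditional prediction,
    i.e. away from the unconditional one.
    """
    eps_cond = np.asarray(eps_cond, dtype=float)
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


# Toy stand-ins for the two noise predictions (hypothetical values;
# a real model would return arrays shaped like the noisy image).
eps_uncond = np.array([0.0, 1.0])
eps_cond = np.array([1.0, 1.0])

guided = classifier_free_guidance(eps_cond, eps_uncond, guidance_scale=3.0)
print(guided)
```

In a sampler, the guided prediction would replace the model's plain noise prediction at each denoising step. Larger scales typically trade sample diversity for closer adherence to the condition.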
==Inversion==
| Line 46: | Line 54: | ||
==Resources==
* [https://ai.googleblog.com/2021/07/high-fidelity-image-generation-using.html Google AI Blog High Fidelity Image Generation Using Diffusion Models] - discusses SR3 and CDM
* https://theaisummer.com/diffusion-models/
==References==
{{reflist|refs=
<ref name="ho2021classifierfree">Ho, J., & Salimans, T. (2022). Classifier-Free Diffusion Guidance. doi:10.48550/arXiv.2207.12598 https://arxiv.org/abs/2207.12598</ref>
}}