Diffusion Models

OpenAI have unveiled two text-to-image models, [https://github.com/openai/glide-text2im GLIDE] and [https://openai.com/dall-e-2/ DALL-E 2], which rely on diffusion models to generate images.<br>
The GLIDE repository includes open-source code that lets you test a small version of the model.
At a high level, GLIDE is a diffusion model which is conditioned on text embeddings and trained with a technique called classifier-free guidance.<br>
DALL-E 2 adds a ''prior'' model which first converts a text embedding to a CLIP image embedding.
Then the diffusion ''decoder'' generates an image based on the image embedding.
==Guided Diffusion==
Guidance is a method used to push the diffusion process towards the input condition, e.g. the text input.<br>
There are two types of guidance: classifier guidance and classifier-free guidance.<br>
See [https://benanne.github.io/2022/05/26/guidance.html https://benanne.github.io/2022/05/26/guidance.html].
Classifier guidance uses the gradient of an image classifier (e.g. CLIP) to update the noisy input images towards the desired class.<br>
Classifier-free guidance<ref name="ho2021classifierfree"/> runs the diffusion model twice, predicting the noise with and without the class input, and then extrapolates away from the prediction made without the class input.
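
As a minimal sketch of the two guidance rules, the NumPy functions below combine noise predictions that are assumed to have already been computed by a model. The function names, the <code>guidance_scale</code> convention (1 reproduces the conditional prediction, larger values extrapolate further) and the caller-supplied classifier gradient <code>grad_log_prob</code> are illustrative assumptions, not the API of GLIDE or any other library.

```python
import numpy as np


def classifier_free_guidance(eps_cond, eps_uncond, guidance_scale):
    """Combine noise predictions made with and without the condition.

    Assumed convention: guidance_scale = 0 returns the unconditional
    prediction, 1 returns the conditional one, and values above 1
    extrapolate away from the unconditional prediction.
    """
    eps_cond = np.asarray(eps_cond, dtype=float)
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


def classifier_guidance(eps, grad_log_prob, alpha_bar_t, guidance_scale):
    """Shift a noise prediction using a classifier gradient.

    grad_log_prob is assumed to be the gradient of log p(y | x_t) with
    respect to the noisy image x_t, computed elsewhere by a classifier.
    alpha_bar_t is the cumulative signal level at the current timestep.
    """
    eps = np.asarray(eps, dtype=float)
    grad_log_prob = np.asarray(grad_log_prob, dtype=float)
    return eps - guidance_scale * np.sqrt(1.0 - alpha_bar_t) * grad_log_prob


if __name__ == "__main__":
    eps_c = np.array([1.0, 2.0])
    eps_u = np.array([0.0, 1.0])
    print(classifier_free_guidance(eps_c, eps_u, guidance_scale=3.0))
```

Some papers write the classifier-free rule as (1 + w) times the conditional prediction minus w times the unconditional one; that is the same expression with <code>guidance_scale = 1 + w</code>.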
==Inversion==
See [https://arxiv.org/abs/2105.05233 Diffusion Models Beat GANs on Image Synthesis].<br>
Inversion of a diffusion model can be done by using DDIM for the reverse process.<br>
DDIM samples with a variance of 0, which makes the reverse process (latent to image) deterministic, so the same update can be run in the opposite direction to approximately recover the latent for a given image.
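
The zero-variance update can be sketched as follows. This is an illustrative NumPy version under stated assumptions: <code>ddim_step</code> is a hypothetical helper, <code>eps</code> stands in for the model's noise prediction at the current step, and <code>alpha_bar</code> denotes the cumulative product of (1 - beta) from the noise schedule.

```python
import numpy as np


def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_next):
    """One deterministic DDIM update (variance 0) between two noise levels.

    Moving to a larger alpha_bar_next removes noise (sampling); moving to a
    smaller one adds it back deterministically (inversion).
    """
    x_t = np.asarray(x_t, dtype=float)
    eps = np.asarray(eps, dtype=float)
    # Estimate the clean image implied by the current noise prediction.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
    # Re-noise that estimate to the target level using the same eps.
    return np.sqrt(alpha_bar_next) * x0_pred + np.sqrt(1.0 - alpha_bar_next) * eps


if __name__ == "__main__":
    x0 = np.array([1.0, -2.0])
    eps = np.array([0.5, 0.25])
    x_noisy = np.sqrt(0.5) * x0 + np.sqrt(0.5) * eps
    x_cleaner = ddim_step(x_noisy, eps, alpha_bar_t=0.5, alpha_bar_next=0.9)
    x_back = ddim_step(x_cleaner, eps, alpha_bar_t=0.9, alpha_bar_next=0.5)
    print(np.allclose(x_back, x_noisy))
```

With a real model the noise prediction changes slightly between neighbouring timesteps, so the round trip is only approximate; the sketch holds <code>eps</code> fixed to show that the update itself has no random component.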


==Resources==
* [https://ai.googleblog.com/2021/07/high-fidelity-image-generation-using.html Google AI Blog High Fidelity Image Generation Using Diffusion Models] - discusses SR3 and CDM
* https://theaisummer.com/diffusion-models/
==References==
{{reflist|refs=
<ref name="ho2021classifierfree">Ho, J., & Salimans, T. (2022). Classifier-Free Diffusion Guidance. doi:10.48550/ARXIV.2207.12598 https://arxiv.org/abs/2207.12598</ref>
}}