Diffusion Models: Difference between revisions
<math>E \left[ \frac{\beta^2_t}{2\sigma^2_t \alpha_t (1-\bar\alpha_t)} \Vert \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta( \sqrt{\bar\alpha_t} \mathbf{x}_0 + \sqrt{1-\bar\alpha_t} \boldsymbol{\epsilon}, t) \Vert^2 \right]</math>
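The weighted noise-prediction loss can be sketched in NumPy as a single-sample estimate. This is a minimal illustration rather than a reference implementation: it uses the standard DDPM forward process <math>\mathbf{x}_t = \sqrt{\bar\alpha_t} \mathbf{x}_0 + \sqrt{1-\bar\alpha_t} \boldsymbol{\epsilon}</math>, and the linear beta schedule, the choice <math>\sigma^2_t = \beta_t</math>, and the names <code>make_schedule</code>, <code>weighted_noise_loss</code>, and <code>zero_model</code> are all assumptions made for the example.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumed choice) and the derived alpha terms."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def weighted_noise_loss(eps_model, x0, t, betas, alphas, alpha_bars, rng):
    """Single-sample Monte Carlo estimate of the weighted loss at step t."""
    eps = rng.standard_normal(x0.shape)
    # Forward noising: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    sigma2 = betas[t]  # assumption: sigma_t^2 = beta_t
    weight = betas[t] ** 2 / (2.0 * sigma2 * alphas[t] * (1.0 - alpha_bars[t]))
    return weight * np.sum((eps - eps_model(x_t, t)) ** 2)

def zero_model(x_t, t):
    """Hypothetical stand-in for a trained network: always predicts zero noise."""
    return np.zeros_like(x_t)

rng = np.random.default_rng(0)
betas, alphas, alpha_bars = make_schedule()
x0 = rng.standard_normal((3, 8, 8))  # toy "image" with 3 channels
loss = weighted_noise_loss(zero_model, x0, 500, betas, alphas, alpha_bars, rng)
```

In practice the expectation is approximated by averaging this quantity over random training images, time steps, and noise samples, with a neural network in place of <code>zero_model</code>.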
===Super-resolution and other image-to-image generation===
See [https://iterative-refinement.github.io/ SR3 iterative refinement]<br>
Here <math>\mathbf{y}</math> denotes the sequence of noisy images in the diffusion process (the priors), and we condition on an extra input <math>\mathbf{x}</math>, the low-resolution image.<br>
As before, the neural network <math>f_{\theta}(\mathbf{x}, \mathbf{y}, \gamma)</math> is trained to predict the added noise, now given the low-resolution image and the noise level <math>\gamma</math> as additional inputs.
An unofficial PyTorch implementation of SR3 is available at [https://github.com/Janspiry/Image-Super-Resolution-via-Iterative-Refinement https://github.com/Janspiry/Image-Super-Resolution-via-Iterative-Refinement].
In addition to SR3, researchers at Google have unveiled [https://iterative-refinement.github.io/palette/ Palette], which applies the same ideas to other image-to-image tasks such as colorization, uncropping, and inpainting. These tasks can be performed with a single model.
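The conditioning described above for SR3 can be sketched as the construction of one training example. This is only an illustration under stated assumptions: the low-resolution image is upsampled and concatenated with the noisy target along the channel axis, nearest-neighbour upsampling stands in for the bicubic interpolation used by SR3, and the function names <code>upsample_nearest</code> and <code>sr3_training_example</code> are hypothetical.

```python
import numpy as np

def upsample_nearest(x_low, factor):
    """Nearest-neighbour upsampling of a (C, H, W) image (stand-in for bicubic)."""
    return x_low.repeat(factor, axis=1).repeat(factor, axis=2)

def sr3_training_example(x_low, y0, gamma, rng, factor=4):
    """Build the network input and regression target for one noise level gamma."""
    eps = rng.standard_normal(y0.shape)
    # Noisy high-resolution image at noise level gamma
    y_noisy = np.sqrt(gamma) * y0 + np.sqrt(1.0 - gamma) * eps
    # Condition on the low-resolution image by channel-wise concatenation
    x_up = upsample_nearest(x_low, factor)
    net_input = np.concatenate([x_up, y_noisy], axis=0)
    return net_input, eps  # the network is trained to predict eps

rng = np.random.default_rng(0)
x_low = rng.standard_normal((3, 4, 4))    # toy low-resolution image
y0 = rng.standard_normal((3, 16, 16))     # toy high-resolution target
net_input, target = sr3_training_example(x_low, y0, gamma=0.5, rng=rng)
```

The network would receive <code>net_input</code> together with <code>gamma</code> and be trained to regress <code>target</code>.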
===Text-to-image===
OpenAI has unveiled two text-to-image models, [https://github.com/openai/glide-text2im GLIDE] and [https://openai.com/dall-e-2/ DALL-E 2], which rely on diffusion models to generate images.<br>
Open-source code is available for GLIDE, which lets you try a small version of the model.
At a high level, GLIDE is a diffusion model which is conditioned on text embeddings and trained with a technique called classifier-free guidance.<br>
DALL-E 2 adds a ''prior'' model which first converts a text embedding to a CLIP image embedding.
The diffusion ''decoder'' then generates an image based on the image embedding.
==Guided Diffusion==
Guidance is a method used to push the diffusion process towards the input condition, e.g. the text input.<br>
There are two types of guidance: classifier guidance and classifier-free guidance.<br>
See [https://benanne.github.io/2022/05/26/guidance.html https://benanne.github.io/2022/05/26/guidance.html].
Classifier guidance uses the gradients of an image classifier (e.g. CLIP) to update the noisy intermediate images towards the desired class.<br>
Classifier-free guidance<ref name="ho2021classifierfree"/> runs the diffusion model twice to predict the noise with and without the class input, then extrapolates away from the prediction made without the class input.
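The extrapolation step of classifier-free guidance can be sketched as follows. This is a minimal example under assumptions: it uses the convention in which a guidance scale of 1 returns the conditional prediction unchanged (other papers parameterize the scale differently), and <code>toy_eps_model</code> is a hypothetical stand-in for a trained network that accepts <code>cond=None</code> for the unconditional pass.

```python
import numpy as np

def classifier_free_guidance(eps_cond, eps_uncond, guidance_scale):
    """Extrapolate from the unconditional prediction towards the conditional one."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

def toy_eps_model(x_t, t, cond=None):
    """Hypothetical network: the condition simply shifts the predicted noise."""
    shift = 0.0 if cond is None else float(cond)
    return 0.1 * x_t + shift

def guided_eps(eps_model, x_t, t, cond, guidance_scale):
    """Two forward passes per step: one with the condition, one without."""
    eps_cond = eps_model(x_t, t, cond)
    eps_uncond = eps_model(x_t, t, None)
    return classifier_free_guidance(eps_cond, eps_uncond, guidance_scale)

x_t = np.ones((2, 2))
eps_hat = guided_eps(toy_eps_model, x_t, t=10, cond=1.0, guidance_scale=3.0)
```

With a scale above 1 the result moves past the conditional prediction, which in practice tends to trade sample diversity for closer adherence to the condition.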
==Inversion==
See [https://arxiv.org/abs/2105.05233 Diffusion Models Beat GANs on Image Synthesis].<br>
Inversion of a diffusion model can be done by using DDIM for the reverse process.<br>
DDIM sampling uses a variance of 0, which makes the reverse process (latent to image) deterministic. The same deterministic update can then be run in the opposite direction to recover a latent for a given image.
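The deterministic update and its inversion can be sketched as below. This is an illustration, not the implementation from the linked paper: the function names, the toy schedule, and the constant noise predictor are assumptions. With a real network the inversion is only approximate, because the noise predicted at the current level is reused for the step to the next level.

```python
import numpy as np

def ddim_step(x, eps, abar_from, abar_to):
    """Deterministic DDIM update (variance 0) between two noise levels."""
    x0_pred = (x - np.sqrt(1.0 - abar_from) * eps) / np.sqrt(abar_from)
    return np.sqrt(abar_to) * x0_pred + np.sqrt(1.0 - abar_to) * eps

def ddim_sample(eps_model, latent, alpha_bars):
    """Latent to image: walk from the noisiest level down to the cleanest."""
    x = latent
    for t in range(len(alpha_bars) - 1, 0, -1):
        x = ddim_step(x, eps_model(x, t), alpha_bars[t], alpha_bars[t - 1])
    return x

def ddim_invert(eps_model, image, alpha_bars):
    """Image to latent: the same update applied in the opposite direction."""
    x = image
    for t in range(len(alpha_bars) - 1):
        x = ddim_step(x, eps_model(x, t), alpha_bars[t], alpha_bars[t + 1])
    return x

rng = np.random.default_rng(0)
alpha_bars = np.linspace(0.999, 0.05, 50)   # toy schedule, index 0 is cleanest
fixed_eps = rng.standard_normal((3, 8, 8))

def constant_model(x, t):
    """Hypothetical predictor that ignores its inputs, so the round trip is exact."""
    return fixed_eps

image = rng.standard_normal((3, 8, 8))
latent = ddim_invert(constant_model, image, alpha_bars)
recon = ddim_sample(constant_model, latent, alpha_bars)
```

Because no random noise is injected at any step, sampling from the recovered <code>latent</code> retraces the same path back to the image.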
==Resources==
* [https://ai.googleblog.com/2021/07/high-fidelity-image-generation-using.html Google AI Blog High Fidelity Image Generation Using Diffusion Models] - discusses SR3 and CDM
* https://theaisummer.com/diffusion-models/
==References==
{{reflist|refs=
<ref name="ho2021classifierfree">Ho, J., & Salimans, T. (2022). Classifier-Free Diffusion Guidance. doi:10.48550/ARXIV.2207.12598 https://arxiv.org/abs/2207.12598</ref>
}}