Diffusion Models

==Image Generation==

===DDPM===

See [https://arxiv.org/pdf/2006.11239.pdf DDPM paper]

Here, the diffusion process is modeled as:

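* Forward: <math>q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = N(\mathbf{x}_t; \sqrt{1-\beta_t}\, \mathbf{x}_{t-1}, \beta_t \mathbf{I})</math>, where <math>\beta_1, \ldots, \beta_T</math> is a fixed variance schedule

* Reverse: <math>p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) = N(\mathbf{x}_{t-1}; \mu_\theta(\mathbf{x}_t, t), \sigma_t^2 \mathbf{I})</math>, where DDPM fixes the variance rather than learning it (e.g. <math>\sigma_t^2 = \beta_t</math>)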
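A minimal NumPy sketch of the forward process, assuming the linear schedule used in the DDPM paper (<math>\beta_t</math> from 10<sup>-4</sup> to 0.02 over <math>T = 1000</math> steps). The helper names are illustrative. Applying the forward step repeatedly to a constant toy "image" shows the signal being replaced by approximately standard Gaussian noise.

```python
import numpy as np

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """beta_1..beta_T spaced linearly (the DDPM paper's default values)."""
    return np.linspace(beta_start, beta_end, T)

def forward_step(x_prev, beta_t, rng):
    """Draw x_t ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

rng = np.random.default_rng(0)
betas = linear_beta_schedule()
x = np.ones(10_000)          # toy "image": 10,000 pixels, all equal to 1.0
for beta_t in betas:         # t = 1, ..., T
    x = forward_step(x, beta_t, rng)

# After T steps almost no signal is left: x is close to N(0, I).
print(f"mean={x.mean():.3f}  std={x.std():.3f}")
```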

The forward diffusion can be sampled for any <math>t</math> using:

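<math>\mathbf{x}_{t} = \sqrt{\bar\alpha_t}\, \mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\, \boldsymbol{\epsilon}</math>, where <math>\boldsymbol{\epsilon} \sim N(\mathbf{0}, \mathbf{I})</math>, <math>\alpha_t = 1-\beta_t</math> and <math>\bar\alpha_t = \prod_{s=1}^{t}\alpha_s</math>.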
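The closed form can be checked numerically. This sketch (same assumed linear schedule, hypothetical helper `q_sample`) draws <math>\mathbf{x}_t</math> in one shot and compares its statistics with <math>t</math> applications of the single-step forward process; both should have mean <math>\sqrt{\bar\alpha_t}\,\mathbf{x}_0</math> and standard deviation <math>\sqrt{1-\bar\alpha_t}</math> up to sampling error.

```python
import numpy as np

betas = np.linspace(1e-4, 0.02, 1000)   # assumed DDPM linear schedule
alphas = 1.0 - betas                     # alpha_t = 1 - beta_t
alpha_bars = np.cumprod(alphas)          # alpha_bar_t = alpha_1 * ... * alpha_t

def q_sample(x0, t, rng):
    """Sample x_t given x_0 in one step; t is 1-indexed as in the formula."""
    eps = rng.standard_normal(x0.shape)
    ab = alpha_bars[t - 1]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps, eps

rng = np.random.default_rng(0)
x0 = np.full(100_000, 0.5)
t = 500

x_closed, _ = q_sample(x0, t, rng)

x_iter = x0.copy()                       # same thing, one small step at a time
for beta in betas[:t]:
    x_iter = np.sqrt(1 - beta) * x_iter + np.sqrt(beta) * rng.standard_normal(x0.shape)

ab = alpha_bars[t - 1]
print(f"expected:    mean={np.sqrt(ab) * 0.5:.3f}, std={np.sqrt(1 - ab):.3f}")
print(f"closed form: mean={x_closed.mean():.3f}, std={x_closed.std():.3f}")
print(f"iterated:    mean={x_iter.mean():.3f}, std={x_iter.std():.3f}")
```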

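The loss function comes from the variational bound on the negative log-likelihood: at each step it compares <math>p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t)</math> with the forward-process posterior <math>q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0)</math>, which is Gaussian. Since both variances are fixed, this reduces to matching <math>\mu_\theta</math> to the mean of that posterior.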
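For reference, the forward-process posterior is <math>q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0) = N(\tilde{\mu}_t, \tilde\beta_t \mathbf{I})</math> with <math>\tilde{\mu}_t = \frac{\sqrt{\bar\alpha_{t-1}}\beta_t}{1-\bar\alpha_t}\mathbf{x}_0 + \frac{\sqrt{\alpha_t}(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\mathbf{x}_t</math> and <math>\tilde\beta_t = \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\beta_t</math>. The sketch below (assumed linear schedule, illustrative function names) checks numerically that substituting <math>\mathbf{x}_0 = (\mathbf{x}_t - \sqrt{1-\bar\alpha_t}\,\boldsymbol{\epsilon})/\sqrt{\bar\alpha_t}</math> turns this mean into the noise-based form <math>\frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\boldsymbol{\epsilon}\right)</math>, which is why a network that predicts <math>\boldsymbol{\epsilon}</math> is enough to parameterize <math>\mu_\theta</math>.

```python
import numpy as np

betas = np.linspace(1e-4, 0.02, 1000)   # assumed DDPM linear schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def posterior_mean_variance(x0, xt, t):
    """Mean and variance of q(x_{t-1} | x_t, x_0); t is 1-indexed, alpha_bar_0 = 1."""
    ab_t = alpha_bars[t - 1]
    ab_prev = alpha_bars[t - 2] if t > 1 else 1.0
    beta_t, alpha_t = betas[t - 1], alphas[t - 1]
    coef_x0 = np.sqrt(ab_prev) * beta_t / (1.0 - ab_t)
    coef_xt = np.sqrt(alpha_t) * (1.0 - ab_prev) / (1.0 - ab_t)
    var = (1.0 - ab_prev) / (1.0 - ab_t) * beta_t
    return coef_x0 * x0 + coef_xt * xt, var

def mean_from_eps(xt, eps, t):
    """The same mean written in terms of the noise eps instead of x_0."""
    ab_t, beta_t, alpha_t = alpha_bars[t - 1], betas[t - 1], alphas[t - 1]
    return (xt - beta_t / np.sqrt(1.0 - ab_t) * eps) / np.sqrt(alpha_t)

rng = np.random.default_rng(0)
t = 300
x0 = rng.standard_normal(8)
eps = rng.standard_normal(8)
xt = np.sqrt(alpha_bars[t - 1]) * x0 + np.sqrt(1.0 - alpha_bars[t - 1]) * eps

mean, var = posterior_mean_variance(x0, xt, t)
print(np.allclose(mean, mean_from_eps(xt, eps, t)))   # True
```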

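If we parameterize <math>\mu_\theta(\mathbf{x}_t, t)</math> as <math>\frac{1}{\sqrt{\alpha_t}} \left( \mathbf{x}_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}} \boldsymbol{\epsilon}_\theta (\mathbf{x}_t, t) \right)</math>, where <math>\boldsymbol{\epsilon}_\theta</math> is a network that predicts the noise <math>\boldsymbol{\epsilon}</math> from <math>\mathbf{x}_t</math>, then the loss function simplifies to: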

<math>E_{t, \mathbf{x}_0, \boldsymbol{\epsilon}} \left[ \frac{\beta^2_t}{2\sigma^2_t \alpha_t (1-\bar\alpha_t)} \Vert \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta( \sqrt{\bar\alpha_t} \mathbf{x}_0 + \sqrt{1-\bar\alpha_t} \boldsymbol{\epsilon}, t) \Vert^2 \right]</math>
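The DDPM paper reports that dropping the weighting term and training on the plain squared error (its <math>L_{\mathrm{simple}}</math>) gives better samples. Below is a minimal sketch of that simplified loss together with ancestral sampling using <math>\sigma_t^2 = \beta_t</math>, assuming the linear schedule; `eps_model(x_t, t)` stands in for the trained network <math>\boldsymbol{\epsilon}_\theta</math> and is hypothetical. A real implementation would average the loss over minibatches and minimize it with gradient descent.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # assumed DDPM linear schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def simple_loss(eps_model, x0, rng):
    """Unweighted noise-prediction loss at a uniformly sampled timestep."""
    t = int(rng.integers(1, T + 1))      # t ~ Uniform{1, ..., T}
    eps = rng.standard_normal(x0.shape)
    ab = alpha_bars[t - 1]
    xt = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps
    return np.mean((eps - eps_model(xt, t)) ** 2)

def sample(eps_model, shape, rng):
    """Ancestral sampling from x_T ~ N(0, I) down to x_0 with sigma_t^2 = beta_t."""
    x = rng.standard_normal(shape)
    for t in range(T, 0, -1):
        beta_t, alpha_t, ab = betas[t - 1], alphas[t - 1], alpha_bars[t - 1]
        mean = (x - beta_t / np.sqrt(1.0 - ab) * eps_model(x, t)) / np.sqrt(alpha_t)
        z = rng.standard_normal(shape) if t > 1 else 0.0   # no noise on the last step
        x = mean + np.sqrt(beta_t) * z
    return x

# Toy check with a hypothetical "perfect" model for a dataset containing only x_0 = 0:
# then x_t = sqrt(1 - alpha_bar_t) * eps, so eps can be recovered exactly from x_t.
def oracle(xt, t):
    return xt / np.sqrt(1.0 - alpha_bars[t - 1])

rng = np.random.default_rng(0)
print(simple_loss(oracle, np.zeros(16), rng))     # ~0 up to rounding
print(np.abs(sample(oracle, (16,), rng)).max())   # ~0: samples collapse onto x_0 = 0
```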

===Super-resolution and other Image-to-image generation===

See [https://iterative-refinement.github.io/ SR3 iterative refinement]

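Here <math>\mathbf{y}</math> denotes the high-resolution image being generated (the sequence <math>\mathbf{y}_T, \ldots, \mathbf{y}_0</math> goes from pure noise to the final output), and the model is additionally conditioned on the low-resolution input image <math>\mathbf{x}</math>.

As in DDPM, the neural network <math>f_{\theta}(\mathbf{x}, \tilde{\mathbf{y}}, \gamma)</math> is trained to predict the noise that was added to the target, given the low-resolution image <math>\mathbf{x}</math>, the noisy target <math>\tilde{\mathbf{y}} = \sqrt{\gamma}\,\mathbf{y}_0 + \sqrt{1-\gamma}\,\boldsymbol{\epsilon}</math>, and the noise level <math>\gamma</math>.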
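A sketch of one SR3-style training step, roughly following the SR3 paper's description and under stated assumptions: the low-resolution image is upsampled (nearest-neighbour here, as a simple stand-in for bicubic upsampling) and concatenated with the noisy target along the channel axis, and the noise level <math>\gamma</math> is drawn uniformly between two consecutive values of <math>\bar\alpha_t</math> so that the network sees a continuous range of noise levels. The schedule, the squared-error loss and `f_theta` (a stand-in for the network, taking the concatenated input and <math>\gamma</math>) are illustrative.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)                         # illustrative schedule
gammas = np.concatenate([[1.0], np.cumprod(1.0 - betas)])  # gamma_0 = 1 > gamma_1 > ... > gamma_T

def upsample_nearest(x, factor):
    """Upsample an (H, W, C) image by repeating pixels (stand-in for bicubic)."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

def sr3_loss(f_theta, x_low, y0, rng, factor=4):
    """Noise-prediction loss for one (low-res x, high-res y0) training pair."""
    x_up = upsample_nearest(x_low, factor)
    t = int(rng.integers(1, T + 1))
    gamma = rng.uniform(gammas[t], gammas[t - 1])          # continuous noise level
    eps = rng.standard_normal(y0.shape)
    y_noisy = np.sqrt(gamma) * y0 + np.sqrt(1.0 - gamma) * eps
    net_in = np.concatenate([x_up, y_noisy], axis=-1)      # condition on x via channels
    return np.mean((f_theta(net_in, gamma) - eps) ** 2)

# Hypothetical toy pair where the high-res target is exactly the upsampled input,
# so an "oracle" network can recover eps from the two halves of its input.
def oracle(net_in, gamma):
    c = net_in.shape[-1] // 2
    x_up, y_noisy = net_in[..., :c], net_in[..., c:]
    return (y_noisy - np.sqrt(gamma) * x_up) / np.sqrt(1.0 - gamma)

rng = np.random.default_rng(0)
x_low = rng.standard_normal((8, 8, 3))
y0 = upsample_nearest(x_low, 4)                            # shape (32, 32, 3)
print(sr3_loss(oracle, x_low, y0, rng))                    # ~0
```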

An unofficial PyTorch implementation of SR3 is available at [https://github.com/Janspiry/Image-Super-Resolution-via-Iterative-Refinement https://github.com/Janspiry/Image-Super-Resolution-via-Iterative-Refinement].

In addition to SR3, Google researchers also introduced [https://iterative-refinement.github.io/palette/ Palette], which applies the same ideas to other image-to-image tasks such as colorization, uncropping, and inpainting. A single Palette model can perform all of these tasks.

===Text-to-image===

OpenAI have unveiled two text-to-image models, [https://github.com/openai/glide-text2im GLIDE] and [https://openai.com/dall-e-2/ DALL-E 2], which rely on diffusion models to generate images.

OpenAI has released open-source code and weights for a smaller version of GLIDE that can be used to try the model.

At a high level, GLIDE is a diffusion model that is conditioned on text embeddings and trained with a technique called classifier-free guidance.
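Training for classifier-free guidance only needs the conditioning to be dropped at random, so a single network learns both the conditional and the unconditional model. A minimal sketch, assuming the text condition is an embedding vector and that "no text" is represented by a separate null embedding; the function name and the drop probability are illustrative, not GLIDE's actual code.

```python
import numpy as np

def drop_text_condition(text_emb, null_emb, p_uncond, rng):
    """Replace each row of text_emb by null_emb with probability p_uncond.

    text_emb: (batch, dim) text embeddings; null_emb: (dim,) "empty caption" embedding.
    Returns the conditioning actually fed to the model and the boolean drop mask.
    """
    drop = rng.random(text_emb.shape[0]) < p_uncond   # one coin flip per example
    cond = np.where(drop[:, None], null_emb[None, :], text_emb)
    return cond, drop

rng = np.random.default_rng(0)
text_emb = rng.standard_normal((10_000, 4))
null_emb = np.zeros(4)
cond, drop = drop_text_condition(text_emb, null_emb, p_uncond=0.2, rng=rng)
print(drop.mean())   # fraction of unconditional examples, close to 0.2
```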

DALL-E 2 adds a ''prior'' model that first converts the CLIP text embedding of the caption into a CLIP image embedding.

A diffusion ''decoder'' then generates an image conditioned on that image embedding.

==Guided Diffusion==

Guidance is a method used to push the diffusion process towards the input condition, e.g. the text input.

There are two types of guidance: classifier guidance and classifier-free guidance.

See [https://benanne.github.io/2022/05/26/guidance.html https://benanne.github.io/2022/05/26/guidance.html].

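Classifier guidance uses the gradient of an image classifier trained on noisy images, <math>\nabla_{\mathbf{x}_t} \log p_\phi(y \mid \mathbf{x}_t)</math>, to push each sampling step of the noisy image towards the desired class <math>y</math>. CLIP guidance is a related variant that uses a CLIP model in place of the classifier.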
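A sketch of classifier guidance in the noise-prediction form used in the "Diffusion Models Beat GANs" paper, <math>\hat{\boldsymbol{\epsilon}} = \boldsymbol{\epsilon}_\theta(\mathbf{x}_t) - s\sqrt{1-\bar\alpha_t}\,\nabla_{\mathbf{x}_t} \log p_\phi(y \mid \mathbf{x}_t)</math>, with a guidance scale <math>s</math> multiplying the gradient. To keep it self-contained, the "classifier" is a hypothetical two-class logistic model whose gradient can be written by hand; a real one would be a neural network trained on noisy images, with the gradient obtained by automatic differentiation. The schedule is assumed.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # assumed schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

# Hypothetical toy classifier: p(y = 1 | x_t) = sigmoid(w . x_t)
w = np.array([1.0, -2.0, 0.5])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_log_p_y1(x_t):
    """d/dx_t log sigmoid(w . x_t) = (1 - sigmoid(w . x_t)) * w."""
    return (1.0 - sigmoid(w @ x_t)) * w

def classifier_guided_eps(eps_pred, x_t, t, grad_log_p, scale):
    """Shift the predicted noise so that sampling moves towards higher log p(y | x_t)."""
    return eps_pred - scale * np.sqrt(1.0 - alpha_bars[t - 1]) * grad_log_p(x_t)

def reverse_mean(x_t, t, eps):
    """DDPM mean of p(x_{t-1} | x_t) written in terms of predicted noise."""
    return (x_t - betas[t - 1] / np.sqrt(1.0 - alpha_bars[t - 1]) * eps) / np.sqrt(alphas[t - 1])

t = 500
x_t = np.zeros(3)
eps_pred = np.zeros(3)   # pretend output of eps_theta(x_t, t)
guided = classifier_guided_eps(eps_pred, x_t, t, grad_log_p_y1, scale=3.0)
shift = reverse_mean(x_t, t, guided) - reverse_mean(x_t, t, eps_pred)
print(w @ shift > 0)     # True: the step moves towards class y = 1
```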

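Classifier-free guidance<ref name="ho2021classifierfree"/> trains a single diffusion model both with and without the conditioning input (by randomly dropping the condition during training). At sampling time, it predicts the noise both with and without the condition and extrapolates away from the unconditional prediction, in the direction of the conditional one.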
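A minimal sketch of the guided noise estimate from Ho & Salimans, <math>(1+w)\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, \mathbf{c}) - w\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t)</math>, with `eps_model(x_t, t, cond)` as a hypothetical conditional network and `null_cond` as the "no condition" input it was trained with. Setting <math>w = 0</math> recovers the plain conditional model, and larger <math>w</math> pushes further away from the unconditional prediction. No separate classifier is needed, at the cost of two network evaluations per step (often batched together in practice).

```python
import numpy as np

def cfg_eps(eps_model, x_t, t, cond, null_cond, w):
    """Classifier-free guidance: (1 + w) * eps(x_t, c) - w * eps(x_t, null)."""
    eps_cond = eps_model(x_t, t, cond)
    eps_uncond = eps_model(x_t, t, null_cond)
    return (1.0 + w) * eps_cond - w * eps_uncond

# Hypothetical toy "network" so the example runs end to end.
def toy_eps_model(x_t, t, cond):
    return 0.1 * x_t + cond

x_t = np.array([1.0, -1.0])
cond = np.array([0.5, 0.5])
null_cond = np.zeros(2)

eps_c = toy_eps_model(x_t, 10, cond)
eps_u = toy_eps_model(x_t, 10, null_cond)
guided = cfg_eps(toy_eps_model, x_t, 10, cond, null_cond, w=2.0)
# The guided estimate lies on the line through eps_u and eps_c, beyond eps_c.
print(np.allclose(guided - eps_c, 2.0 * (eps_c - eps_u)))   # True
```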

==Inversion==

See [https://arxiv.org/abs/2105.05233 Diffusion Models Beat GANs on Image Synthesis].

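Inversion of a diffusion model, i.e. finding the latent <math>\mathbf{x}_T</math> that generates a given image, can be done by using DDIM for the reverse process.

DDIM sampling with a variance of 0 makes the reverse process (latent to image) deterministic, so its update can be run in the opposite direction, from <math>\mathbf{x}_0</math> towards <math>\mathbf{x}_T</math>, to obtain a latent that approximately reconstructs the image.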
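A sketch of deterministic DDIM sampling and the corresponding inversion, assuming the linear schedule and the usual convention <math>\bar\alpha_0 = 1</math>. Each step predicts <math>\hat{\mathbf{x}}_0</math> from the current noise estimate and re-noises it to the neighbouring timestep with the same <math>\boldsymbol{\epsilon}</math>. Inversion needs <math>\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)</math> before <math>\mathbf{x}_t</math> is known, so here it is approximated by evaluating the network at <math>\mathbf{x}_{t-1}</math>; this is why the reconstruction is only approximate. `eps_model` is a hypothetical stand-in; for the check it is the exact noise predictor for data drawn from <math>N(\mathbf{0}, \mathbf{I})</math>, namely <math>\sqrt{1-\bar\alpha_t}\,\mathbf{x}_t</math>.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # assumed schedule
# alpha_bar[t] for t = 0..T, with the convention alpha_bar[0] = 1
alpha_bar = np.concatenate([[1.0], np.cumprod(1.0 - betas)])

def ddim_step(x_t, t, eps):
    """Deterministic DDIM update x_t -> x_{t-1} (variance 0)."""
    x0_pred = (x_t - np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
    return np.sqrt(alpha_bar[t - 1]) * x0_pred + np.sqrt(1 - alpha_bar[t - 1]) * eps

def ddim_inverse_step(x_prev, t, eps):
    """The same update run backwards: x_{t-1} -> x_t for a given eps."""
    x0_pred = (x_prev - np.sqrt(1 - alpha_bar[t - 1]) * eps) / np.sqrt(alpha_bar[t - 1])
    return np.sqrt(alpha_bar[t]) * x0_pred + np.sqrt(1 - alpha_bar[t]) * eps

def ddim_sample(eps_model, x_T):
    x = x_T
    for t in range(T, 0, -1):
        x = ddim_step(x, t, eps_model(x, t))
    return x

def ddim_invert(eps_model, x0):
    x = x0
    for t in range(1, T + 1):
        # Approximation: eps_theta(x_t, t) is unknown, so evaluate it at x_{t-1}.
        x = ddim_inverse_step(x, t, eps_model(x, t))
    return x

def eps_model(x, t):
    """Hypothetical exact noise predictor for data ~ N(0, I)."""
    return np.sqrt(1.0 - alpha_bar[t]) * x

rng = np.random.default_rng(0)
x0 = rng.standard_normal(5)
x_T = ddim_invert(eps_model, x0)          # image -> latent
x_rec = ddim_sample(eps_model, x_T)       # latent -> image
print(np.abs(x_rec - x0).max())           # small reconstruction error
```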

==Resources==

* [https://ai.googleblog.com/2021/07/high-fidelity-image-generation-using.html Google AI Blog High Fidelity Image Generation Using Diffusion Models] - discusses SR3 and CDM

* https://theaisummer.com/diffusion-models/

==References==

{{reflist|refs=

<ref name="ho2021classifierfree">Ho, J., & Salimans, T. (2022). Classifier-Free Diffusion Guidance. doi:10.48550/ARXIV.2207.12598 https://arxiv.org/abs/2207.12598</ref>

}}
