Diffusion Models

Revision as of 17:25, 29 March 2022

Background

Diffusion models were introduced by Sohl-Dickstein et al.[1].

The goal is to define a mapping from a complex data distribution \(\displaystyle q(\mathbf{x}^{(0)})\) (e.g. the set of realistic images) to a simple, tractable distribution \(\displaystyle \pi(\mathbf{y})=p(\mathbf{x}^{(T)})\) (e.g. a multivariate normal).
This is done by defining a fixed forward trajectory \(\displaystyle q(\mathbf{x}^{(0...T)})\) and learning a reverse trajectory \(\displaystyle p(\mathbf{x}^{(0 ... T)})\).
The forward trajectory repeatedly applies a Markov diffusion kernel (i.e. a transition kernel whose stationary distribution is \(\displaystyle \pi(\mathbf{y})\)) for T steps of diffusion, gradually turning data into samples from \(\displaystyle \pi\).
The reverse trajectory again applies a diffusion kernel, but with a mean and variance estimated by the model, so that running it from \(\displaystyle \mathbf{x}^{(T)} \sim \pi\) yields samples that approximate \(\displaystyle q(\mathbf{x}^{(0)})\).
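As a toy illustration (not the paper's experimental setup), the sketch below assumes a one-dimensional Gaussian diffusion kernel \(\displaystyle x \mapsto \sqrt{1-\beta}\, x + \sqrt{\beta}\, z\) with a constant \(\displaystyle \beta\) and \(\displaystyle z \sim N(0, 1)\), and applies it T times to a bimodal data distribution. Because N(0, 1) is the stationary distribution of this kernel, the samples drift toward zero mean and unit variance. The function names and the values of `beta` and `T` are illustrative assumptions.

```python
import numpy as np

def diffusion_kernel(x, beta, rng):
    """One step of a Gaussian Markov diffusion kernel.

    N(0, 1) is its stationary distribution: if x ~ N(0, 1), the output has
    mean 0 and variance (1 - beta) * 1 + beta = 1.
    """
    return np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)

def run_forward(x0, T, beta, rng):
    """Apply the kernel T times and return x^(T) (intermediate steps are not stored)."""
    x = x0
    for _ in range(T):
        x = diffusion_kernel(x, beta, rng)
    return x

rng = np.random.default_rng(0)
n = 50_000
# Toy "complex" data: a bimodal 1-D distribution with modes at 2 and 4.
x0 = rng.choice([2.0, 4.0], size=n) + 0.1 * rng.standard_normal(n)
xT = run_forward(x0, T=200, beta=0.05, rng=rng)
print(f"x^(0): mean={x0.mean():.3f}, std={x0.std():.3f}")
print(f"x^(T): mean={xT.mean():.3f}, std={xT.std():.3f}")  # approximately 0 and 1
```

The structure of the data is lost after enough steps; the reverse trajectory has to learn to undo this.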

Image Generation

DDPM

See the DDPM paper (Ho et al., 2020).

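Here, the diffusion process is modeled with a variance schedule \(\displaystyle \beta_1, \dots, \beta_T\):

Forward: \(\displaystyle q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = N(\mathbf{x}_t; \sqrt{1-\beta_t}\, \mathbf{x}_{t-1}, \beta_t \mathbf{I})\)
Reverse: \(\displaystyle p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) = N(\mathbf{x}_{t-1}; \mu_\theta (\mathbf{x}_t, t), \sigma_t^2 \mathbf{I})\), with the variance fixed to \(\displaystyle \sigma_t^2 = \beta_t\)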
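A minimal NumPy sketch of the two kernels, assuming the linear schedule used in the DDPM paper (\(\displaystyle \beta_1 = 10^{-4}\) to \(\displaystyle \beta_T = 0.02\), \(\displaystyle T = 1000\)) and a fixed reverse variance \(\displaystyle \sigma_t^2 = \beta_t\). The mean function `mu_theta` is a hypothetical stand-in for a trained network; the reverse loop follows DDPM-style ancestral sampling, where no noise is added at the final step.

```python
import numpy as np

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule beta_1..beta_T (stored 0-indexed: betas[t-1] = beta_t)."""
    return np.linspace(beta_start, beta_end, T)

def forward_step(x_prev, beta_t, rng):
    """Sample x_t ~ q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) x_{t-1}, beta_t I)."""
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * rng.standard_normal(x_prev.shape)

def reverse_sample(mu_theta, shape, betas, rng):
    """Ancestral sampling with p_theta(x_{t-1} | x_t) = N(mu_theta(x_t, t), beta_t I).

    mu_theta(x, t) is a hypothetical callable standing in for a trained model.
    """
    T = len(betas)
    x = rng.standard_normal(shape)  # x_T ~ N(0, I)
    for t in range(T, 0, -1):
        mean = mu_theta(x, t)
        if t > 1:
            x = mean + np.sqrt(betas[t - 1]) * rng.standard_normal(shape)
        else:
            x = mean  # no noise is added on the last step (t = 1)
    return x

rng = np.random.default_rng(0)
betas = linear_beta_schedule()
x1 = forward_step(np.ones(4), betas[0], rng)
# With an untrained placeholder mean function the sample is meaningless,
# but the loop shows the control flow of sampling.
sample = reverse_sample(lambda x, t: 0.99 * x, shape=(4,), betas=betas, rng=rng)
print(sample.shape)  # (4,)
```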

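The forward diffusion can be sampled in closed form for any \(\displaystyle t\):
\(\displaystyle \mathbf{x}_{t} = \sqrt{\bar\alpha_t}\, \mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\, \boldsymbol{\epsilon}\), where \(\displaystyle \boldsymbol{\epsilon} \sim N(\mathbf{0}, \mathbf{I})\), \(\displaystyle \alpha_t = 1-\beta_t\) and \(\displaystyle \bar\alpha_t = \prod_{s=1}^{t}\alpha_s = \prod_{s=1}^{t}(1-\beta_s)\).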
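The closed form holds because the one-step Gaussian kernels compose: the mean scale factors multiply to \(\displaystyle \sqrt{\bar\alpha_t}\) and the variances accumulate to \(\displaystyle 1-\bar\alpha_t\). The sketch below checks this empirically by comparing one-shot sampling against t sequential steps. It assumes the linear schedule from \(\displaystyle 10^{-4}\) to \(\displaystyle 0.02\); `alpha_bars` and `q_sample` are illustrative names.

```python
import numpy as np

def alpha_bars(betas):
    """alpha_bar_t = prod_{s=1}^{t} (1 - beta_s); entry t-1 holds alpha_bar_t."""
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, betas, rng):
    """Draw x_t ~ q(x_t | x_0) in one shot (t is 1-indexed); also return the noise used."""
    a_bar = alpha_bars(betas)[t - 1]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)  # assumed linear schedule
t = 300
x0 = np.full(100_000, 1.5)  # many copies of one data point

# One-shot closed form ...
xt_closed, _ = q_sample(x0, t, betas, rng)
# ... versus t sequential applications of q(x_s | x_{s-1}).
xt_iter = x0.copy()
for s in range(t):
    xt_iter = np.sqrt(1.0 - betas[s]) * xt_iter + np.sqrt(betas[s]) * rng.standard_normal(x0.shape)

a_bar_t = alpha_bars(betas)[t - 1]
print("expected mean", np.sqrt(a_bar_t) * 1.5, "got", xt_closed.mean(), xt_iter.mean())
print("expected var ", 1.0 - a_bar_t, "got", xt_closed.var(), xt_iter.var())
```

In training, the one-shot form is what makes it cheap to draw a random \(\displaystyle t\) per example instead of simulating the whole chain.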

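The loss function is based on the mean of the forward-process posterior \(\displaystyle q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0)\), a Gaussian with mean \(\displaystyle \tilde\mu_t(\mathbf{x}_t, \mathbf{x}_0) = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\mathbf{x}_0 + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\mathbf{x}_t\); each term of the variational bound penalizes the distance between \(\displaystyle \mu_\theta(\mathbf{x}_t, t)\) and \(\displaystyle \tilde\mu_t\).
If we parametrize \(\displaystyle \mu_\theta(\mathbf{x}_t, t)\) as \(\displaystyle \frac{1}{\sqrt{\alpha_t}} \left( \mathbf{x}_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}} \boldsymbol{\epsilon}_\theta (\mathbf{x}_t, t) \right)\), so that the network \(\displaystyle \boldsymbol{\epsilon}_\theta\) predicts the noise \(\displaystyle \boldsymbol{\epsilon}\) added to \(\displaystyle \mathbf{x}_0\), then the loss term for step \(\displaystyle t\) simplifies to:
\(\displaystyle E_{\mathbf{x}_0, \boldsymbol{\epsilon}} \left[ \frac{\beta^2_t}{2\sigma^2_t \alpha_t (1-\bar\alpha_t)} \Vert \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta( \sqrt{\bar\alpha_t}\, \mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\, \boldsymbol{\epsilon}, t) \Vert^2 \right]\)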
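The NumPy sketch below checks the noise parametrization numerically and estimates the loss, assuming the linear schedule from \(\displaystyle 10^{-4}\) to \(\displaystyle 0.02\) over \(\displaystyle T = 1000\) steps and \(\displaystyle \sigma_t^2 = \beta_t\). Plugging the true noise into `mu_from_eps` recovers the posterior mean \(\displaystyle \tilde\mu_t(\mathbf{x}_t, \mathbf{x}_0) = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\mathbf{x}_0 + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\mathbf{x}_t\) (with \(\displaystyle \bar\alpha_0 = 1\)), which is why predicting \(\displaystyle \boldsymbol{\epsilon}\) suffices. `eps_theta` stands for a hypothetical trained network; all function names are illustrative.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)  # beta_1..beta_T, stored 0-indexed
alphas = 1.0 - betas
a_bars = np.cumprod(alphas)
sigmas2 = betas  # assumption: sigma_t^2 = beta_t

def a_bar(t):
    """alpha_bar_t for integer 1-indexed t, with alpha_bar_0 = 1."""
    return 1.0 if t == 0 else a_bars[t - 1]

def posterior_mean(x_t, x0, t):
    """Mean of the forward posterior q(x_{t-1} | x_t, x_0)."""
    denom = 1.0 - a_bar(t)
    coef_x0 = np.sqrt(a_bar(t - 1)) * betas[t - 1] / denom
    coef_xt = np.sqrt(alphas[t - 1]) * (1.0 - a_bar(t - 1)) / denom
    return coef_x0 * x0 + coef_xt * x_t

def mu_from_eps(x_t, t, eps_pred):
    """mu_theta(x_t, t) expressed through a noise prediction eps_theta(x_t, t)."""
    return (x_t - betas[t - 1] / np.sqrt(1.0 - a_bar(t)) * eps_pred) / np.sqrt(alphas[t - 1])

def loss_weight(t):
    """beta_t^2 / (2 sigma_t^2 alpha_t (1 - alpha_bar_t)); t may be an integer array."""
    i = t - 1
    return betas[i] ** 2 / (2.0 * sigmas2[i] * alphas[i] * (1.0 - a_bars[i]))

def diffusion_loss(eps_theta, x0, rng, weighted=True):
    """One-sample Monte Carlo estimate of the loss for a batch x0 of shape (batch, dim).

    eps_theta(x_t, t) is a hypothetical noise-prediction model.
    """
    t = rng.integers(1, T + 1, size=x0.shape[0])  # t ~ Uniform{1, ..., T}
    eps = rng.standard_normal(x0.shape)
    ab = a_bars[t - 1][:, None]
    x_t = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps
    sq_err = np.sum((eps - eps_theta(x_t, t)) ** 2, axis=1)
    w = loss_weight(t) if weighted else 1.0
    return np.mean(w * sq_err)

rng = np.random.default_rng(0)
x0 = rng.standard_normal(5)
eps = rng.standard_normal(5)
t = 250
x_t = np.sqrt(a_bar(t)) * x0 + np.sqrt(1.0 - a_bar(t)) * eps
# With the true noise plugged in, the parametrized mean equals the posterior mean.
print(np.allclose(mu_from_eps(x_t, t, eps), posterior_mean(x_t, x0, t)))  # True

batch = rng.standard_normal((4096, 8))
# A model that always predicts zero noise: the unweighted loss is close to E||eps||^2 = 8.
print(diffusion_loss(lambda x, t: np.zeros_like(x), batch, rng, weighted=False))
```

Passing `weighted=False` drops the weighting factor. The DDPM paper reports that this unweighted objective (called \(\displaystyle L_\text{simple}\) there), with \(\displaystyle t\) drawn uniformly, is simpler to implement and gives better sample quality.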

Super-resolution

See SR3 iterative refinement
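Here \(\displaystyle y\) denotes the high-resolution image being generated, with \(\displaystyle y_T, \dots, y_0\) the sequence of increasingly denoised samples ending at the output \(\displaystyle y_0\), and the model is additionally conditioned on an input \(\displaystyle x\), the low-resolution image.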
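One way to feed the low-resolution image \(\displaystyle x\) to the denoiser, and to our understanding the approach described for SR3, is to upsample \(\displaystyle x\) to the target resolution and concatenate it with the noisy sample \(\displaystyle y_t\) along the channel axis. The sketch below assumes a (channels, height, width) layout and uses nearest-neighbour upsampling via `np.repeat` to stay dependency-free (SR3 uses bicubic interpolation); `sr_model_input` is a hypothetical helper, not part of any published code.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) image by an integer factor."""
    return np.repeat(np.repeat(x, factor, axis=1), factor, axis=2)

def sr_model_input(x_lowres, y_t, factor):
    """Hypothetical helper: upsampled low-res x concatenated with noisy y_t.

    A real SR3-style model would pass this array (plus the noise level) to a U-Net.
    """
    x_up = upsample_nearest(x_lowres, factor)
    if x_up.shape != y_t.shape:
        raise ValueError("upsampled x must match the target resolution of y_t")
    return np.concatenate([x_up, y_t], axis=0)  # stack along the channel axis

rng = np.random.default_rng(0)
x = rng.random((3, 16, 16))             # low-resolution RGB image
y_t = rng.standard_normal((3, 64, 64))  # noisy high-resolution sample at step t
inp = sr_model_input(x, y_t, factor=4)
print(inp.shape)  # (6, 64, 64)
```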

Resources

Google AI Blog High Fidelity Image Generation Using Diffusion Models - discusses SR3 and CDM
