Diffusion Models
DALL-E 2 adds a ''prior'' model which first converts a text embedding to a CLIP image embedding.
Then the diffusion ''decoder'' generates an image based on the image embedding.
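A minimal sketch of this two-stage pipeline, assuming hypothetical <code>text_encoder</code>, <code>prior</code>, and <code>decoder</code> callables (illustrative names, not the actual DALL-E 2 API):

<syntaxhighlight lang="python">
# Hypothetical two-stage pipeline in the style of DALL-E 2.
# text_encoder, prior, and decoder are assumed callables, not a real API.
def generate(text, text_encoder, prior, decoder):
    text_emb = text_encoder(text)   # CLIP text embedding
    image_emb = prior(text_emb)     # prior: text embedding -> CLIP image embedding
    return decoder(image_emb)       # diffusion decoder: image embedding -> image
</syntaxhighlight>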
==Inversion==
See [https://arxiv.org/abs/2105.05233 Diffusion Models Beat GANs on Image Synthesis].<br>
Inversion of a diffusion model can be done by using DDIM for the reverse process.<br>
This is done by setting the sampling variance to 0, which makes the reverse process (latent to image) deterministic; because the mapping is deterministic, the same update can be run in the opposite direction to recover the latent for a given image.
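A minimal sketch of this, assuming a noise-prediction network <code>model(x, t)</code> and a precomputed <code>alpha_bar</code> noise schedule (both hypothetical interfaces). With variance 0 the DDIM update is deterministic, so running it with increasing timesteps maps an image back to a latent:

<syntaxhighlight lang="python">
import torch

@torch.no_grad()
def ddim_step(model, x, t, t_target, alpha_bar):
    """Deterministic (variance-0) DDIM update from step t to step t_target.
    t_target < t samples (latent -> image); t_target > t inverts (image -> latent)."""
    eps = model(x, t)                                    # predicted noise at step t
    a_t, a_tgt = alpha_bar[t], alpha_bar[t_target]
    x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()  # implied clean image
    return a_tgt.sqrt() * x0_pred + (1 - a_tgt).sqrt() * eps

@torch.no_grad()
def ddim_invert(model, x0, timesteps, alpha_bar):
    """Image -> latent: run the same deterministic update with increasing t.

    timesteps -- increasing sequence of ints, e.g. range(0, 1000, 20)
    alpha_bar -- 1-D tensor of cumulative products of (1 - beta_t)
    """
    x = x0
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        x = ddim_step(model, x, t, t_next, alpha_bar)    # reuse the update upward
    return x
</syntaxhighlight>

Because the same update is used in both directions, running the steps back down from the resulting latent approximately reconstructs the original image.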
==Resources==
* [https://ai.googleblog.com/2021/07/high-fidelity-image-generation-using.html Google AI Blog High Fidelity Image Generation Using Diffusion Models] - discusses SR3 and CDM