==Stable Diffusion Turbo==
[https://arxiv.org/abs/2311.17042 paper]
Released in November 2023, [https://huggingface.co/stabilityai/sd-turbo SD-Turbo] and [https://huggingface.co/stabilityai/sdxl-turbo SDXL-Turbo] are fine-tuned versions of SD2 and SDXL, trained using adversarial diffusion distillation (ADD).
ADD fine-tunes the model with two losses, an adversarial loss (from GANs) and a score distillation loss (from DreamFusion), so that the model produces a complete image at every iteration. This allows SD-Turbo to produce realistic images in a single iteration while preserving the ability to continue refining the images with additional diffusion iterations.
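
As a rough illustration of how the two ADD loss terms might be combined, here is a toy sketch in plain Python. It is not the paper's implementation: the hinge form of the adversarial loss, the mean-squared-error distance, the weights <code>lam</code> and <code>c_t</code>, and all function names are assumptions made for this example, and the "images" are just short lists of floats standing in for real tensors.

```python
from typing import Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def add_generator_loss(
    student_out: Vector,       # image predicted by the student in one step
    disc_logits: Vector,       # discriminator scores for the student image
    teacher_denoised: Vector,  # frozen teacher's estimate from the re-noised student image
    lam: float = 1.0,          # hypothetical weight on the distillation term
    c_t: float = 1.0,          # hypothetical timestep-dependent weight
) -> float:
    """Toy student objective: adversarial term + lam * distillation term."""
    # Adversarial term (assumed hinge-style): push discriminator scores up.
    adversarial = -sum(disc_logits) / len(disc_logits)
    # Distillation term: stay close to what the teacher would produce.
    distillation = c_t * mse(student_out, teacher_denoised)
    return adversarial + lam * distillation


def discriminator_hinge_loss(real_logits: Vector, fake_logits: Vector) -> float:
    """Toy hinge loss for the discriminator (assumed form, no regularizer)."""
    real_term = sum(max(0.0, 1.0 - r) for r in real_logits) / len(real_logits)
    fake_term = sum(max(0.0, 1.0 + f) for f in fake_logits) / len(fake_logits)
    return real_term + fake_term


# Example with made-up numbers.
gen_loss = add_generator_loss([1.0, 2.0], [0.5, -0.5], [1.0, 0.0], lam=2.0)
disc_loss = discriminator_hinge_loss([2.0, 0.5], [-2.0, 0.0])
print(gen_loss, disc_loss)
```

In a real training loop the student, teacher, and discriminator would be neural networks and both losses would be minimized by alternating gradient steps; the sketch only shows how the two terms add up for the student.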
==Stable Cascade==