This demo showcases a lightweight Stable Diffusion model (SDM) for general-purpose text-to-image synthesis. Our model **BK-SDM-Small** reduces parameters and latency by **36%**. The model is built by (i) removing several residual and attention blocks from the U-Net of SDM-v1.4 and (ii) distillation pretraining on only 0.22M LAION pairs (fewer than 0.1% of the full training set). Despite very limited training resources, our model can imitate the original SDM by benefiting from transferred knowledge.
Our compressed model speeds up inference while preserving visually compelling results.
<center>
<img alt="U-Net architectures and KD-based pretraining" src="https://huggingface.co/spaces/nota-ai/theme/resolve/3bb3eed8b911d0baf306767bb9548bf732052c53/docs/compressed_stable_diffusion/fig_model.png" width="65%">
</center>
<br/>
### Notice
- This research was accepted to
  - [**ICML 2023 Workshop on Efficient Systems for Foundation Models** (ES-FoMo)](https://es-fomo.com/)
  - [**ICCV 2023 Demo Track**](https://iccv2023.thecvf.com/)
- Please be aware that your prompts are logged (_without_ any personally identifiable information).
- To get different images from the same prompt, change _Random Seed_ in Advanced Settings (for a given seed, the demo always uses the first latent code sampled, so the same seed reproduces the same image).
- Some demo code was borrowed from the repos of Stability AI ([stabilityai/stable-diffusion](https://huggingface.co/spaces/stabilityai/stable-diffusion)) and AK ([akhaliq/small-stable-diffusion-v0](https://huggingface.co/spaces/akhaliq/small-stable-diffusion-v0)). Thanks!
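
The note on _Random Seed_ can be illustrated with a small toy example. This is only a sketch using Python's standard `random` module as a stand-in for the real latent sampler; the function name `sample_latent` and the latent size are hypothetical and are not taken from the demo code.

```python
import random


def sample_latent(seed: int, size: int = 8) -> list[float]:
    """Draw a toy 'latent code' from a standard normal distribution.

    Hypothetical stand-in for the demo's latent sampling: a fresh
    generator is created per call, so the result depends only on the seed.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


same_a = sample_latent(seed=1234)
same_b = sample_latent(seed=1234)
other = sample_latent(seed=4321)

# Same seed gives the same starting latent, hence the same image for a fixed prompt.
print("same seed, same latent:", same_a == same_b)
# A different seed gives a different starting latent, hence a different image.
print("different seed, same latent:", same_a == other)
```

Because the starting latent is fully determined by the seed, changing the seed is the only way to vary the output for an unchanged prompt.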
### Compute environment for the demo
- [June/30/2023] **Free CPU-basic** (2 vCPU · 16 GB RAM) — quite slow inference.
- [May/31/2023] **T4-small** (4 vCPU · 15 GB RAM · 16GB VRAM) — 5~10 sec for the original model to generate a 512×512 image with 25 denoising steps.
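
For readers who want to try the setting mentioned above (a 512×512 image with 25 denoising steps) outside the demo, the following is a rough sketch rather than the demo's actual code. It assumes the `torch` and `diffusers` packages are installed, and the checkpoint ID `nota-ai/bk-sdm-small` is an assumption that does not appear in this document; the helper names `generation_settings` and `generate` are hypothetical.

```python
def generation_settings(seed: int) -> dict:
    """Collect the sampling settings quoted above (512x512, 25 steps)."""
    return {"height": 512, "width": 512, "num_inference_steps": 25, "seed": seed}


def generate(prompt: str, seed: int = 0, model_id: str = "nota-ai/bk-sdm-small"):
    """Generate one image. The default model_id is an assumed checkpoint name."""
    # Imported lazily so the settings helper works without the heavy dependencies.
    import torch
    from diffusers import StableDiffusionPipeline

    settings = generation_settings(seed)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Half precision only on GPU; CPU inference stays in float32.
    dtype = torch.float16 if device == "cuda" else torch.float32

    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=dtype)
    pipe = pipe.to(device)

    # A seeded generator fixes the initial latent, so results are repeatable per seed.
    generator = torch.Generator(device=device).manual_seed(settings.pop("seed"))
    return pipe(prompt, generator=generator, **settings).images[0]
```

Calling `generate("a golden vase with different flowers", seed=1234).save("sample.png")` would download the checkpoint on first use and write one image to disk; on a CPU-only machine this is expected to be slow, as noted for the Free CPU-basic environment.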