Possibility of replacing base pretrained models for inference
#2 by jing-yi
Hello!
I was reading the documentation for this model.
Under the hood, it uses https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0 and https://huggingface.co/laion/CLIP-ViT-bigG-14-laion2B-39B-b160k.
I was wondering: is it possible to replace them with smaller models during inference? For example, https://huggingface.co/segmind/Segmind-Vega and https://huggingface.co/openai/clip-vit-large-patch14.
MS-Diffusion's trainable adapters are built on SDXL and CLIP-G: they transform the CLIP image features into SDXL cross-attention tokens. A distilled SDXL can be used if it has the same cross-attention layers as SDXL. CLIP-G, however, cannot be replaced by CLIP-L, because the image features the two encoders output differ in shape, so the trained adapters would not accept the CLIP-L features.
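To make the shape argument concrete, here is a minimal sketch in plain Python. It is not MS-Diffusion code: the helper names (`can_swap_encoder`, `project_shape`) are hypothetical, the feature widths are what I recall from the public model configs and should be checked against each `config.json`, and the token count of 257 is only an illustrative value. Whether the adapter reads the hidden states or the projected embeddings is also an assumption left open here; the widths differ in both cases.

```python
# Assumed vision-tower widths (verify against each model's config.json).
ENCODER_DIMS = {
    "CLIP-ViT-bigG-14": {"hidden_size": 1664, "projection_dim": 1280},
    "clip-vit-large-patch14": {"hidden_size": 1024, "projection_dim": 768},
}

# SDXL's cross-attention layers take context tokens of this width.
SDXL_CROSS_ATTENTION_DIM = 2048


def can_swap_encoder(trained_with, candidate, dims=ENCODER_DIMS):
    """True only if the candidate emits features of the same widths
    as the encoder the adapter was trained with."""
    return dims[trained_with] == dims[candidate]


def project_shape(feature_shape, weight_shape):
    """Shape of `features @ weight`, as a stand-in for the adapter's
    first linear layer. Raises if the inner dimensions disagree."""
    tokens, width = feature_shape
    in_features, out_features = weight_shape
    if width != in_features:
        raise ValueError(
            f"feature width {width} does not match trained input width {in_features}"
        )
    return (tokens, out_features)


if __name__ == "__main__":
    # Hypothetical adapter weight trained on CLIP-G hidden states.
    weight = (ENCODER_DIMS["CLIP-ViT-bigG-14"]["hidden_size"], SDXL_CROSS_ATTENTION_DIM)

    print(can_swap_encoder("CLIP-ViT-bigG-14", "clip-vit-large-patch14"))
    print(project_shape((257, 1664), weight))  # CLIP-G features fit
    try:
        project_shape((257, 1024), weight)  # CLIP-L features do not
    except ValueError as err:
        print("mismatch:", err)
```

The same reasoning applies to the UNet side: a distilled SDXL only works if every cross-attention layer the adapter weights were trained for still exists with the same dimensions.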