Possibility of replacing base pretrained models for inference

#2
by jing-yi - opened

Hello!

I was reading the documentation for this model.

Under the hood, it uses https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0 and https://huggingface.co/laion/CLIP-ViT-bigG-14-laion2B-39B-b160k.

I was wondering: is it possible to replace them with smaller models during inference? For example, https://huggingface.co/segmind/Segmind-Vega and https://huggingface.co/openai/clip-vit-large-patch14.

MS-Diffusion's trainable adapters are built on SDXL and CLIP-G. The adapters transform the CLIP image features into SDXL cross-attention tokens. A distilled SDXL can be used if it has the same cross-attention layers. CLIP-G, however, cannot be replaced by CLIP-L, because the output image features of the two encoders differ in shape.
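
To make the shape argument concrete, here is a minimal sketch in plain Python. It is not MS-Diffusion's code: the `linear` helper and `adapter_weight` are hypothetical stand-ins for an adapter input layer, and the two feature widths (1280 for CLIP-G, 768 for CLIP-L) are assumed values that should be checked against each model's config.json. The adapter may consume a different feature than the one assumed here, but the same mismatch applies whenever the two encoders emit different widths.

```python
# Hypothetical illustration only; not MS-Diffusion's actual adapter code.
# Assumed feature widths; verify them in each model's config.json.
CLIP_G_IMAGE_DIM = 1280  # assumed image feature width of CLIP-ViT-bigG-14
CLIP_L_IMAGE_DIM = 768   # assumed image feature width of clip-vit-large-patch14


def linear(weight, features):
    """Apply a weight matrix (out_dim rows, in_dim columns) to one feature vector."""
    in_dim = len(weight[0])
    if len(features) != in_dim:
        raise ValueError(f"expected {in_dim} input features, got {len(features)}")
    return [sum(w * x for w, x in zip(row, features)) for row in weight]


# Stand-in for an adapter input layer whose weights were trained on CLIP-G features.
# The output width of 4 is arbitrary and kept small for readability.
adapter_weight = [[0.0] * CLIP_G_IMAGE_DIM for _ in range(4)]

clip_g_features = [1.0] * CLIP_G_IMAGE_DIM
clip_l_features = [1.0] * CLIP_L_IMAGE_DIM

# Features with the width the layer was trained on pass through.
print(len(linear(adapter_weight, clip_g_features)))

# Features with a different width cannot be multiplied by the same weights.
try:
    linear(adapter_weight, clip_l_features)
except ValueError as err:
    print("CLIP-L features rejected:", err)
```

Under these assumptions, using CLIP-L would mean changing the adapter's input layer and retraining it, not just swapping the encoder at inference time.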
