Possibility of replacing base pretrained models for inference

by jing-yi - opened 15 days ago

Hello!

I was reading the documentation for this model.

Under the hood, it uses https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0 and https://huggingface.co/laion/CLIP-ViT-bigG-14-laion2B-39B-b160k.

I was wondering: is it possible to replace them with smaller models during inference? For example, https://huggingface.co/segmind/Segmind-Vega and https://huggingface.co/openai/clip-vit-large-patch14.

doge1516

Owner 11 days ago

MS-Diffusion's trainable adapters are built on SDXL and CLIP-G: they transform CLIP image features into SDXL cross-attention tokens. A distilled SDXL can be used if it keeps the same cross-attention layers. However, because the image features output by CLIP-L and CLIP-G differ in shape, CLIP-G cannot be replaced with CLIP-L.
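To make the shape argument concrete, here is a rough sketch of the kind of check involved. It is not MS-Diffusion's actual code: `EncoderSpec` and `can_swap_encoder` are hypothetical helpers, and the dimensions are the values I recall from the public model configs, so please verify them against each model's `config.json`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderSpec:
    """Feature widths of a CLIP image encoder (hypothetical helper type)."""
    name: str
    hidden_size: int      # width of the per-patch hidden states
    projection_dim: int   # width of the pooled, projected image embedding


# Assumed values, recalled from the published configs; double-check before relying on them.
CLIP_G = EncoderSpec("CLIP-ViT-bigG-14", hidden_size=1664, projection_dim=1280)
CLIP_L = EncoderSpec("clip-vit-large-patch14", hidden_size=1024, projection_dim=768)


def can_swap_encoder(trained_on: EncoderSpec, candidate: EncoderSpec) -> bool:
    """Return True only if the candidate emits features of the same widths.

    An adapter trained on one encoder has input layers sized for that
    encoder's features, so any width mismatch rules out a drop-in swap.
    """
    return (
        trained_on.hidden_size == candidate.hidden_size
        and trained_on.projection_dim == candidate.projection_dim
    )


if __name__ == "__main__":
    for candidate in (CLIP_G, CLIP_L):
        verdict = "compatible" if can_swap_encoder(CLIP_G, candidate) else "incompatible"
        print(f"{candidate.name}: {verdict}")
```

The same idea applies on the SDXL side: a distilled UNet would need cross-attention layers matching the ones the adapters were trained against, which you would have to confirm from its config rather than assume.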
