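This project is based on the StableDiffusion 1.5 LCM project and shows how to deploy its text-to-image (txt2img) and image-to-image (img2img) pipelines on AX650N-based products.
Supported chips:
Supported hardware
For the original model, see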
Model | Raspberry Pi 5 (CPU only) | Intel i7-13700 | Raspberry Pi 5 + M.2 card |
---|---|---|---|
UNet(1 step) | 14 s | 1.7 s | 0.43 s |
VAE Encoder | 25 s | 1.7 s | 0.46 s |
VAE Decoder | 58 s | 3.8 s | 0.91 s |
Total txt2img, 4 steps | 120 s | 10.6 s | 2.68 s |
Total img2img, 2 steps | 113 s | 8.9 s | 2.25 s |
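The totals in the table can be roughly cross-checked against the per-stage rows. The sketch below assumes that a txt2img run costs the stated number of UNet passes plus one VAE decode, and that an img2img run adds one VAE encode in front. That cost model and the dictionary keys (`pi5_cpu`, `i7_13700`, `pi5_m2_card`) are illustrative assumptions and are not taken from the project.

```python
# Per-stage latencies in seconds, copied from the table above.
# The dictionary keys are hypothetical short names for the three platforms.
LATENCY_S = {
    "pi5_cpu": {"unet": 14.0, "vae_encoder": 25.0, "vae_decoder": 58.0},
    "i7_13700": {"unet": 1.7, "vae_encoder": 1.7, "vae_decoder": 3.8},
    "pi5_m2_card": {"unet": 0.43, "vae_encoder": 0.46, "vae_decoder": 0.91},
}


def estimate_txt2img(stage, steps=4):
    """Assumed cost model: `steps` UNet passes followed by one VAE decode."""
    return round(steps * stage["unet"] + stage["vae_decoder"], 2)


def estimate_img2img(stage, steps=2):
    """Assumed cost model: one VAE encode, `steps` UNet passes, one VAE decode."""
    total = stage["vae_encoder"] + steps * stage["unet"] + stage["vae_decoder"]
    return round(total, 2)


if __name__ == "__main__":
    for platform, stage in LATENCY_S.items():
        print(platform, estimate_txt2img(stage), estimate_img2img(stage))
```

Under this assumption the M.2 card estimates come out at 2.63 s and 2.23 s, close to the reported 2.68 s and 2.25 s. The table does not say what accounts for the remainder.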
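- Copy the `unet.axmodel`, `vae_encoder.axmodel`, and `vae_decoder` models to the `./models` directory.
- Copy the `text_encoder` folder from the Dreamshaper 7 repository to the `./models` directory.
- `./models` holds the models required by the demo.
- miniconda
- `pip install -r requirements.txt`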
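Before running the demos, it can help to confirm that `./models` is fully populated. The following is a minimal sketch, not part of the project: the helper `missing_model_files` is hypothetical, and the file list is inferred from the paths the demo scripts print at startup, so it may not match what each script strictly requires.

```python
from pathlib import Path

# Names inferred from the paths printed in the demo logs; whether every
# script needs every entry is an assumption.
REQUIRED = [
    "tokenizer",
    "text_encoder",
    "unet.axmodel",
    "vae_encoder.axmodel",
    "vae_decoder.axmodel",
    "time_input_txt2img.npy",
    "time_input_img2img.npy",
]


def missing_model_files(models_dir="./models", required=REQUIRED):
    """Return the entries of `required` that do not exist under `models_dir`."""
    root = Path(models_dir)
    return [name for name in required if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing from ./models:", ", ".join(missing))
    else:
        print("All expected files are present.")
```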
run_txt2img_axe_infer.py
Input Prompt
Self-portrait oil painting, a beautiful cyborg with golden hair, 8k
Output
(sd1_5) axera@raspberrypi:~/samples/sd1.5-lcm.axera $ python run_txt2img_axe_infer.py
[INFO] Available providers: ['AXCLRTExecutionProvider']
prompt: Self-portrait oil painting, a beautiful cyborg with golden hair, 8k
text_tokenizer: ./models/tokenizer
text_encoder: ./models/text_encoder
unet_model: ./models/unet.axmodel
vae_decoder_model: ./models/vae_decoder.axmodel
time_input: ./models/time_input_txt2img.npy
save_dir: ./txt2img_output_axe.png
text encoder take 2891.1ms
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.3 972f38ca
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.3 972f38ca
load models take 26628.9ms
unet once take 437.5ms
unet once take 433.4ms
unet once take 433.6ms
unet once take 433.6ms
unet loop take 1741.2ms
vae inference take 914.8ms
save image take 210.5ms
(sd1_5) axera@raspberrypi:~/samples/sd1.5-lcm.axera $
Output Image
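The timing lines in the log follow a regular `<stage> take <N>ms` pattern, so they can be collected programmatically. The sketch below is an illustration only: the regular expression and the `parse_timings` helper are assumptions based on the log shown above, not part of the demo scripts.

```python
import re

# Matches lines such as "unet once take 437.5ms"; "[INFO] ..." lines are skipped.
TIMING = re.compile(r"^(?P<stage>.+?) take (?P<ms>\d+(?:\.\d+)?)ms$")


def parse_timings(log_text):
    """Return (stage, milliseconds) pairs for each '<stage> take <N>ms' line."""
    pairs = []
    for line in log_text.splitlines():
        match = TIMING.match(line.strip())
        if match:
            pairs.append((match.group("stage"), float(match.group("ms"))))
    return pairs


# Timing lines copied from the txt2img log above.
SAMPLE = """\
text encoder take 2891.1ms
load models take 26628.9ms
unet once take 437.5ms
unet once take 433.4ms
unet once take 433.6ms
unet once take 433.6ms
unet loop take 1741.2ms
vae inference take 914.8ms
save image take 210.5ms
"""

if __name__ == "__main__":
    timings = parse_timings(SAMPLE)
    unet_steps = [ms for stage, ms in timings if stage == "unet once"]
    print(f"{len(unet_steps)} UNet steps, {sum(unet_steps):.1f} ms in total")
```

For this log the four UNet steps sum to 1738.1 ms, slightly under the reported 1741.2 ms for the whole loop.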
run_img2img_axe_infer.py
Input Prompt
Astronauts in a jungle, cold color palette, muted colors, detailed, 8k
Input Image
Output
(sd1_5) axera@raspberrypi:~/samples/sd1.5-lcm.axera $ python run_img2img_axe_infer.py
[INFO] Available providers: ['AXCLRTExecutionProvider']
prompt: Astronauts in a jungle, cold color palette, muted colors, detailed, 8k
text_tokenizer: ./models/tokenizer
text_encoder: ./models/text_encoder
unet_model: ./models/unet.axmodel
vae_encoder_model: ./models/vae_encoder.axmodel
vae_decoder_model: ./models/vae_decoder.axmodel
init image: ./models/img2img-init.png
time_input: ./models/time_input_img2img.npy
save_dir: ./img2img_output_axe.png
text encoder take 4494.8ms
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.3-dirty 2ecead35-dirty
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.3 972f38ca
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.3 972f38ca
load models take 27331.3ms
vae encoder inference take 460.4ms
unet once take 433.7ms
unet once take 433.5ms
unet loop take 871.7ms
vae decoder inference take 914.5ms
grid image saved in ./lcm_lora_sdv1-5_imgGrid_output.png
save image take 427.5ms
(sd1_5) axera@raspberrypi:~/samples/sd1.5-lcm.axera $
Output Image
Pulsar2 NPU toolchain online documentation
GitHub issues; QQ group: 139953715
Base model
Lykon/dreamshaper-7