Mat1 and mat2 cannot be multiplied?

#9
by Serget2 - opened

Running this in ComfyUI; I updated everything and restarted, but now I get this error on the standard workflow (the anime bunny girl):

Error occurred when executing SamplerCustomAdvanced:

mat1 and mat2 shapes cannot be multiplied (1x1280 and 768x3072)

File "D:\Automatic1.1.1.1\ComfyUI_windows_portable\ComfyUI\execution.py", line 152, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Automatic1.1.1.1\ComfyUI_windows_portable\ComfyUI\execution.p

Any ideas? The SamplerCustomAdvanced node is outlined in purple, and changing models or sizes does not change the error.
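For what it's worth, the error itself is just a matrix-shape mismatch inside a linear layer: mat1 is the tensor being fed in and mat2 is the (transposed) layer weight, so the numbers to compare are 1280 (what arrived) versus 768 (what the layer expects). That usually points at the text-encoder side of the workflow rather than the sampler itself. A minimal illustration in plain PyTorch, using only the dimensions from the message above (not the actual node code):

import torch
import torch.nn.functional as F

pooled = torch.randn(1, 1280)    # pooled text embedding of the wrong size
weight = torch.randn(3072, 768)  # Flux-style projection weight: expects 768 inputs, produces 3072
F.linear(pooled, weight)         # RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x1280 and 768x3072)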

I just loaded the schnell model workflow and it works; then I reloaded the bunny girl workflow and it works too. Strange, it seems updating and restarting wasn't enough on their own, but it's fixed now.

Serget2 changed discussion status to closed

What image sizes can this model generate?

For me the fix was this: the DualCLIPLoader had somehow switched its type to sdxl. After switching it back to "flux", the workflow did its slooow thing.
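That lines up with the shapes in the original error: the sdxl CLIP type hands the model a 1280-dim pooled embedding (the CLIP-G/bigG size), while the flux type supplies the 768-dim CLIP-L pooled vector that Flux's 768-to-3072 projection expects. A quick sanity check of the shapes involved (illustrative constants, not read from the node code):

pooled_dim_from_sdxl_type = 1280   # CLIP-G/bigG pooled embedding size
pooled_dim_from_flux_type = 768    # CLIP-L pooled embedding size
flux_projection = (768, 3072)      # in_features -> out_features of Flux's pooled-vector projection
assert pooled_dim_from_flux_type == flux_projection[0]   # works with type "flux"
# feeding 1280 into a 768 -> 3072 projection gives exactly "(1x1280 and 768x3072)"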

I get this error ("RuntimeError: mat1 and mat2 shapes cannot be multiplied...") when attempting to use any of the 14 different flavors of Flux models I've found, with or without adding the extra modules (usually "ae.safetensors", "clip_l.safetensors", and "t5xxl_fp16.safetensors", or the fp8 equivalent, in that order), since some checkpoints seem to require them and others do not. I don't have any trouble with most of the non-Flux models I've downloaded; virtually all of them work fine. I've tried a variety of options in the Stable Diffusion WebUI Forge interface but have never gotten any of the Flux models to work at all.

I'm running an i7-5820K (6 cores/12 threads) with 64 GB of main memory and an RX 6750 XT (12 GB) GPU. I'm a newbie at this for sure, and I don't know how to diagnose further or what to try next. I've looked for threads discussing these errors and possible solutions, but so far nothing has provided a working fix. Any thoughts, suggestions, or advice would be greatly appreciated! Thanks in advance!

Here's the console log:

Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: f2.0.1v1.10.1-previous-572-g1fae20d9
Commit hash: 1fae20d94f3b612f89ddc05e7df3bf758d6bd6bb
B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\extensions-builtin\forge_legacy_preprocessors\install.py:2: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
import pkg_resources
B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\extensions-builtin\sd_forge_controlnet\install.py:2: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
import pkg_resources
Launching Web UI with arguments: --directml --skip-torch-cuda-test --gradio-allowed-path 'B:\Flux.1\Data\Images'
Using directml with device:
Total VRAM 1024 MB, total RAM 65436 MB
pytorch version: 2.4.1+cpu
Set vram state to: NORMAL_VRAM
Device: privateuseone
VAE dtype preferences: [torch.float32] -> torch.float32
CUDA Using Stream: False
Using sub quadratic optimization for cross attention
Using split attention for VAE
Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
ControlNet preprocessor location: B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\models\ControlNetPreprocessor
Loading additional modules ... done.
2024-10-18 23:36:20,376 - ControlNet - INFO - ControlNet UI callback registered.
Model selected: {'checkpoint_info': {'filename': 'B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\models\Stable-diffusion\dreamshaper_8.safetensors', 'hash': '9d40847d'}, 'additional_modules': [], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Startup time: 63.3s (prepare environment: 2.9s, import torch: 37.4s, initialize shared: 0.6s, other imports: 1.4s, list SD models: 0.4s, load scripts: 5.0s, initialize google blockly: 7.9s, create ui: 4.5s, gradio launch: 3.2s).
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
[GPU Setting] You will use 0.00% GPU memory (0.00 MB) to load weights, and use 100.00% GPU memory (1024.00 MB) to do matrix computation.
Model selected: {'checkpoint_info': {'filename': 'B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\models\Stable-diffusion\flux1-schnell-bnb-nf4.safetensors', 'hash': '7d3d1873'}, 'additional_modules': [], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Loading Model: {'checkpoint_info': {'filename': 'B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\models\Stable-diffusion\flux1-schnell-bnb-nf4.safetensors', 'hash': '7d3d1873'}, 'additional_modules': [], 'unet_storage_dtype': None}
[Unload] Trying to free all memory for privateuseone:0 with 0 models keep loaded ... Done.
StateDict Keys: {'transformer': 2336, 'vae': 244, 'text_encoder': 198, 'text_encoder_2': 220, 'ignore': 0}
Using Detected T5 Data Type: torch.float8_e4m3fn
Using Detected UNet Type: nf4
Using pre-quant state dict!
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': 'nf4', 'computation_dtype': torch.float32}
Calculating sha256 for B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\models\Stable-diffusion\flux1-schnell-bnb-nf4.safetensors: c922e7f77fadbf97839409def23d68c9532a7d67666b9c06bfb85ac95966688f
Model loaded in 244.4s (unload existing model: 0.3s, forge model load: 244.1s).
All loaded to GPU.
Moving model(s) has taken 0.01 seconds
Distilled CFG Scale will be ignored for Schnell
Distilled CFG Scale will be ignored for Schnell
[Unload] Trying to free 34772.45 MB for privateuseone:0 with 0 models keep loaded ... Current free memory is 1024.00 MB ... Done.
[Memory Management] Target: KModel, Free GPU: 1024.00 MB, Model Require: 22686.50 MB, Previously Loaded: 0.00 MB, Inference Require: 1024.00 MB, Remaining: -22686.50 MB, CPU Swap Loaded (blocked method): 22686.42 MB, GPU Loaded: 0.07 MB
Moving model(s) has taken 0.05 seconds
0%| | 0/20 [00:00<?, ?it/s]
Traceback (most recent call last):
File "B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\modules_forge\main_thread.py", line 30, in work
self.result = self.func(*self.args, **self.kwargs)
File "B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\modules\txt2img.py", line 124, in txt2img_function
processed = processing.process_images(p)
File "B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\modules\processing.py", line 835, in process_images
res = process_images_inner(p)
File "B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\modules\processing.py", line 983, in process_images_inner
samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
File "B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\modules\processing.py", line 1372, in sample
samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
File "B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\modules\sd_samplers_kdiffusion.py", line 238, in sample
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
File "B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\modules\sd_samplers_common.py", line 272, in launch_sampling
return func()
File "B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\modules\sd_samplers_kdiffusion.py", line 238, in
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
File "B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\venv\lib\site-packages\torch\utils_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\k_diffusion\sampling.py", line 146, in sample_euler_ancestral
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\venv\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\venv\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\modules\sd_samplers_cfg_denoiser.py", line 199, in forward
denoised, cond_pred, uncond_pred = sampling_function(self, denoiser_params=denoiser_params, cond_scale=cond_scale, cond_composition=cond_composition)
File "B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\backend\sampling\sampling_function.py", line 362, in sampling_function
denoised, cond_pred, uncond_pred = sampling_function_inner(model, x, timestep, uncond, cond, cond_scale, model_options, seed, return_full=True)
File "B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\backend\sampling\sampling_function.py", line 303, in sampling_function_inner
cond_pred, uncond_pred = calc_cond_uncond_batch(model, cond, uncond_, x, timestep, model_options)
File "B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\backend\sampling\sampling_function.py", line 273, in calc_cond_uncond_batch
output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
File "B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\backend\modules\k_model.py", line 45, in apply_model
model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
File "B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\venv\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\venv\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\backend\nn\flux.py", line 418, in forward
out = self.inner_forward(img, img_ids, context, txt_ids, timestep, y, guidance)
File "B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\backend\nn\flux.py", line 375, in inner_forward
img = self.img_in(img)
File "B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\venv\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\venv\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "B:\Flux.1\Data\Packages\stable-diffusion-webui-forge\backend\operations.py", line 147, in forward
return torch.nn.functional.linear(x, weight, bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (6400x64 and 1x98304)
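The pair of shapes in this log, (6400x64 and 1x98304), looks like a different failure from the CLIP-type mix-up above. mat1 (6400x64) is plausibly the patchified latent entering Flux's img_in layer (16 latent channels times a 2x2 patch = 64 features per token), while mat2 (1x98304) looks like a still-packed 4-bit NF4 weight blob rather than a real 64x3072 weight matrix: a (3072, 64) linear weight stored at 4 bits per value packs into exactly 98304 bytes. If that reading is right, the bnb-nf4 checkpoint's weights were never dequantized before the matmul, which would fit this setup because bitsandbytes' NF4 kernels need CUDA and this install runs through DirectML on an AMD card. This is an educated guess from the numbers, not a confirmed diagnosis; the arithmetic is simply:

out_features = 3072   # Flux hidden size (img_in output width)
in_features  = 64     # 16 latent channels * 2 * 2 patch pixels
bits_per_val = 4      # NF4 stores each weight in 4 bits
packed_bytes = out_features * in_features * bits_per_val // 8
print(packed_bytes)   # 98304 -> matches the "1x98304" mat2 in the traceback

If that is the cause, a non-quantized Flux variant (fp16 or fp8) is more likely to work on a DirectML backend than the bnb-nf4 one.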

Any update on this? I get the same error.

I reran the Flux workflow without the ControlNet and it miraculously worked.


fixed silly mistake
