Hello, I have a 3090 with 24 GB and this model won't load. Can someone assist me?

#1 by Humeee33 - opened

I have no problems with models as large as 34B. Thanks again for any help you can provide. Also, I changed many settings trying to get it to load, but it never did. I let it default to Transformers first, along with the other settings. I'm on oobabooga.

If I am correct, oobabooga uses .py (Python) files... try LM Studio or Ollama to run .gguf files and it will work blazing fast.

MrRobotoAI/Undi95-LewdStorytellerMix-8b-64k-Q4_K_M-GGUF
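
(For illustration, a minimal sketch of running a GGUF like that outside of ooba with llama-cpp-python; the local filename is an assumption based on the repo's naming, and you would download the .gguf first:)

```python
# Sketch: run a downloaded GGUF directly with llama-cpp-python.
# The path below is an assumption; point it at wherever you saved the file.
from llama_cpp import Llama

llm = Llama(
    model_path="Undi95-LewdStorytellerMix-8b-64k-Q4_K_M.gguf",
    n_ctx=8192,       # a modest context window that fits in 24 GB
    n_gpu_layers=-1,  # offload all layers to the GPU
)
print(llm("Once upon a time", max_tokens=32)["choices"][0]["text"])
```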

But if you would like, send me a screenshot of your settings for the model... and I will look into it.

-LeRoy

First, thank you for responding to my request for help. I don't know anything about those other programs but will do some research. Here is the log with the errors. Thanks again for any help you can provide.

10:29:32-725782 INFO Loading "MrRobotoAI_Undi95-LewdStorytellerMix-8b-64k"
10:29:32-743803 INFO TRANSFORMERS_PARAMS=
{'low_cpu_mem_usage': True, 'torch_dtype': torch.float16}

C:\OggAugTwfour\text-generation-webui-main\installer_files\env\Lib\site-packages\transformers\generation\configuration_utils.py:577: UserWarning: do_sample is set to False. However, min_p is set to 0.0 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset min_p.
warnings.warn(
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 4/4 [00:00<00:00, 5.72it/s]
10:29:52-191860 ERROR Failed to load the model.
Traceback (most recent call last):
File "C:\OggAugTwfour\text-generation-webui-main\modules\ui_model_menu.py", line 231, in load_model_wrapper
shared.model, shared.tokenizer = load_model(selected_model, loader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\OggAugTwfour\text-generation-webui-main\modules\models.py", line 101, in load_model
tokenizer = load_tokenizer(model_name, model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\OggAugTwfour\text-generation-webui-main\modules\models.py", line 123, in load_tokenizer
tokenizer = AutoTokenizer.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\OggAugTwfour\text-generation-webui-main\installer_files\env\Lib\site-packages\transformers\models\auto\tokenization_auto.py", line 896, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\OggAugTwfour\text-generation-webui-main\installer_files\env\Lib\site-packages\transformers\tokenization_utils_base.py", line 2291, in from_pretrained
return cls._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^
File "C:\OggAugTwfour\text-generation-webui-main\installer_files\env\Lib\site-packages\transformers\tokenization_utils_base.py", line 2525, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\OggAugTwfour\text-generation-webui-main\installer_files\env\Lib\site-packages\transformers\tokenization_utils_fast.py", line 115, in init
fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: data did not match any variant of untagged enum ModelWrapper at line 1251003 column 3
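
(The exception above comes from parsing tokenizer.json. A minimal way to test this outside ooba, assuming the model was downloaded into ooba's models folder, is to load the file directly with the tokenizers library:)

```python
# Sketch: check whether tokenizer.json itself is readable, independent of ooba.
# The path is an assumption; adjust it to wherever the model was downloaded.
from tokenizers import Tokenizer

tok = Tokenizer.from_file(
    "models/MrRobotoAI_Undi95-LewdStorytellerMix-8b-64k/tokenizer.json"
)
print(tok.encode("hello world").tokens)
# If this raises the same "untagged enum ModelWrapper" error, the file is the problem.
```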

Sounds like an error in the model... let me look at the construction.

Thanks for looking at this. I hear good things about this model and look forward to trying it.

Try this; I reworked the model:

MrRobotoAI/Undi95-LewdStorytellerMix-v2.2-8b-64k

Later today I will have out a MrRobotoAI/Undi95-LewdStorytellerMix-v2.3-8b-64k version; tell me which works better for you... both of these have large context windows of 1048k, so that may be taking up quite a bit of VRAM.

OK, you have 3 models to test:

https://huggingface.co/MrRobotoAI/Undi95-LewdStorytellerMix-v2.1-8b-64k
https://huggingface.co/MrRobotoAI/Undi95-LewdStorytellerMix-v2.2-8b-64k
https://huggingface.co/MrRobotoAI/Undi95-LewdStorytellerMix-v2.3-8b-64k

What, may I ask, do you wish to use these for... dirty private chat, uncensored stock, uncensored writing, or maybe erotic writing?

I downloaded the first one, ...Mix-v2.1-8b-64k, to test, and it wouldn't load with the default settings. I even tried changing a few settings to see if it made any difference, and it did not. Have you or anyone you work with been able to use these models in the oobabooga UI? Also, here's the error log again. I appreciate you trying to get this to work for me.

14:28:43-592366 INFO Loading "MrRobotoAI_Undi95-LewdStorytellerMix-v2.1-8b-64k"
14:28:43-597363 INFO TRANSFORMERS_PARAMS=
{'low_cpu_mem_usage': True, 'torch_dtype': torch.float16}

Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 4/4 [00:00<00:00, 6.31it/s]
14:29:00-765771 ERROR Failed to load the model.
Traceback (most recent call last):
File "C:\OggAugTwfour\text-generation-webui-main\modules\ui_model_menu.py", line 231, in load_model_wrapper
shared.model, shared.tokenizer = load_model(selected_model, loader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\OggAugTwfour\text-generation-webui-main\modules\models.py", line 101, in load_model
tokenizer = load_tokenizer(model_name, model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\OggAugTwfour\text-generation-webui-main\modules\models.py", line 123, in load_tokenizer
tokenizer = AutoTokenizer.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\OggAugTwfour\text-generation-webui-main\installer_files\env\Lib\site-packages\transformers\models\auto\tokenization_auto.py", line 896, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\OggAugTwfour\text-generation-webui-main\installer_files\env\Lib\site-packages\transformers\tokenization_utils_base.py", line 2291, in from_pretrained
return cls._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^
File "C:\OggAugTwfour\text-generation-webui-main\installer_files\env\Lib\site-packages\transformers\tokenization_utils_base.py", line 2525, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\OggAugTwfour\text-generation-webui-main\installer_files\env\Lib\site-packages\transformers\tokenization_utils_fast.py", line 115, in init
fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: data did not match any variant of untagged enum ModelWrapper at line 1251003 column 3

None of us use ooba.

Did you try the other 2... all were made differently, with different merge methods... but all of them were made this time using "arcee-ai / mergekit" on Hugging Face Spaces, which is the de facto standard.

We use Linux, and another one of us uses Ollama.

Oh, I did find out that one of the people uses Q4 models in LM Studio under Windows for testing.

So none of the people you know of use https://github.com/oobabooga/text-generation-webui to run your models, right?

I appreciate your help on this. It might just be that your model won't work with oobabooga, and that's a shame, because a lot of people seem to use oobabooga, as evidenced by the size of its Reddit channel vs. LM Studio's.

Well, it is a little hard to figure out the problem without the environment variables and settings that I asked you for... I do see your outputs, but they don't mean much other than "it won't load"; nothing to do with "why it won't load". We are using mergekit, made by arcee-ai, the same people that write all those papers on LLMs... so I don't think there is a problem there. I have personally created this merge in 3 separate ways, yet you say "MrRobotoAI/Undi95-LewdStorytellerMix-v2.1-8b-64k" doesn't work; have you tried the other two? I have asked what you are wanting the model to be used for; it might be that I can point you in a direction that you have not known about. To which I have no answer from you.

Your own output says "C:\OggAugTwfour\text-generation-webui-main\installer_files\env\Lib\site-packages\transformers\generation\configuration_utils.py:577: UserWarning: do_sample is set to False. However, min_p is set to 0.0 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset min_p." Have you tried doing that?

Your own output says "File "C:\OggAugTwfour\text-generation-webui-main\installer_files\env\Lib\site-packages\transformers\tokenization_utils_fast.py", line 115, in init
fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)" Have you looked at this line to see what it is asking for?

Exception: data did not match any variant of untagged enum ModelWrapper at line 1251003 column 3 --- Are you using the tokenizer that I have, or one supplied by ooba? Meaning, did you just copy the safetensors from my HF folder?

TRANSFORMERS_PARAMS={'low_cpu_mem_usage': True, 'torch_dtype': torch.float16} --- I have said already that this model has a very large context window (i.e., it consumes a lot of VRAM). Have you tried a quant?
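
(The VRAM point can be made concrete with a back-of-the-envelope KV-cache estimate; the layer/head numbers below are the usual Llama-3-8B values, and the fp16 cache is an assumption:)

```python
# Sketch: approximate KV-cache size per context length for a Llama-3-8B-class
# model (32 layers, 8 KV heads, head_dim 128, fp16 cache assumed).
n_layers, n_kv_heads, head_dim, bytes_per_el = 32, 8, 128, 2
per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_el  # K and V

for n_ctx in (16_384, 65_536, 1_048_576):
    print(f"n_ctx={n_ctx:>9,}: ~{per_token * n_ctx / 2**30:.0f} GiB KV cache")
# ~2 GiB at 16K, ~8 GiB at 64K, ~128 GiB at the full 1M window:
# far beyond a 24 GB card once the weights are loaded too.
```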

"So none of the people you know of use https://github.com/oobabooga/text-generation-webui to run your models, right? I appreciate your help on this, it might just be that your model won't work with oobabooga and that's a shame because a lot of people use oobabooga seemingly more as evidenced by the size of their reddit channel vs LM Studios." --- No, we don't. A lot more people use Ollama


This machine is a dual E5-2697 with 512 GB RAM and a single RTX 3090.

So, being that you are the first person to say something about this issue... IDK, maybe the issue might be on your end... maybe you might have to troubleshoot some of the problems before I jump in and try to glean your env variables through psychic prowess. I make, design, train and finetune models; I am not tech support for ooba.

So, if you would like me to help you...

  1. Are you using Windows or Linux, and what version?
  2. What are your RAM and CPU?
  3. Are you using the full system (bare metal), Docker, a container, or a VM?
  4. What are your env variables... Python version?
  5. Are you using just the safetensors or the full dir?
  6. What are your ooba settings?
  7. Have you tried setting "do_sample=True"?
  8. What is the command on line 115 of tokenization_utils_fast.py that it is failing on?

And yes, if we can rule out all of this, I might just install ooba and see if I can recreate the error.

-Leroy

Thanks again.

Are you using Windows or Linux, and what version?: Windows 11

What are your RAM and CPU?: 32 GB RAM, 12th Gen Intel i9-12900K

Are you using the full system (bare metal), Docker, a container, or a VM?: Full system, not running apps in the background beyond the Edge browser.

What are your env variables... Python version?

Env:
C:\Users\HP2024>set
ALLUSERSPROFILE=C:\ProgramData
APPDATA=C:\Users\HP2024\AppData\Roaming
CommonProgramFiles=C:\Program Files\Common Files
CommonProgramFiles(x86)=C:\Program Files (x86)\Common Files
CommonProgramW6432=C:\Program Files\Common Files
COMPUTERNAME=STABLE2023
ComSpec=C:\windows\system32\cmd.exe
DriverData=C:\Windows\System32\Drivers\DriverData
FPS_BROWSER_APP_PROFILE_STRING=Internet Explorer
FPS_BROWSER_USER_PROFILE_STRING=Default
HOMEDRIVE=C:
HOMEPATH=\Users\HP2024
LOCALAPPDATA=C:\Users\HP2024\AppData\Local
LOGONSERVER=\STABLE2023
NUMBER_OF_PROCESSORS=24
OneDrive=C:\Users\HP2024\OneDrive
OnlineServices=Online Services
OPENSSL_ia32cap=~0x20000000
OS=Windows_NT
Path=C:\Program Files (x86)\Razer Chroma SDK\bin;C:\Program Files\Razer Chroma SDK\bin;C:\Program Files (x86)\Razer\ChromaBroadcast\bin;C:\Program Files\Razer\ChromaBroadcast\bin;C:\windows\system32;C:\windows;C:\windows\System32\Wbem;C:\windows\System32\WindowsPowerShell\v1.0;C:\windows\System32\OpenSSH;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\Git\cmd;C:\windows\system32\config\systemprofile\AppData\Local\Microsoft\WindowsApps;C:\Program Files\dotnet;C:\Users\HP2024\AppData\Local\Programs\Python\Python310\Scripts;C:\Users\HP2024\AppData\Local\Programs\Python\Python310;C:\Users\HP2024\AppData\Local\Microsoft\WindowsApps;C:\Users\HP2024.dotnet\tools
PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC
platformcode=M5
PROCESSOR_ARCHITECTURE=AMD64
PROCESSOR_IDENTIFIER=Intel64 Family 6 Model 151 Stepping 2, GenuineIntel
PROCESSOR_LEVEL=6
PROCESSOR_REVISION=9702
ProgramData=C:\ProgramData
ProgramFiles=C:\Program Files
ProgramFiles(x86)=C:\Program Files (x86)
ProgramW6432=C:\Program Files
PROMPT=$P$G
PSModulePath=C:\Program Files\WindowsPowerShell\Modules;C:\windows\system32\WindowsPowerShell\v1.0\Modules
PUBLIC=C:\Users\Public
RegionCode=NA
SESSIONNAME=Console
SystemDrive=C:
SystemRoot=C:\windows
TEMP=C:\Users\HP2024\AppData\Local\Temp
TMP=C:\Users\HP2024\AppData\Local\Temp
USERDOMAIN=STABLE2023
USERDOMAIN_ROAMINGPROFILE=STABLE2023
USERNAME=HP2024
USERPROFILE=C:\Users\HP2024
windir=C:\windows

python:

print(os.environ)
environ({'ALLUSERSPROFILE': 'C:\ProgramData', 'APPDATA': 'C:\Users\HP2024\AppData\Roaming', 'COMMONPROGRAMFILES': 'C:\Program Files\Common Files', 'COMMONPROGRAMFILES(X86)': 'C:\Program Files (x86)\Common Files', 'COMMONPROGRAMW6432': 'C:\Program Files\Common Files', 'COMPUTERNAME': 'STABLE2023', 'COMSPEC': 'C:\windows\system32\cmd.exe', 'DRIVERDATA': 'C:\Windows\System32\Drivers\DriverData', 'FPS_BROWSER_APP_PROFILE_STRING': 'Internet Explorer', 'FPS_BROWSER_USER_PROFILE_STRING': 'Default', 'HOMEDRIVE': 'C:', 'HOMEPATH': '\Users\HP2024', 'LOCALAPPDATA': 'C:\Users\HP2024\AppData\Local', 'LOGONSERVER': '\\STABLE2023', 'NUMBER_OF_PROCESSORS': '24', 'ONEDRIVE': 'C:\Users\HP2024\OneDrive', 'ONLINESERVICES': 'Online Services', 'OPENSSL_IA32CAP': '~0x20000000', 'OS': 'Windows_NT', 'PATH': 'C:\Program Files (x86)\Razer Chroma SDK\bin;C:\Program Files\Razer Chroma SDK\bin;C:\Program Files (x86)\Razer\ChromaBroadcast\bin;C:\Program Files\Razer\ChromaBroadcast\bin;C:\windows\system32;C:\windows;C:\windows\System32\Wbem;C:\windows\System32\WindowsPowerShell\v1.0\;C:\windows\System32\OpenSSH\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\Git\cmd;C:\windows\system32\config\systemprofile\AppData\Local\Microsoft\WindowsApps;C:\Program Files\dotnet\;C:\Users\HP2024\AppData\Local\Programs\Python\Python310\Scripts\;C:\Users\HP2024\AppData\Local\Programs\Python\Python310\;C:\Users\HP2024\AppData\Local\Microsoft\WindowsApps;C:\Users\HP2024\.dotnet\tools', 'PATHEXT': '.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC', 'PLATFORMCODE': 'M5', 'PROCESSOR_ARCHITECTURE': 'AMD64', 'PROCESSOR_IDENTIFIER': 'Intel64 Family 6 Model 151 Stepping 2, GenuineIntel', 'PROCESSOR_LEVEL': '6', 'PROCESSOR_REVISION': '9702', 'PROGRAMDATA': 'C:\ProgramData', 'PROGRAMFILES': 'C:\Program Files', 'PROGRAMFILES(X86)': 'C:\Program Files (x86)', 'PROGRAMW6432': 'C:\Program Files', 'PROMPT': '$P$G', 'PSMODULEPATH': 'C:\Program Files\WindowsPowerShell\Modules;C:\windows\system32\WindowsPowerShell\v1.0\Modules', 'PUBLIC': 'C:\Users\Public', 'REGIONCODE': 'NA', 'SESSIONNAME': 'Console', 'SYSTEMDRIVE': 'C:', 'SYSTEMROOT': 'C:\windows', 'TEMP': 'C:\Users\HP2024\AppData\Local\Temp', 'TMP': 'C:\Users\HP2024\AppData\Local\Temp', 'USERDOMAIN': 'STABLE2023', 'USERDOMAIN_ROAMINGPROFILE': 'STABLE2023', 'USERNAME': 'HP2024', 'USERPROFILE': 'C:\Users\HP2024', 'WINDIR': 'C:\windows'})

Are you using just the safetensors or the full dir?: Not entirely sure, but I suspect safetensors.

What are your ooba settings?

MrRobotoAI_Undi95-LewdStorytellerMix-v2.1-8b-64k$:
loader: Transformers
cpu_memory: 0
auto_devices: false
disk: false
cpu: false
bf16: false
load_in_8bit: false
trust_remote_code: false
no_use_fast: false
use_flash_attention_2: false
use_eager_attention: false
load_in_4bit: false
compute_dtype: float16
quant_type: nf4
use_double_quant: false
disable_exllama: false
disable_exllamav2: false
compress_pos_emb: 1
alpha_value: 1
gpu_memory_0: 0

Have you tried setting "do_sample=True"?: This is how I'm set up; do_sample is checked.

What is the command on line 115 of tokenization_utils_fast.py that it is failing on?

```python
    if tokenizer_object is not None:
        fast_tokenizer = copy.deepcopy(tokenizer_object)  # <-- this line is 115
    elif fast_tokenizer_file is not None and not from_slow:
        # We have a serialization from tokenizers which let us directly build the backend
        fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
    elif slow_tokenizer is not None:
        # We need to convert a slow tokenizer to build the backend
        fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
    elif gguf_file is not None:
        # We need to convert a slow tokenizer to build the backend
        gguf_param = load_gguf_checkpoint(kwargs.get("vocab_file"))
        architecture = gguf_param["config"]["model_type"]
        tokenizer_dict = gguf_param["tokenizer"]
        fast_tokenizer, additional_kwargs = convert_gguf_tokenizer(architecture, tokenizer_dict)
```


When you installed it ("Run the script that matches your OS: start_linux.sh, start_windows.bat, start_macos.sh, or start_wsl.bat"), did you use "start_windows.bat" or "start_wsl.bat"? Both are for Windows, but wsl runs Linux inside of Windows... "Files\dotnet;C:\Users\HP2024\AppData\Local\Programs\Python\Python310" --- Python 3.10 in your programs folder makes me think that you probably used "start_windows.bat". This is not an issue, just reference for me.

```python
    # line above is 115
    elif fast_tokenizer_file is not None and not from_slow:
        # We have a serialization from tokenizers which let us directly build the backend
        fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
    elif slow_tokenizer is not None:
        # We need to convert a slow tokenizer to build the backend
        fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
    elif gguf_file is not None:
        # We need to convert a slow tokenizer to build the backend
        gguf_param = load_gguf_checkpoint(kwargs.get("vocab_file"))
        architecture = gguf_param["config"]["model_type"]
        tokenizer_dict = gguf_param["tokenizer"]
        fast_tokenizer, additional_kwargs = convert_gguf_tokenizer(architecture, tokenizer_dict)
```

There is an issue with the tokenizer, such that it cannot build a GGUF file to use.

mradermacher has a complete set of GGUFs built for my model, from a Q2 at 4.5 GB (which is horrible; never use it) to an f16 at 16 GB (which is the full model, just packed up with the tokenizer and dict):
mradermacher/Undi95-LewdStorytellerMix-v2.1-8b-64k-GGUF https://huggingface.co/mradermacher/Undi95-LewdStorytellerMix-v2.1-8b-64k-i1-GGUF


GGUF IQ4_XS 4.6
GGUF Q4_K_S 4.8 fast, recommended <------------
GGUF Q4_K_M 5.0 fast, recommended <------------Either of these should load with no problem
GGUF Q5_K_S 5.7
GGUF Q5_K_M 5.8
GGUF Q6_K 6.7 very good quality
GGUF Q8_0 8.6 fast, best quality
GGUF f16 16.2 16 bpw, overkill
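
(Those sizes line up with a quick estimate from bits-per-weight; the bpw figures below are approximations:)

```python
# Sketch: GGUF file size is roughly parameter count x bits-per-weight / 8,
# plus metadata overhead. 8.03e9 params is the usual Llama-3-8B count.
params = 8.03e9
for name, bpw in [("Q4_K_M", 4.85), ("Q6_K", 6.56), ("Q8_0", 8.5), ("f16", 16.0)]:
    print(f"{name}: ~{params * bpw / 8 / 1e9:.1f} GB")
# Prints roughly 4.9, 6.6, 8.5 and 16.1 GB: close to the listed 5.0/6.7/8.6/16.2.
```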

---Ahhh, you only tried one model because ooba uses GGUF files and that is the only one that I created a GGUF file for....
https://huggingface.co/spaces/ggml-org/gguf-my-repo --- lets you create your own GGUFs of any model in your HF, by the way

Sorry, reading the documentation right now... while I dig more, can you try one of mradermacher's Q4 files?
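
(One way to grab a single quant file rather than cloning the whole repo, as a sketch; the exact filename is assumed from mradermacher's naming scheme:)

```python
# Sketch: download just the Q4_K_M quant with huggingface_hub, then drop it
# into text-generation-webui's models/ folder.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="mradermacher/Undi95-LewdStorytellerMix-v2.1-8b-64k-GGUF",
    filename="Undi95-LewdStorytellerMix-v2.1-8b-64k.Q4_K_M.gguf",  # assumed name
)
print(path)
```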

Once again, thank you for helping me with my load issue.

I tried the bigger one the webpage recommended first (Q8_0), because I missed you pointing out which one I SHOULD use (sorry), but when that failed to load, I went with the one you recommended (Q4_K_M), which also failed; see below.

Traceback (most recent call last):
File "C:\OggAugTwfour\text-generation-webui-main\modules\ui_model_menu.py", line 231, in load_model_wrapper
shared.model, shared.tokenizer = load_model(selected_model, loader)
File "C:\OggAugTwfour\text-generation-webui-main\modules\models.py", line 93, in load_model
output = load_func_map[loader](model_name)
File "C:\OggAugTwfour\text-generation-webui-main\modules\models.py", line 274, in llamacpp_loader
model, tokenizer = LlamaCppModel.from_pretrained(model_file)
File "C:\OggAugTwfour\text-generation-webui-main\modules\llamacpp_model.py", line 85, in from_pretrained
result.model = Llama(**params)
File "C:\OggAugTwfour\text-generation-webui-main\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama.py", line 392, in __init__
_LlamaContext(
File "C:\OggAugTwfour\text-generation-webui-main\installer_files\env\Lib\site-packages\llama_cpp_cuda\_internals.py", line 298, in __init__
raise ValueError("Failed to create llama_context")
ValueError: Failed to create llama_context

And my settings for the quant:

Undi95-LewdStorytellerMix-v2.1-8b-64k.Q4_K_M$:
loader: llama.cpp
cpu: false
cache_8bit: false
cache_4bit: false
threads: 0
threads_batch: 0
n_batch: 512
no_mmap: false
mlock: false
no_mul_mat_q: false
n_gpu_layers: 33
tensor_split: ''
n_ctx: 1048576
compress_pos_emb: 1
rope_freq_base: 2804339712
numa: false
no_offload_kqv: false
row_split: false
tensorcores: false
flash_attn: false
streaming_llm: false
attention_sink_size: 5
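
(That failure is consistent with the n_ctx: 1048576 in the settings above; llama.cpp tries to allocate the KV cache for the full window and gives up. A sketch of the same load with llama-cpp-python and a smaller window:)

```python
# Sketch: the same GGUF loads once n_ctx is something that fits in 24 GB,
# instead of the model's advertised 1,048,576-token maximum.
from llama_cpp import Llama

llm = Llama(
    model_path="models/Undi95-LewdStorytellerMix-v2.1-8b-64k.Q4_K_M.gguf",  # path assumed
    n_ctx=16384,      # a window the 3090 can actually hold
    n_gpu_layers=33,  # matches the setting above
)
print(llm("Hello", max_tokens=16)["choices"][0]["text"])
```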

Ohhh, I didn't mean that you should just "try" it; it was so we could see if the reduction in size would rule out VRAM usage === checked, ruled out.
I have installed OOBA but I have not had time to thoroughly troubleshoot.
It does seem like a tokenizer issue, so I will try using a standard tokenizer from L3.1-meta... before creating a GGUF.
Have you tried the model quants from mradermacher/Undi95-LewdStorytellerMix-v2.1-8b-64k-GGUF https://huggingface.co/mradermacher/Undi95-LewdStorytellerMix-v2.1-8b-64k-i1-GGUF ?
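
(A sketch of the tokenizer swap described above; the repo names are assumptions, and meta-llama repos are gated, so any known-good Llama-3.1 tokenizer would do:)

```python
# Sketch: overwrite the merge's tokenizer files with a known-good Llama-3.1
# tokenizer, so AutoTokenizer.from_pretrained stops choking on tokenizer.json.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")  # gated repo
tok.save_pretrained("models/MrRobotoAI_Undi95-LewdStorytellerMix-v2.1-8b-64k")
```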

First, I missed the question about what I start it with too. I use start_windows.bat. BUT WAIT, THERE'S MORE

It turns out that the first models I tried just didn't work no matter what I changed, but then you suggested some new ones, and those didn't work either. There was ONE setting I didn't understand (I feel so stupid) that you EVEN MENTIONED, but again, I didn't understand: the QUANTITY OF CONTEXT! I've never had a model ask me that before, so I didn't even think to change it, but then I asked Grok what 1048576 MiB meant and it said 112 GiB! I only have 24.

So, after the initial ones didn't load, I changed the earlier ones you mentioned to around 16K, and they loaded. Then I went back to the bigger one you mentioned, Undi95-LewdStorytellerMix-v2.1-8b-64k.Q8_0.gguf, and tried that, and it loaded. Not only did it load, but it was UNBELIEVABLY cooperative. I was stunned.

So, again, thank you for moving me to other choices for your model; the ones where I could change the context amount allowed me to load them. I was not able to change the context on the others (because there was no option to do so for those models, e.g. MrRobotoAI_Undi95-LewdStorytellerMix-v2.1-8b-64k has no context box when I pick it).
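
(With the Transformers loader there is indeed no context box; the window comes from the model's config, which can be checked like this, as a sketch:)

```python
# Sketch: inspect the context window the Transformers loader will assume.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("MrRobotoAI/Undi95-LewdStorytellerMix-v2.1-8b-64k")
print(cfg.max_position_embeddings)  # reportedly 1048576 for this model
```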

I am at last able to load a version of your amazing model, and I am very happy. I greatly appreciate your help, as I'm somewhat technical, but only if the UI does all the thinking for me. Next chance I get, I will sing the praises of this model if anyone asks for a "good" model to try.
