Update README.md
README.md CHANGED
@@ -296,7 +296,27 @@ And thank you again to a16z for their generous grant.
<!-- original-model-card start -->
# Original model card: Ross Ascends's Mistral 7B Dolphin2.1 Lima0.5

ehartford's merge of Mistral 7B v0.1 with his Dolphin 2.1 dataset:

https://huggingface.co/ehartford/dolphin-2.1-mistral-7b

and

the LIMA RP dataset applied as a LoRA at 0.5 weight:

https://huggingface.co/lemonilia/limarp-llama2-v2/
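
For illustration, here is a minimal sketch of how a LoRA can be applied at half weight with the PEFT library and merged into a base model. This is an assumed workflow, not the author's actual merge script, and the adapter names are labels invented for the example:

```python
# Sketch only: apply the LIMA RP LoRA at 0.5 weight and bake it in.
# Assumed PEFT workflow; not the script used to build this model.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "ehartford/dolphin-2.1-mistral-7b", torch_dtype="auto"
)

# Attach the LoRA under a named adapter slot.
model = PeftModel.from_pretrained(
    base, "lemonilia/limarp-llama2-v2", adapter_name="limarp"
)

# Register a copy of the adapter scaled to 0.5 of its trained weight,
# activate it, then merge it permanently into the base weights.
model.add_weighted_adapter(
    adapters=["limarp"],
    weights=[0.5],
    adapter_name="limarp_half",
    combination_type="linear",
)
model.set_adapter("limarp_half")
merged = model.merge_and_unload()  # merges the active adapter only
merged.save_pretrained("Mistral7B_Dolphin2.1_LIMARP0.5")
```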

The purpose of the model is to be RP-focused, smart, fast, and lightweight for users with low VRAM.

I've already built the exl2 4bpw quant (linked below). It runs 8k context in around 6 GB of VRAM and responds to a full context at roughly 30 tokens/s (tested on my 3060) when the exl2_hf loader is used with FA2 (FlashAttention-2) enabled.
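
As a rough sketch of what loading the quant at 8k context looks like with the exllamav2 Python API (the speed figure above was measured through the exl2_hf loader in text-generation-webui, which wraps this same library; the sampling settings here are arbitrary):

```python
# Sketch: load the 4bpw exl2 quant at 8192 context and generate.
from exllamav2 import (
    ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer,
)
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "Mistral7B_Dolphin2.1_LIMARP0.5_4bpw_exl2"  # local clone
config.prepare()
config.max_seq_len = 8192  # the 8k context mentioned above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # split weights across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8  # arbitrary choice for the example

print(generator.generate_simple("Hello,", settings, num_tokens=64))
```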

The model has been tested by several users on the SillyTavern Discord server and run on Horde for a full day, with good results.

Either the Mistral or the ChatML context preset can be used.
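
For reference, the ChatML format looks like this (`{system}` and `{prompt}` are placeholders):

```
<|im_start|>system
{system}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

and the Mistral instruct format:

```
<s>[INST] {prompt} [/INST]
```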

exllama v2 4bpw quant: https://huggingface.co/RossAscends/Mistral7B_Dolphin2.1_LIMARP0.5_4bpw_exl2

<!-- original-model-card end -->