---
base_model: Qwen/Qwen2-7B
pipeline_tag: text-generation
inference: false
model_creator: Replete-AI
model_name: Replete-LLM-Qwen2-7b
model_type: qwen2
language:
- en
datasets:
- Replete-AI/Everything_Instruct_8k_context_filtered
library_name: transformers
license: apache-2.0
quantized_by: ThiloteE
tags:
- text-generation-inference
- transformers
- unsloth
- GGUF
- GPT4All-community
- GPT4All
- conversational
- coding
---
# About

<!-- ### quantize_version: 3 -->
<!-- ### output_tensor_quantised: 1 -->
<!-- ### convert_type: hf -->
<!-- ### vocab_type: -->
<!-- ### tags: -->

- Static quants of https://huggingface.co/Replete-AI/Replete-LLM-Qwen2-7b at commit [e356943](https://huggingface.co/Replete-AI/Replete-LLM-Qwen2-7b/commit/e3569433b23fde853683ad61f342d2c1bd01d60a)
- Quantized by [ThiloteE](https://huggingface.co/ThiloteE) with llama.cpp commit [e09a800](https://github.com/ggerganov/llama.cpp/commit/e09a800f9a9b19c73aa78e03b4c4be8ed988f3e6)
# Notes

These quants were created with a customized configuration that has been shown not to emit visible end-of-string (EOS) tokens during inference with [GPT4All](https://www.nomic.ai/gpt4all).
The config.json, generation_config.json and tokenizer_config.json therefore differ from the files in the original model's repository at the time these quants were created.
# Prompt Template (for GPT4All)

Example System Prompt:
```
<|im_start|>system
Below is an instruction that describes a task. Write a response that appropriately completes the request.<|im_end|>
```

Chat Template:
```
<|im_start|>user
%1<|im_end|>
<|im_start|>assistant
%2<|im_end|>
```
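A minimal Python sketch of how a GPT4All-style client might expand this template, where `%1` stands for the user message and `%2` for the model's reply; the function and constant names are illustrative, not GPT4All's actual internals:

```python
# Expand the GPT4All template above into a single prompt string.
# %1 is replaced by the user message, %2 by the assistant reply
# ("" for the turn the model is about to generate).
SYSTEM = (
    "<|im_start|>system\n"
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.<|im_end|>\n"
)
TURN = "<|im_start|>user\n%1<|im_end|>\n<|im_start|>assistant\n%2<|im_end|>\n"

def build_prompt(turns):
    """Render a list of (user, assistant) pairs after the system prompt."""
    body = "".join(
        TURN.replace("%1", user).replace("%2", reply) for user, reply in turns
    )
    return SYSTEM + body

prompt = build_prompt([("Write a haiku about autumn.", "")])
```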
# Context Length

`32768`

Use a lower value during inference if you do not have enough RAM or VRAM.
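To see why lowering the context helps, here is a rough KV-cache memory estimate. The architecture numbers (28 layers, 4 KV heads via grouped-query attention, head dimension 128) are taken from the published Qwen2-7B config and should be treated as assumptions:

```python
# Rough KV-cache size for Qwen2-7B as a function of context length.
def kv_cache_bytes(n_ctx, n_layers=28, n_kv_heads=4, head_dim=128, bytes_per_val=2):
    # 2x for the separate key and value tensors; fp16 values by default.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_val * n_ctx

full = kv_cache_bytes(32768) / 2**30  # 1.75 GiB at the full 32768-token context
half = kv_cache_bytes(2048) / 2**30   # ~0.11 GiB at a modest 2048-token context
```

This is on top of the ~5.4 GB for the Q4_0 weights themselves, which is why a smaller context can make the difference on low-VRAM systems.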
# Provided Quants

| Link | Type | Size/GB | Notes |
|:-----|:-----|--------:|:------|
| [GGUF](https://huggingface.co/GPT4All-Community/Replete-LLM-Qwen2-7b-GGUF/resolve/main/Replete-LLM-Qwen2-7b-Q4_0.gguf) | Q4_0 | 5.44 | fast, recommended |
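The download link in the table follows Hugging Face's standard `/<repo>/resolve/<revision>/<file>` URL scheme; this stdlib-only sketch just reconstructs it (it does not download anything):

```python
# Build the direct download URL for a file in a Hugging Face repo.
def gguf_url(repo_id, filename, revision="main"):
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

url = gguf_url("GPT4All-Community/Replete-LLM-Qwen2-7b-GGUF",
               "Replete-LLM-Qwen2-7b-Q4_0.gguf")
```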
# About GGUF

If you are unsure how to use GGUF files, refer to one of [TheBloke's
READMEs](https://huggingface.co/TheBloke/DiscoLM_German_7b_v1-GGUF) for
more details, including how to concatenate multi-part files.

Here is a handy graph by ikawrakow comparing some quant types (lower is better):

![image.png](https://www.nethype.de/huggingface_embed/quantpplgraph.png)

And here are Artefact2's thoughts on the matter:
https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9
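As a quick sanity check after downloading: per the GGUF specification, the file starts with the 4-byte magic `GGUF` followed by a little-endian uint32 version (3 at the time of writing). This stdlib-only sketch validates that header, demonstrated here against a synthetic file rather than a real 5 GB download:

```python
import os
import struct
import tempfile

def is_gguf(path):
    """Return (ok, version): ok is True if the file starts with the GGUF magic."""
    with open(path, "rb") as f:
        magic = f.read(4)
        version = struct.unpack("<I", f.read(4))[0] if magic == b"GGUF" else 0
    return magic == b"GGUF", version

# Self-contained demo: write a synthetic GGUF header and check it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"GGUF" + struct.pack("<I", 3))
ok, ver = is_gguf(tmp.name)
os.unlink(tmp.name)
```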
# Thanks

I thank Mradermacher and TheBloke for the inspiration for this model card and for their contributions to open source, and 3Simplex for lots of help along the way.
Shoutout to the GPT4All and llama.cpp communities :-)
------

<!-- footer end -->
<!-- original-model-card start -->

------
------

---
license: apache-2.0
base_model:
- Qwen/Qwen2-7B
datasets:
- Replete-AI/Everything_Instruct_8k_context_filtered
tags:
- unsloth
language:
- en
---
# Replete-LLM-Qwen2-7b

![image/png](https://cdn-uploads.huggingface.co/production/uploads/642cc1c253e76b4c2286c58e/q9gC-_O4huL2pK4nY-Y2x.png)

Thank you to TensorDock for sponsoring **Replete-LLM**;
you can check out their website for cloud compute rental below.
- https://tensordock.com
_____________________________________________________________
**Replete-LLM** is **Replete-AI**'s flagship model. We take pride in releasing a fully open-source, low-parameter, competitive AI model that not only surpasses its predecessor **Qwen2-7B-Instruct** in performance, but also competes with (if not surpasses) closed-source flagship models such as **gpt-3.5-turbo**, as well as open-source models such as **gemma-2-9b-it**
and **Meta-Llama-3.1-8B-Instruct**, in overall performance across all fields and categories. You can find the dataset this model was trained on linked below:

- https://huggingface.co/datasets/Replete-AI/Everything_Instruct_8k_context_filtered

Try bartowski's quantizations:

- https://huggingface.co/bartowski/Replete-LLM-Qwen2-7b-exl2

- https://huggingface.co/bartowski/Replete-LLM-Qwen2-7b-GGUF

Can't run the model locally? Then use the Hugging Face Space instead:

- https://huggingface.co/spaces/rombodawg/Replete-LLM-Qwen2-7b

Some statistics about the data the model was trained on can be found in the image and details below, while a more comprehensive look is available in the model card for the dataset (linked above):

![image/png](https://cdn-uploads.huggingface.co/production/uploads/642cc1c253e76b4c2286c58e/75SR21J3-zbTGKYbeoBzX.png)

**Replete-LLM-Qwen2-7b** is a versatile model fine-tuned to excel on any imaginable task. The following types of generations were included in the fine-tuning process:

- **Science**: (General, Physical Reasoning)
- **Social Media**: (Reddit, Twitter)
- **General Knowledge**: (Character-Codex), (Famous Quotes), (Steam Video Games), (How-To? Explanations)
- **Cooking**: (Cooking Preferences, Recipes)
- **Writing**: (Poetry, Essays, General Writing)
- **Medicine**: (General Medical Data)
- **History**: (General Historical Data)
- **Law**: (Legal Q&A)
- **Role-Play**: (Couple-RP, Roleplay Conversations)
- **News**: (News Generation)
- **Coding**: (3 million rows of coding data in over 100 coding languages)
- **Math**: (Math data from TIGER-Lab/MathInstruct)
- **Function Calling**: (Function calling data from "glaiveai/glaive-function-calling-v2")
- **General Instruction**: (All of teknium/OpenHermes-2.5 fully filtered and uncensored)
______________________________________________________________________________________________
## Prompt Template: ChatML
```
<|im_start|>system
{}<|im_end|>
<|im_start|>user
{}<|im_end|>
<|im_start|>assistant
{}
```

## End token (eot_token)
```
<|endoftext|>
```
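The three `{}` slots above line up with Python's `str.format`, so filling the template can be sketched like this (leaving the final assistant slot empty for the model to complete):

```python
# Fill the ChatML template's three slots: system prompt, user message,
# and the (empty) assistant turn the model will generate.
CHATML = (
    "<|im_start|>system\n{}<|im_end|>\n"
    "<|im_start|>user\n{}<|im_end|>\n"
    "<|im_start|>assistant\n{}"
)

prompt = CHATML.format(
    "You are a helpful assistant.",
    "What is the capital of France?",
    "",
)
```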
______________________________________________________________________________________________
Want to know the secret sauce of how this model was made? Find the write-up below:

**Continuous Fine-tuning Without Loss Using Lora and Mergekit**

https://docs.google.com/document/d/1OjbjU5AOz4Ftn9xHQrX3oFQGhQ6RDUuXQipnQ9gn6tU/edit?usp=sharing
______________________________________________________________________________________________

The code to fine-tune this AI model can be found below:

- https://colab.research.google.com/drive/1vIrqH5uYDQwsJ4-OO3DErvuv4pBgVwk4?usp=sharing

- Note: this model in particular was fine-tuned on an H100 rented from Tensordock.com, using their PyTorch OS image. To use the Unsloth code with TensorDock, you first need to run the code below to reinstall the drivers. After running it, your virtual machine will reboot and you will have to SSH back into it; then you can run the normal Unsloth code in order.

```python
# These are Jupyter notebook cells; each "!" line spawns its own shell.
# Caveat: "!export VAR=..." therefore does NOT persist into later cells;
# use %env or os.environ in a notebook if the variables need to stick.

# Check the current size of /dev/shm
!df -h /dev/shm

# Increase the size temporarily
!sudo mount -o remount,size=16G /dev/shm

# Increase the size permanently
!echo "tmpfs /dev/shm tmpfs defaults,size=16G 0 0" | sudo tee -a /etc/fstab

# Remount /dev/shm
!sudo mount -o remount /dev/shm

# Verify the changes
!df -h /dev/shm

# Inspect the CUDA toolchain
!nvcc --version
!python -c "import torch; print(torch.version.cuda)"

# Debug/environment settings (see the caveat above about "!export")
!export TORCH_DISTRIBUTED_DEBUG=DETAIL
!export NCCL_DEBUG=INFO
!export PATH=/usr/local/cuda/bin:$PATH
!export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
!export NCCL_P2P_LEVEL=NVL
!export NCCL_DEBUG_SUBSYS=ALL
!export TORCHELASTIC_ERROR_FILE=/PATH/TO/torcherror.log

# Purge the preinstalled NVIDIA driver and CUDA packages
!sudo apt-get remove --purge -y '^nvidia-.*'
!sudo apt-get remove --purge -y '^cuda-.*'
!sudo apt-get autoremove -y
!sudo apt-get autoclean -y
!sudo apt-get update -y

# Reinstall a known-good driver and CUDA 12.1
!sudo apt-get install -y nvidia-driver-535 cuda-12-1

# Add the graphics-drivers PPA and upgrade to the latest available driver
!sudo apt-get install -y software-properties-common
!sudo add-apt-repository ppa:graphics-drivers/ppa -y
!sudo apt-get update -y
!latest_driver=$(apt-cache search '^nvidia-driver-[0-9]' | grep -oP 'nvidia-driver-\K[0-9]+' | sort -n | tail -1) && sudo apt-get install -y nvidia-driver-$latest_driver
!sudo reboot
```
_______________________________________________________________________________

## Join the Replete-AI Discord! We are a great and loving community!

- https://discord.gg/ZZbnsmVnjD
# Original Model card:

<!-- original-model-card end -->
<!-- end -->