Suparious committed
Commit aa0b8a6 · verified · 1 parent: 67b57fc

Add processing notice

Files changed (1)
  1. README.md +2 -59
README.md CHANGED
@@ -1,70 +1,13 @@
  ---
- base_model: rombodawg/Llama-3-8B-Base-Coder-v3.5-10k
  inference: false
- library_name: transformers
- pipeline_tag: text-generation
- quantized_by: Suparious
- tags:
- - 4-bit
- - AWQ
- - text-generation
- - autotrain_compatible
- - endpoints_compatible
  ---
  # rombodawg/Llama-3-8B-Base-Coder-v3.5-10k AWQ

+ ** PROCESSING .... ETA 30mins **
+
  - Model creator: [rombodawg](https://huggingface.co/rombodawg)
  - Original model: [Llama-3-8B-Base-Coder-v3.5-10k](https://huggingface.co/rombodawg/Llama-3-8B-Base-Coder-v3.5-10k)

-
-
- ## How to use
-
- ### Install the necessary packages
-
- ```bash
- pip install --upgrade autoawq autoawq-kernels
- ```
-
- ### Example Python code
-
- ```python
- from awq import AutoAWQForCausalLM
- from transformers import AutoTokenizer, TextStreamer
-
- model_path = "solidrust/Llama-3-8B-Base-Coder-v3.5-10k-AWQ"
- system_message = "You are Llama-3-8B-Base-Coder-v3.5-10k, incarnated as a powerful AI. You were created by rombodawg."
-
- # Load model
- model = AutoAWQForCausalLM.from_quantized(model_path,
-                                           fuse_layers=True)
- tokenizer = AutoTokenizer.from_pretrained(model_path,
-                                           trust_remote_code=True)
- streamer = TextStreamer(tokenizer,
-                         skip_prompt=True,
-                         skip_special_tokens=True)
-
- # Convert prompt to tokens
- prompt_template = """\
- <|im_start|>system
- {system_message}<|im_end|>
- <|im_start|>user
- {prompt}<|im_end|>
- <|im_start|>assistant"""
-
- prompt = "You're standing on the surface of the Earth. "\
-          "You walk one mile south, one mile west and one mile north. "\
-          "You end up exactly where you started. Where are you?"
-
- tokens = tokenizer(prompt_template.format(system_message=system_message, prompt=prompt),
-                    return_tensors='pt').input_ids.cuda()
-
- # Generate output
- generation_output = model.generate(tokens,
-                                    streamer=streamer,
-                                    max_new_tokens=512)
- ```
-
  ### About AWQ

  AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference with equivalent or better quality compared to the most commonly used GPTQ settings.
 
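For reference while the processing notice is up, the usage section removed above boils down to the flow sketched below. This is a condensed, unofficial sketch, not the repo's maintained example: it assumes `autoawq` and `autoawq-kernels` are installed and a CUDA GPU is available, reuses the `solidrust/Llama-3-8B-Base-Coder-v3.5-10k-AWQ` repo id from the removed snippet, and swaps the hard-coded ChatML block for `tokenizer.apply_chat_template`, which presumes the quantized repo ships a chat template.

```python
# Condensed sketch of the usage section removed in this commit (not the
# repo's official example). Assumes: pip install autoawq autoawq-kernels,
# a CUDA GPU, and that the quantized repo ships a chat template.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, TextStreamer

model_path = "solidrust/Llama-3-8B-Base-Coder-v3.5-10k-AWQ"  # from the removed snippet

# Load the 4-bit AWQ weights, fusing layers for faster inference.
model = AutoAWQForCausalLM.from_quantized(model_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Build the prompt via the tokenizer's chat template (assumption: the removed
# snippet hard-coded an equivalent ChatML block instead).
messages = [{"role": "user", "content": "Write a function that checks whether a number is prime."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize, move to GPU, and stream the completion token by token.
tokens = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
model.generate(tokens, streamer=streamer, max_new_tokens=512)
```

Using the bundled chat template keeps the prompt format in sync with whatever the quantizer shipped; if the repo has none, the ChatML template shown in the removed snippet is the fallback it documented.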