## Introduction

From Qwen2:

Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the instruction-tuned 72B Qwen2 model.

Compared with state-of-the-art open-source language models, including the previously released Qwen1.5, Qwen2 has generally surpassed most open-source models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting language understanding, language generation, multilingual capability, coding, mathematics, and reasoning.

For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2/).

## This finetune

Qwen2-72B-Orpo-v0.1 is a QLoRA finetune of `Qwen/Qwen2-72B-Instruct` on 1.5k rows of `mlabonne/orpo-dpo-mix-40k`.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62afc20ca5bd7cef3e1ab3f4/CdV47RW1zjr7qvD073NkZ.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62afc20ca5bd7cef3e1ab3f4/PB-25NSKcbFMZuZ3vYptR.png)

You can find the experiment on W&B at [this address](https://wandb.ai/dryanfurman/huggingface/runs/fw7mtub1?nw=nwuserdryanfurman).
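
The training code isn't reproduced in this repo, but the recipe above (ORPO preference tuning on a QLoRA adapter) can be sketched with `trl`. The sketch below is a minimal illustration assuming a recent `trl` release with `ORPOTrainer`; the LoRA rank, hyperparameters, and the way the 1.5k rows are sampled are placeholder assumptions, not the exact values from this run (see the W&B logs above for those).

```python
# Minimal ORPO + QLoRA sketch (illustrative values, not the exact run settings)
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import ORPOConfig, ORPOTrainer

base_model = "Qwen/Qwen2-72B-Instruct"

# 4-bit NF4 quantization: the "Q" in QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# 1.5k rows of the preference mix (sampling strategy is an assumption)
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train").select(range(1500))

# LoRA adapter: rank and target modules are illustrative
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

orpo_args = ORPOConfig(
    output_dir="Qwen2-72B-Orpo-v0.1",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=8e-6,
    num_train_epochs=1,
    beta=0.1,  # weight of the odds-ratio penalty
    bf16=True,
)

trainer = ORPOTrainer(
    model=model,
    args=orpo_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

ORPO folds the preference signal into a single supervised pass via an odds-ratio penalty on the rejected completions, so, unlike DPO, no separate frozen reference model is needed.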

## 💻 Usage

<details>

<summary>Setup</summary>

```python
!pip install -qU transformers accelerate bitsandbytes
!huggingface-cli download dfurman/Qwen2-72B-Orpo-v0.1
```

```python
from transformers import AutoTokenizer, BitsAndBytesConfig
import transformers
import torch

# Use Flash Attention 2 with bf16 on Ampere (compute capability >= 8) or newer GPUs,
# and fall back to eager attention with fp16 elsewhere
if torch.cuda.get_device_capability()[0] >= 8:
    !pip install -qqq flash-attn
    attn_implementation = "flash_attention_2"
    torch_dtype = torch.bfloat16
else:
    attn_implementation = "eager"
    torch_dtype = torch.float16

# quantize if necessary (also uncomment "quantization_config" in model_kwargs below)
# bnb_config = BitsAndBytesConfig(
#     load_in_4bit=True,
#     bnb_4bit_quant_type="nf4",
#     bnb_4bit_compute_dtype=torch_dtype,
#     bnb_4bit_use_double_quant=True,
# )

model = "dfurman/Qwen2-72B-Orpo-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    model_kwargs={
        "torch_dtype": torch_dtype,
        # "quantization_config": bnb_config,
        "device_map": "auto",
        "attn_implementation": attn_implementation,
    },
)
```
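
As a rough guide, the 72B weights alone come to about 145 GB in bf16 (~72e9 parameters × 2 bytes), which `device_map="auto"` shards across the available GPUs; uncommenting `bnb_config` above loads the weights in 4-bit instead, at roughly 40 GB.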

</details>

### Run

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a recipe for a spicy margarita."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print("***Prompt:\n", prompt)

outputs = pipeline(prompt, max_new_tokens=1000, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print("***Generation:\n", outputs[0]["generated_text"][len(prompt):])
```
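
For reference, `apply_chat_template` should render the messages into Qwen2's ChatML format, along these lines:

```python
# Expected rendered prompt (Qwen2's ChatML template):
# <|im_start|>system
# You are a helpful assistant.<|im_end|>
# <|im_start|>user
# Tell me a recipe for a spicy margarita.<|im_end|>
# <|im_start|>assistant
```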

<details>

<summary>Output</summary>

</details>