## Introduction

From Qwen2:

Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the instruction-tuned 72B Qwen2 model.

Compared with state-of-the-art open-source language models, including the previously released Qwen1.5, Qwen2 has generally surpassed most open-source models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting language understanding, language generation, multilingual capability, coding, mathematics, and reasoning.

For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2/).

## This finetune

Qwen2-72B-Orpo-v0.1 is a QLoRA finetune of `Qwen/Qwen2-72B-Instruct` on 1.5k rows of `mlabonne/orpo-dpo-mix-40k`.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62afc20ca5bd7cef3e1ab3f4/CdV47RW1zjr7qvD073NkZ.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62afc20ca5bd7cef3e1ab3f4/PB-25NSKcbFMZuZ3vYptR.png)

You can find the experiment on W&B at [this address](https://wandb.ai/dryanfurman/huggingface/runs/fw7mtub1?nw=nwuserdryanfurman).
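
The training code isn't reproduced in this repo, but the recipe above (ORPO preference tuning on a QLoRA adapter) can be sketched with `trl`. The sketch below is a minimal illustration assuming a recent `trl` release with `ORPOTrainer`; the LoRA rank, hyperparameters, and the way the 1.5k rows are sampled are placeholder assumptions, not the exact values from this run (see the W&B logs above for those).

```python
# Minimal ORPO + QLoRA sketch (illustrative values, not the exact run settings)
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import ORPOConfig, ORPOTrainer

base_model = "Qwen/Qwen2-72B-Instruct"

# 4-bit NF4 quantization: the "Q" in QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# 1.5k rows of the preference mix (sampling strategy is an assumption)
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train").select(range(1500))

# LoRA adapter: rank and target modules are illustrative
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

orpo_args = ORPOConfig(
    output_dir="Qwen2-72B-Orpo-v0.1",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=8e-6,
    num_train_epochs=1,
    beta=0.1,  # weight of the odds-ratio penalty
    bf16=True,
)

trainer = ORPOTrainer(
    model=model,
    args=orpo_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

ORPO folds the preference signal into a single supervised pass via an odds-ratio penalty on the rejected completions, so, unlike DPO, no separate frozen reference model is needed.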

## 💻 Usage

<details>

<summary>Setup</summary>

```python
!pip install -qU transformers accelerate bitsandbytes
!huggingface-cli download dfurman/Qwen2-72B-Orpo-v0.1
```

```python
from transformers import AutoTokenizer, BitsAndBytesConfig
import transformers
import torch

# Use Flash Attention 2 with bf16 on Ampere (compute capability >= 8) or newer GPUs,
# and fall back to eager attention with fp16 elsewhere
if torch.cuda.get_device_capability()[0] >= 8:
    !pip install -qqq flash-attn
    attn_implementation = "flash_attention_2"
    torch_dtype = torch.bfloat16
else:
    attn_implementation = "eager"
    torch_dtype = torch.float16

# quantize if necessary (also uncomment "quantization_config" in model_kwargs below)
# bnb_config = BitsAndBytesConfig(
#     load_in_4bit=True,
#     bnb_4bit_quant_type="nf4",
#     bnb_4bit_compute_dtype=torch_dtype,
#     bnb_4bit_use_double_quant=True,
# )

model = "dfurman/Qwen2-72B-Orpo-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    model_kwargs={
        "torch_dtype": torch_dtype,
        # "quantization_config": bnb_config,
        "device_map": "auto",
        "attn_implementation": attn_implementation,
    },
)
```
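
As a rough guide, the 72B weights alone come to about 145 GB in bf16 (~72e9 parameters × 2 bytes), which `device_map="auto"` shards across the available GPUs; uncommenting `bnb_config` above loads the weights in 4-bit instead, at roughly 40 GB.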

</details>

### Run

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a recipe for a spicy margarita."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print("***Prompt:\n", prompt)

outputs = pipeline(prompt, max_new_tokens=1000, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print("***Generation:\n", outputs[0]["generated_text"][len(prompt):])
```
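
For reference, `apply_chat_template` should render the messages into Qwen2's ChatML format, along these lines:

```python
# Expected rendered prompt (Qwen2's ChatML template):
# <|im_start|>system
# You are a helpful assistant.<|im_end|>
# <|im_start|>user
# Tell me a recipe for a spicy margarita.<|im_end|>
# <|im_start|>assistant
```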

<details>

<summary>Output</summary>

</details>