Update README.md
README.md CHANGED
@@ -23,7 +23,7 @@ The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is a pretrained generati

## Model Details

-This model was built via parameter-efficient finetuning of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the first 5k rows of [ehartford/dolphin](https://huggingface.co/datasets/ehartford/dolphin). Finetuning was executed on 1x A100 (40 GB SXM) for roughly 1 hour on Google Colab.

- **Developed by:** Daniel Furman
- **Model type:** Decoder-only
@@ -49,8 +49,6 @@ We use Eleuther.AI's [Language Model Evaluation Harness](https://github.com/Eleu

## Basic Usage

-*Note*: Use the code below to get started with the sft models herein, as ran on 1x A100.
-
```python
!pip install -q -U transformers peft torch accelerate bitsandbytes einops sentencepiece
@@ -64,38 +62,49 @@ from transformers import (
```

```python
-peft_model_id = "dfurman/
config = PeftConfig.from_pretrained(peft_model_id)

model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    device_map="auto",
    trust_remote_code=True,
)
-    use_fast=True,
-    trust_remote_code=True,
)
-model = PeftModel.from_pretrained(model, peft_model_id)
-format_template = "You are a helpful assistant. Write a response that appropriately completes the request. {query}\n"
```

```python
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
with torch.autocast("cuda", dtype=torch.bfloat16):
    output = model.generate(
        input_ids=input_ids,
-        max_new_tokens=
        do_sample=True,
-        temperature=0.
        return_dict_in_generate=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
@@ -103,33 +112,34 @@ with torch.autocast("cuda", dtype=torch.bfloat16):
        no_repeat_ngram_size=5,
    )
```

<details>

<summary>Output</summary>

-**Prompt**:

-I hope you can join me for a fun evening at my place this Friday! We'll have delicious food, great conversation, and maybe even some games if we feel like it. Please RSVP by Wednesday night so I know who will be there.
-Looking forward to seeing you all soon!

-Your Name

</details>

## Speeds, Sizes, Times

| runtime / 50 tokens (sec) | GPU | attn | torch dtype | VRAM (GB) |
@@ -138,17 +148,13 @@ Remember, when writing emails, always keep in mind your audience and their prefe

## Training

-It took ~

Prompt format:
-This model uses the [

```
-<|im_start|>system
-You are a helpful assistant.<|im_end|>
-<|im_start|>user
-{prompt}<|im_end|>
-<|im_start|>assistant
```

## Training Hyperparameters

## Model Details

+This model was built via parameter-efficient finetuning of [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) on the first 5k rows of [ehartford/dolphin](https://huggingface.co/datasets/ehartford/dolphin). Finetuning was executed on 1x A100 (40 GB SXM) for roughly 1 hour on Google Colab.

- **Developed by:** Daniel Furman
- **Model type:** Decoder-only
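
As background on the parameter-efficient finetuning mentioned above: a small adapter (most likely LoRA-style, given the `peft` usage in the Basic Usage section) is trained on top of the frozen base model, so only a tiny fraction of the weights is updated. The sketch below is illustrative only, with placeholder hyperparameters; see the "Training Hyperparameters" section for the configuration actually used.

```python
# Illustrative LoRA setup with peft (placeholder hyperparameters, not the exact training config).
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
lora_config = LoraConfig(
    r=16,                # low-rank update rank (placeholder)
    lora_alpha=32,       # scaling factor (placeholder)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter parameters are trainable
```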

## Basic Usage

```python
!pip install -q -U transformers peft torch accelerate bitsandbytes einops sentencepiece
```
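
The import cell that sits between the install command above and the snippets below is elided in this diff. A minimal sketch of the imports those snippets rely on (inferred from the calls they make, not the verbatim cell):

```python
# Imports assumed by the usage snippets below (a sketch, inferred from the code).
import torch
from peft import PeftConfig, PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
```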

```python
+peft_model_id = "dfurman/Yi-6B-instruct-v0.1"
config = PeftConfig.from_pretrained(peft_model_id)

+tokenizer = AutoTokenizer.from_pretrained(
+    peft_model_id,
+    use_fast=True,
+    trust_remote_code=True,
+)
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.bfloat16,
+)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
+    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
+model = PeftModel.from_pretrained(
+    model,
+    peft_model_id
)
```
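
With the base weights loaded in 4-bit NF4 and a bfloat16 compute dtype, the assembled model fits on a single A100 (40 GB). As an optional sanity check, `get_memory_footprint()` (a standard `transformers` helper) reports the approximate usage:

```python
# Optional: approximate memory used by the 4-bit base model plus the adapter.
print(f"memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```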

```python
+messages = [
+    {"role": "system", "content": "You are a helpful assistant. Respond as briefly as possible."},
+    {"role": "user", "content": "Tell me a recipe for a mai tai."},
+]
+
+print("\n\n*** Prompt:")
+prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+print(prompt)

+print("\n\n*** Generate:")
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
with torch.autocast("cuda", dtype=torch.bfloat16):
    output = model.generate(
        input_ids=input_ids,
+        max_new_tokens=1024,
        do_sample=True,
+        temperature=0.7,
        return_dict_in_generate=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        no_repeat_ngram_size=5,
    )

+response = tokenizer.decode(
+    output["sequences"][0][len(input_ids[0]):],
+    skip_special_tokens=True
+)
+print(response)
```

<details>

<summary>Output</summary>

+**Prompt**: <|im_start|>system
+You are a helpful assistant. Respond as briefly as possible.<|im_end|>
+<|im_start|>user
+Tell me a recipe for a mai tai.<|im_end|>
+<|im_start|>assistant

+**Generation**: Here's one simple version of the classic Mai Tai cocktail:

+1 oz White Rum (Bacardi, Don Papa, etc.) ➕ ½ oz Coconut Cream Liqueur (Malibu or Coco Lopez)
+2 tsp Simple Syrup ➕ Dash Orange Bitters
+3-4 Ice Cubes

+Shake all ingredients in a shaker filled with ice until well chilled and strain into an old fashioned glass over fresh crushed ice. Garnish with mint leaves if desired. Enjoy!

</details>

## Speeds, Sizes, Times

| runtime / 50 tokens (sec) | GPU | attn | torch dtype | VRAM (GB) |

## Training

+It took ~3 hours to train 3 epochs on 1x A100 (40 GB SXM).

Prompt format:
+This model uses the same prompt format as [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1). This model does **not** expect a system prompt.

```
+[INST] {prompt} [/INST]
```
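
For example, a single-turn prompt in this format can be written by hand or produced with the base model tokenizer's chat template (a sketch for illustration; the template typically prepends the `<s>` BOS token):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

# Hand-written prompt in the format shown above.
manual_prompt = "[INST] Tell me a recipe for a mai tai. [/INST]"

# Equivalent prompt via the chat template shipped with the base tokenizer.
messages = [{"role": "user", "content": "Tell me a recipe for a mai tai."}]
templated_prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(templated_prompt)  # expected to resemble: <s>[INST] Tell me a recipe for a mai tai. [/INST]
```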
## Training Hyperparameters
|