license: unknown
library_name: peft
tags:
- mistral
datasets:
- ehartford/dolphin
- garage-bAInd/Open-Platypus
inference: false
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-v0.1
Mistral-7B-Instruct-v0.1
General instruction-following LLM finetuned from mistralai/Mistral-7B-v0.1.
Model Details
Model Description
This instruction-following LLM was built via parameter-efficient QLoRA finetuning of mistralai/Mistral-7B-v0.1 on the first 5k rows of ehartford/dolphin. Finetuning ran on 1x A100 (40 GB SXM) for roughly 1 hour on Google Colab.
- Developed by: Daniel Furman
- Model type: Decoder-only
- Language(s) (NLP): English
- License: Yi model license
- Finetuned from model: mistralai/Mistral-7B-v0.1
Model Sources
- Repository: github.com/daniel-furman/sft-demos
Evaluation
Metric | Value |
---|---|
MMLU (5-shot) | Coming |
ARC (25-shot) | Coming |
HellaSwag (10-shot) | Coming |
TruthfulQA (0-shot) | Coming |
Avg. | Coming |
We use EleutherAI's Language Model Evaluation Harness to run the benchmark tests above, at the same version used by Hugging Face's Open LLM Leaderboard.
How to Get Started with the Model
Use the code below to get started with the model.
!pip install -q -U transformers peft torch accelerate bitsandbytes einops sentencepiece
import torch
from peft import PeftModel, PeftConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
peft_model_id = "dfurman/Mistral-7B-Instruct-v0.1"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    config.base_model_name_or_path,
    use_fast=True,
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, peft_model_id)
format_template = "You are a helpful assistant. Write a response that appropriately completes the request. {query}\n"
query = "Write a short email inviting my friends to a dinner party on Friday. Respond succinctly."
prompt = format_template.format(query=query)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
with torch.autocast("cuda", dtype=torch.bfloat16):
    output = model.generate(
        input_ids=input_ids,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.1,
        return_dict_in_generate=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        repetition_penalty=1.2,
        no_repeat_ngram_size=5,
    )
print("\n\n*** Generate:")
print(tokenizer.decode(output["sequences"][0][len(input_ids[0]):], skip_special_tokens=True))
Output
Prompt: Write a short email inviting my friends to a dinner party on Friday. Respond succinctly.
Generation: The invitation should be brief and to-the-point, so it's best to use simple language and avoid unnecessary details or long explanations. Here is an example of a concise invitation:
Dear Friends,
I hope you can join me for a fun evening at my place this Friday! We'll have delicious food, great conversation, and maybe even some games if we feel like it. Please RSVP by Wednesday night so I know who will be there.
Looking forward to seeing you all soon!
Best regards, Your Name
This message clearly communicates the essential information about the event while maintaining a friendly tone. It also includes a specific date (Friday) and timeframe (evening), as well as a clear call to action (RSVP). The closing line adds a personal touch and expresses excitement for the gathering. Overall, this invitation strikes a good balance between being informative and engaging without overwhelming the reader with too much text.
Remember, when writing emails, always keep in mind your audience and their preferences. If they prefer more detailed information or additional context, adjust accordingly. However, try not to make the invitation overly complicated or lengthy – simplicity often makes for a better experience. Happy emailing!
Speeds, Sizes, Times
runtime / 50 tokens (sec) | GPU | attn | torch dtype | VRAM (GB) |
---|---|---|---|---|
3.1 | 1x A100 (40 GB SXM) | torch | fp16 | 13 |
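As a rough sanity check on the row above, the throughput and the weight-only memory footprint can be worked out by hand. The sketch below assumes a parameter count of about 7.24 billion for Mistral-7B, an approximate figure that is not stated in this card, and it counts weights only, ignoring activations and the KV cache.

```python
# Back-of-the-envelope check of the table row above.
RUNTIME_SEC = 3.1  # from the table: seconds per 50 generated tokens
TOKENS = 50

# ASSUMPTION: approximate parameter count for Mistral-7B; not stated in this card.
N_PARAMS = 7.24e9
BYTES_PER_PARAM_FP16 = 2  # fp16 stores each weight in 2 bytes

tokens_per_sec = TOKENS / RUNTIME_SEC
weights_gib = N_PARAMS * BYTES_PER_PARAM_FP16 / 2**30

print(f"throughput: {tokens_per_sec:.1f} tokens/sec")
print(f"fp16 weights: {weights_gib:.1f} GiB")
```

Under these assumptions the result is about 16 tokens per second and roughly 13.5 GiB of weights, which is in line with the 13 GB reported in the table.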
Training
It took ~1 hour to train 1 epoch on 1x A100.
Prompt format: This model (and all my future releases) uses the ChatML prompt format, which was developed by OpenAI.
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
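A minimal sketch of filling in the template above with plain string formatting. The helper name `build_chatml_prompt` is hypothetical, and the trailing newline after the final `assistant` tag is an assumption; the template itself does not show it.

```python
def build_chatml_prompt(
    user_prompt: str,
    system_prompt: str = "You are a helpful assistant.",
) -> str:
    """Hypothetical helper: wrap a user prompt in the ChatML layout shown above."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        # ASSUMPTION: end with a newline so generation starts on a fresh line.
        "<|im_start|>assistant\n"
    )


example_prompt = build_chatml_prompt(
    "Write a short email inviting my friends to a dinner party on Friday."
)
print(example_prompt)
```

The resulting string could then be tokenized and passed to `model.generate` in place of a hand-written prompt.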
Training Hyperparameters
We use the SFTTrainer from trl to fine-tune LLMs on instruction-following datasets.
The following TrainingArguments config was used:
- num_train_epochs = 1
- auto_find_batch_size = True
- gradient_accumulation_steps = 1
- optim = "paged_adamw_32bit"
- save_strategy = "epoch"
- learning_rate = 3e-4
- lr_scheduler_type = "cosine"
- warmup_ratio = 0.03
- logging_strategy = "steps"
- logging_steps = 25
- bf16 = True
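To make the `learning_rate`, `lr_scheduler_type`, and `warmup_ratio` settings above more concrete, here is a small sketch of linear warmup followed by cosine decay. It illustrates the general shape of that schedule and is not the trainer's own code; `TOTAL_STEPS` is an illustrative assumption, since the actual number of optimizer steps is not given in this card.

```python
import math

PEAK_LR = 3e-4       # learning_rate from the list above
WARMUP_RATIO = 0.03  # warmup_ratio from the list above
TOTAL_STEPS = 1000   # ASSUMPTION: illustrative only; the real step count is not given


def lr_at(step, total_steps=TOTAL_STEPS, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at a given step: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 15, 30, 500, 1000):
    print(step, f"{lr_at(step):.2e}")
```

With these illustrative numbers the rate climbs for the first 30 steps, peaks at 3e-4, and then decays smoothly toward zero by the final step.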
The following bitsandbytes quantization config was used:
- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: bfloat16
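The 4-bit settings listed above could be expressed as a `BitsAndBytesConfig` roughly as follows. This is a sketch rather than the exact training code: the names `QUANT_KWARGS` and `build_quant_config` are hypothetical, and the `llm_int8_*` entries are omitted for brevity. The heavy imports sit inside the function so the settings can be inspected without a GPU environment.

```python
# 4-bit settings copied from the list above (llm_int8_* entries omitted).
QUANT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": False,
    "bnb_4bit_compute_dtype": "bfloat16",
}


def build_quant_config():
    """Hypothetical helper: turn the settings above into a BitsAndBytesConfig."""
    # Imported lazily; requires torch, transformers, and bitsandbytes to be installed.
    import torch
    from transformers import BitsAndBytesConfig

    kwargs = dict(QUANT_KWARGS)
    # Map the dtype name to the actual torch dtype object.
    kwargs["bnb_4bit_compute_dtype"] = getattr(torch, kwargs["bnb_4bit_compute_dtype"])
    return BitsAndBytesConfig(**kwargs)
```

The returned object can be passed as `quantization_config=` to `AutoModelForCausalLM.from_pretrained` to load the base model in 4-bit.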
Model Card Contact
dryanfurman at gmail
Framework versions
- PEFT 0.6.0.dev0