metadata

license: unknown
library_name: peft
tags:
  - mistral
datasets:
  - ehartford/dolphin
  - garage-bAInd/Open-Platypus
inference: false
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-v0.1

Mistral-7B-Instruct-v0.1

General instruction-following LLM finetuned from mistralai/Mistral-7B-v0.1.

Model Details

Model Description

This instruction-following LLM was built via parameter-efficient QLoRA finetuning of mistralai/Mistral-7B-v0.1 on the first 5k rows of ehartford/dolphin. Finetuning ran on 1x A100 (40 GB SXM) for roughly 1 hour on Google Colab.

Developed by: Daniel Furman
Model type: Decoder-only
Language(s) (NLP): English
License: Yi model license
Finetuned from model: mistralai/Mistral-7B-v0.1

Model Sources

Repository: github.com/daniel-furman/sft-demos

Evaluation

Metric	Value
MMLU (5-shot)	Coming
ARC (25-shot)	Coming
HellaSwag (10-shot)	Coming
TruthfulQA (0-shot)	Coming
Avg.	Coming

We use EleutherAI's Language Model Evaluation Harness to run the benchmarks above, at the same version used by Hugging Face's Open LLM Leaderboard.

How to Get Started with the Model

Use the code below to get started with the model.

!pip install -q -U transformers peft torch accelerate bitsandbytes einops sentencepiece

import torch
from peft import PeftModel, PeftConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)

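# The adapter config records which base model the adapter was trained on.
peft_model_id = "dfurman/Mistral-7B-Instruct-v0.1"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model in bfloat16 and let accelerate place it on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(
    config.base_model_name_or_path,
    use_fast=True,
    trust_remote_code=True,
)

# Attach the finetuned adapter weights to the base model.
model = PeftModel.from_pretrained(model, peft_model_id)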

format_template = "You are a helpful assistant. Write a response that appropriately completes the request. {query}\n"

query = "Write a short email inviting my friends to a dinner party on Friday. Respond succinctly."
prompt = format_template.format(query=query)

# Send the inputs to wherever device_map="auto" placed the model.
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
with torch.autocast("cuda", dtype=torch.bfloat16):
    output = model.generate(
        input_ids=input_ids,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.1,
        return_dict_in_generate=True,
        eos_token_id=tokenizer.eos_token_id,
        # The base tokenizer may not define a pad token, so pad with EOS.
        pad_token_id=tokenizer.eos_token_id,
        repetition_penalty=1.2,
        no_repeat_ngram_size=5,
    )

# Decode only the newly generated tokens, skipping the prompt.
print("\n\n*** Generate:")
print(tokenizer.decode(output["sequences"][0][len(input_ids[0]):], skip_special_tokens=True))

Output

Prompt: Write a short email inviting my friends to a dinner party on Friday. Respond succinctly.

Generation: The invitation should be brief and to-the-point, so it's best to use simple language and avoid unnecessary details or long explanations. Here is an example of a concise invitation:

Dear Friends,

I hope you can join me for a fun evening at my place this Friday! We'll have delicious food, great conversation, and maybe even some games if we feel like it. Please RSVP by Wednesday night so I know who will be there.

Looking forward to seeing you all soon!

Best regards, Your Name

This message clearly communicates the essential information about the event while maintaining a friendly tone. It also includes a specific date (Friday) and timeframe (evening), as well as a clear call to action (RSVP). The closing line adds a personal touch and expresses excitement for the gathering. Overall, this invitation strikes a good balance between being informative and engaging without overwhelming the reader with too much text.

Remember, when writing emails, always keep in mind your audience and their preferences. If they prefer more detailed information or additional context, adjust accordingly. However, try not to make the invitation overly complicated or lengthy – simplicity often makes for a better experience. Happy emailing!

Speeds, Sizes, Times

runtime / 50 tokens (sec)	GPU	attn	torch dtype	VRAM (GB)
3.1	1x A100 (40 GB SXM)	torch	fp16	13
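
As a rough reading of the table above, the runtime can be converted into throughput and per-token latency. This is only a sketch: the extrapolation to 512 new tokens (the `max_new_tokens` value in the quick-start code) assumes per-token latency stays constant, which is an assumption and not a measurement from this card.

```python
# Values copied from the table above.
runtime_sec = 3.1
tokens_generated = 50

# Derived figures.
tokens_per_sec = tokens_generated / runtime_sec
ms_per_token = 1000 * runtime_sec / tokens_generated

# Assumption: latency per token is constant across the whole generation.
max_new_tokens = 512
est_sec_for_512 = max_new_tokens * runtime_sec / tokens_generated

print(f"throughput: {tokens_per_sec:.1f} tokens/sec")
print(f"latency: {ms_per_token:.0f} ms/token")
print(f"estimated time for {max_new_tokens} tokens: {est_sec_for_512:.1f} sec")
```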

Training

It took ~1 hour to train 1 epoch on 1x A100.

Prompt format: This model (and all my future releases) uses the ChatML prompt format, which was developed by OpenAI.

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
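
The template above can also be assembled programmatically. The following is a minimal sketch; the helper name `build_chatml_prompt` and its default system message are illustrative assumptions and are not part of the released code.

```python
def build_chatml_prompt(
    user_prompt: str,
    system_prompt: str = "You are a helpful assistant.",
) -> str:
    """Wrap a user request in the ChatML layout shown above."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        # Leave the assistant turn open so the model writes the reply.
        "<|im_start|>assistant\n"
    )


prompt = build_chatml_prompt(
    "Write a short email inviting my friends to a dinner party on Friday."
)
print(prompt)
```

The returned string ends with an open assistant turn, so generation would normally be stopped when the model emits `<|im_end|>`.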

Training Hyperparameters

We use the SFTTrainer from trl to fine-tune LLMs on instruction-following datasets.

The following TrainingArguments config was used:

num_train_epochs = 1
auto_find_batch_size = True
gradient_accumulation_steps = 1
optim = "paged_adamw_32bit"
save_strategy = "epoch"
learning_rate = 3e-4
lr_scheduler_type = "cosine"
warmup_ratio = 0.03
logging_strategy = "steps"
logging_steps = 25
bf16 = True
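
To make the schedule settings concrete, here is a sketch of how `learning_rate`, `warmup_ratio`, and the cosine `lr_scheduler_type` combine into a per-step learning rate: a linear ramp followed by a half-cosine decay to zero. It is a plain-Python approximation of that shape, not the trainer's own code, and `total_steps = 1000` is a hypothetical value, since the real step count depends on the batch size chosen by `auto_find_batch_size`.

```python
import math

LEARNING_RATE = 3e-4  # from the config above
WARMUP_RATIO = 0.03   # from the config above


def lr_at_step(step: int, total_steps: int) -> float:
    """Linear warmup from zero, then cosine decay back to zero."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical; not stated in this card
for step in (0, 15, 30, 515, 1000):
    print(f"step {step:>4}: lr = {lr_at_step(step, total_steps):.2e}")
```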

The following bitsandbytes quantization config was used:

quant_method: bitsandbytes
load_in_8bit: False
load_in_4bit: True
llm_int8_threshold: 6.0
llm_int8_skip_modules: None
llm_int8_enable_fp32_cpu_offload: False
llm_int8_has_fp16_weight: False
bnb_4bit_quant_type: nf4
bnb_4bit_use_double_quant: False
bnb_4bit_compute_dtype: bfloat16
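
For inference, the 4-bit settings above could be mirrored when loading the base model. The sketch below is an assumption about usage, not the author's training script: it collects the 4-bit fields into a dict and builds a `transformers.BitsAndBytesConfig` lazily, so that importing the snippet does not itself require torch or transformers. The names `BNB_KWARGS` and `make_bnb_config` are hypothetical.

```python
# 4-bit fields copied from the quantization config above.
BNB_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": False,
    "bnb_4bit_compute_dtype": "bfloat16",
}


def make_bnb_config():
    """Build a BitsAndBytesConfig; requires torch and transformers."""
    import torch
    from transformers import BitsAndBytesConfig

    kwargs = dict(BNB_KWARGS)
    # Swap the dtype name for the actual torch dtype object.
    kwargs["bnb_4bit_compute_dtype"] = getattr(torch, kwargs["bnb_4bit_compute_dtype"])
    return BitsAndBytesConfig(**kwargs)
```

The resulting object would be passed as `quantization_config=make_bnb_config()` to `AutoModelForCausalLM.from_pretrained`, which in practice needs a CUDA GPU with bitsandbytes installed.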

Model Card Contact

dryanfurman at gmail

Framework versions

PEFT 0.6.0.dev0