---
license: apache-2.0
library_name: peft
tags:
- mistral
datasets:
- jondurbin/airoboros-2.2.1
inference: false
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-v0.1
---

# Mistral-7B-Instruct-v0.1
The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is a 7-billion-parameter generative text model instruction-tuned for general-purpose instruction following.
## Model Details
This model was built via parameter-efficient finetuning of mistralai/Mistral-7B-v0.1 on the jondurbin/airoboros-2.2.1 dataset. Finetuning was executed on 1x A100 (40 GB SXM) for roughly 3 hours.
- Developed by: Daniel Furman
- Model type: Decoder-only
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model: mistralai/Mistral-7B-v0.1
### Model Sources
- Repository: github.com/daniel-furman/sft-demos
## Evaluation Results
| Metric | Value |
|---|---|
| MMLU (5-shot) | Coming |
| ARC (25-shot) | Coming |
| HellaSwag (10-shot) | Coming |
| TruthfulQA (0-shot) | Coming |
| Avg. | Coming |
We use EleutherAI's Language Model Evaluation Harness to run the benchmarks above, pinned to the same version used by Hugging Face's Open LLM Leaderboard.
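For reference, the snippet below sketches how such scores could be reproduced with the harness's Python API. It is a minimal sketch, assuming a recent lm-evaluation-harness release (v0.4 or later) whose Hugging Face backend accepts a `peft` adapter argument; the leaderboard pins an older harness commit, so exact numbers may differ.

```python
# Illustrative sketch (assumes lm-evaluation-harness >= 0.4, whose Hugging Face
# backend accepts a `peft` adapter argument). The Open LLM Leaderboard pins an
# older harness commit, so scores may not match it exactly.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=mistralai/Mistral-7B-v0.1,"
        "peft=dfurman/Mistral-7B-Instruct-v0.1,"
        "dtype=bfloat16"
    ),
    tasks=["hellaswag"],  # run each benchmark separately with its own shot count
    num_fewshot=10,       # e.g. HellaSwag (10-shot), per the table above
    batch_size=8,
)
print(results["results"]["hellaswag"])
```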
## Basic Usage
### Setup

```python
!pip install -q -U transformers peft torch accelerate bitsandbytes einops sentencepiece
```
```python
import torch
from peft import PeftModel, PeftConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)

peft_model_id = "dfurman/Mistral-7B-Instruct-v0.1"
config = PeftConfig.from_pretrained(peft_model_id)

tokenizer = AutoTokenizer.from_pretrained(
    peft_model_id,
    use_fast=True,
    trust_remote_code=True,
)

# Load the base model in 4-bit NF4 with bfloat16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Attach the LoRA adapter weights.
model = PeftModel.from_pretrained(
    model,
    peft_model_id,
)
```
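Optionally (not part of the original card), you can sanity-check the approximate in-memory size of the 4-bit base model plus adapter before generating:

```python
# Optional sanity check (not from the original card): approximate in-memory
# size of the 4-bit quantized base model plus the LoRA adapter.
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```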
```python
messages = [
    {"role": "user", "content": "Tell me a recipe for a mai tai."},
]

print("\n\n*** Prompt:")
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)

print("\n\n*** Generate:")
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
with torch.autocast("cuda", dtype=torch.bfloat16):
    output = model.generate(
        input_ids=input_ids,
        max_new_tokens=1024,
        do_sample=True,
        temperature=0.7,
        return_dict_in_generate=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        repetition_penalty=1.2,
        no_repeat_ngram_size=5,
    )

# Decode only the newly generated tokens (skip the prompt).
response = tokenizer.decode(
    output["sequences"][0][len(input_ids[0]):],
    skip_special_tokens=True,
)
print(response)
```
### Output

Prompt: coming

Generation: coming
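As an optional variant not shown in the original card, the same call can stream tokens as they are produced using transformers' `TextStreamer`, reusing `model`, `tokenizer`, and `input_ids` from the block above:

```python
# Optional variant (not in the original card): stream tokens to stdout as they
# are generated, reusing `model`, `tokenizer`, and `input_ids` from above.
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
with torch.autocast("cuda", dtype=torch.bfloat16):
    model.generate(
        input_ids=input_ids,
        max_new_tokens=1024,
        do_sample=True,
        temperature=0.7,
        pad_token_id=tokenizer.pad_token_id,
        streamer=streamer,
    )
```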
## Speeds, Sizes, Times
| runtime / 50 tokens (sec) | GPU | attn | torch dtype | VRAM (GB) |
|---|---|---|---|---|
| 3.1 | 1x A100 (40 GB SXM) | torch | fp16 | 13 |
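A rough way to reproduce this kind of per-50-token timing (an illustrative sketch reusing `model`, `tokenizer`, and `prompt` from Basic Usage above, not the exact script behind the table) is to force exactly 50 new tokens and time the call:

```python
# Rough latency check (illustrative; not the exact script behind the table above).
# Reuses `model`, `tokenizer`, `prompt`, and the `torch` import from Basic Usage.
import time

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
torch.cuda.synchronize()
start = time.perf_counter()
with torch.autocast("cuda", dtype=torch.bfloat16):
    model.generate(
        input_ids=input_ids,
        min_new_tokens=50,  # force exactly 50 generated tokens
        max_new_tokens=50,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
    )
torch.cuda.synchronize()
print(f"runtime / 50 tokens: {time.perf_counter() - start:.2f} sec")
```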
## Training
It took ~3 hours to train 3 epochs on 1x A100 (40 GB SXM).
### Prompt Format
This model was finetuned with the following format:
```python
tokenizer.chat_template = "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ ' [INST] ' + message['content'] + ' [/INST] ' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + ' ' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}"
```
This format is available as a chat template via the `apply_chat_template()` method. Here's an illustrative example:
```python
messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```
### Output

coming
### Training Hyperparameters
We use the `SFTTrainer` from the `trl` library to finetune LLMs on instruction-following datasets. The following `TrainingArguments` config was used (an illustrative sketch tying these settings together follows the two config lists below):
- num_train_epochs = 1
- auto_find_batch_size = True
- gradient_accumulation_steps = 1
- optim = "paged_adamw_32bit"
- save_strategy = "epoch"
- learning_rate = 3e-4
- lr_scheduler_type = "cosine"
- warmup_ratio = 0.03
- logging_strategy = "steps"
- logging_steps = 25
- bf16 = True
The following `bitsandbytes` quantization config was used:
- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: bfloat16
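To show how these settings fit together, here is a minimal end-to-end sketch using `trl`'s `SFTTrainer`. The LoRA hyperparameters, dataset text field, sequence length, and output directory are illustrative assumptions rather than values documented in this card, and the keyword names assume a 2023-era `trl` release:

```python
# Illustrative sketch only (not the exact training script): LoRA hyperparameters,
# dataset text field, sequence length, and output directory are assumptions, not
# values documented in this card. Assumes a 2023-era `trl` release and that the
# `datasets` and `trl` packages are installed.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import SFTTrainer

base_model_id = "mistralai/Mistral-7B-v0.1"

# Same 4-bit quantization config as listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id, use_fast=True)

# Assumed LoRA settings; the card does not document the adapter hyperparameters.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# TrainingArguments values taken from the list above.
training_args = TrainingArguments(
    output_dir="./sft-mistral-7b-instruct",  # assumed
    num_train_epochs=1,
    auto_find_batch_size=True,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_strategy="epoch",
    learning_rate=3e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    logging_strategy="steps",
    logging_steps=25,
    bf16=True,
)

dataset = load_dataset("jondurbin/airoboros-2.2.1", split="train")

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
    tokenizer=tokenizer,
    dataset_text_field="text",  # assumed column name; the actual field may differ
    max_seq_length=1024,        # assumed
)
trainer.train()
```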
## Model Card Contact
dryanfurman at gmail
## Framework versions
- PEFT 0.6.0.dev0