---
license: apache-2.0
library_name: peft
tags:
  - mistral
datasets:
  - ehartford/dolphin
  - garage-bAInd/Open-Platypus
inference: false
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-v0.1
---

Mistral-7B-Instruct-v0.1

The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is an instruction-tuned generative text model with 7 billion parameters, geared towards generalist instruction-following capabilities.

Model Details

This model was built via parameter-efficient finetuning of mistralai/Mistral-7B-v0.1 on the first 5k rows of ehartford/dolphin. Finetuning was executed on 1x A100 (40 GB SXM) for roughly 1 hour on Google Colab.

  • Developed by: Daniel Furman
  • Model type: Decoder-only
  • Language(s) (NLP): English
  • License: Apache 2.0
  • Finetuned from model: mistralai/Mistral-7B-v0.1

Model Sources

Evaluation Results

| Metric              | Value  |
|---------------------|--------|
| MMLU (5-shot)       | Coming |
| ARC (25-shot)       | Coming |
| HellaSwag (10-shot) | Coming |
| TruthfulQA (0-shot) | Coming |
| Avg.                | Coming |

We use EleutherAI's Language Model Evaluation Harness to run the benchmark tests above, using the same version as Hugging Face's Open LLM Leaderboard.

Basic Usage

!pip install -q -U transformers peft torch accelerate bitsandbytes einops sentencepiece

import torch
from peft import PeftModel, PeftConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
# This card's LoRA adapter; its PEFT config records which base model to load
peft_model_id = "dfurman/Mistral-7B-Instruct-v0.1"
config = PeftConfig.from_pretrained(peft_model_id)

tokenizer = AutoTokenizer.from_pretrained(
    peft_model_id,
    use_fast=True,
    trust_remote_code=True,
)
# Quantize the base model to 4-bit NF4 so it fits comfortably on a single GPU
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
# Load the quantized base model, then attach the finetuned LoRA adapter on top
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(
    model,
    peft_model_id,
)
messages = [
    {"role": "system", "content": "You are a helpful assistant. Respond as briefly as possible."},    
    {"role": "user", "content": "Tell me a recipe for a mai tai."},
]

print("\n\n*** Prompt:")
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)

print("\n\n*** Generate:")
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
with torch.autocast("cuda", dtype=torch.bfloat16):
    output = model.generate(
        input_ids=input_ids,
        max_new_tokens=1024,
        do_sample=True,
        temperature=0.7,
        return_dict_in_generate=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        repetition_penalty=1.2,
        no_repeat_ngram_size=5,
    )

# Decode only the newly generated tokens (everything after the prompt)
response = tokenizer.decode(
    output["sequences"][0][len(input_ids[0]):],
    skip_special_tokens=True,
)
print(response)
Output

Prompt:

<|im_start|>system
You are a helpful assistant. Respond as briefly as possible.<|im_end|>
<|im_start|>user
Tell me a recipe for a mai tai.<|im_end|>
<|im_start|>assistant

Generation: Here's one simple version of the classic Mai Tai cocktail:

1 oz White Rum (Bacardi, Don Papa, etc.) ➕ ½ oz Coconut Cream Liqueur (Malibu or Coco Lopez) 2 tsp Simple Syrup ➕ Dash Orange Bitters 3-4 Ice Cubes

Shake all ingredients in a shaker filled with ice until well chilled and strain into an old fashioned glass over fresh crushed ice. Garnish with mint leaves if desired. Enjoy!

Speeds, Sizes, Times

| runtime / 50 tokens (sec) | GPU                 | attn  | torch dtype | VRAM (GB) |
|---------------------------|---------------------|-------|-------------|-----------|
| 3.1                       | 1x A100 (40 GB SXM) | torch | fp16        | 13        |
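The runtime above is measured over 50 generated tokens. As a rough sketch of how a comparable number can be taken (reusing the model and tokenizer from the Basic Usage snippet, which load the model in 4-bit rather than the fp16 setup listed in the table, so the exact figure will differ):

import time

# Time the generation of exactly 50 new tokens (a warm-up run is omitted for brevity)
input_ids = tokenizer("Tell me a recipe for a mai tai.", return_tensors="pt").input_ids.cuda()
start = time.time()
_ = model.generate(input_ids=input_ids, max_new_tokens=50, min_new_tokens=50)
print(f"runtime / 50 tokens: {time.time() - start:.2f} sec")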

Training

It took roughly 1 hour to train 1 epoch on 1x A100 (40 GB SXM).

Prompt format: This model uses the same prompt format as mistralai/Mistral-7B-Instruct-v0.1 and does not expect a system prompt.

[INST] {prompt} [/INST]
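For example, substituting a single user turn into this template gives:

[INST] Tell me a recipe for a mai tai. [/INST]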

Training Hyperparameters

We use the SFTTrainer from trl to fine-tune LLMs on instruction-following datasets.

The following TrainingArguments config was used (a code sketch follows the list below):

  • num_train_epochs = 1
  • auto_find_batch_size = True
  • gradient_accumulation_steps = 1
  • optim = "paged_adamw_32bit"
  • save_strategy = "epoch"
  • learning_rate = 3e-4
  • lr_scheduler_type = "cosine"
  • warmup_ratio = 0.03
  • logging_strategy = "steps"
  • logging_steps = 25
  • bf16 = True
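As a minimal sketch, these settings map onto transformers.TrainingArguments as follows (the output_dir is illustrative and not taken from this card):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",  # illustrative; not specified in this card
    num_train_epochs=1,
    auto_find_batch_size=True,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_strategy="epoch",
    learning_rate=3e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    logging_strategy="steps",
    logging_steps=25,
    bf16=True,
)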

The following bitsandbytes quantization config was used (see the combined training sketch after the list):

  • quant_method: bitsandbytes
  • load_in_8bit: False
  • load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: False
  • bnb_4bit_compute_dtype: bfloat16
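Putting the two configs together, below is a hedged sketch of how the quantized base model, a LoRA adapter, and trl's SFTTrainer might be wired up. The LoRA hyperparameters, dataset split and text field, and sequence length are illustrative assumptions; this card does not list them.

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTTrainer

# 4-bit NF4 quantization, mirroring the bitsandbytes config listed above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_model_id, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token  # Mistral's tokenizer ships without a pad token
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Illustrative LoRA settings -- not taken from this card
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# First 5k rows of the finetuning data; the split name is an assumption
dataset = load_dataset("ehartford/dolphin", split="train[:5000]")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="instruction",  # assumption: depends on how the data was formatted
    max_seq_length=1024,               # assumption
    tokenizer=tokenizer,
    args=training_args,  # the TrainingArguments sketch above
)
trainer.train()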

Model Card Contact

dryanfurman at gmail

Framework versions

  • PEFT 0.6.0.dev0