YAML Metadata
Warning:
empty or missing yaml metadata in repo card
(https://huggingface.co/docs/hub/model-cards#model-card-metadata)
Quantization made by Richard Erkhov.
gemma-2b-zephyr-dpo - AWQ
- Model creator: https://huggingface.co/Columbia-NLP/
- Original model: https://huggingface.co/Columbia-NLP/gemma-2b-zephyr-dpo/
Original model description:
license: other tags: - alignment-handbook - trl - dpo - generated_from_trainer datasets: - argilla/dpo-mix-7k license_name: gemma-terms-of-use license_link: https://ai.google.dev/gemma/terms base_model: Columbia-NLP/gemma-2b-zephyr-sft model-index: - name: gemma-2b-zephyr-dpo results: - task: type: text-generation name: Text Generation dataset: name: AI2 Reasoning Challenge (25-Shot) type: ai2_arc config: ARC-Challenge split: test args: num_few_shot: 25 metrics: - type: acc_norm value: 52.22 name: normalized accuracy - task: type: text-generation name: Text Generation dataset: name: HellaSwag (10-Shot) type: hellaswag split: validation args: num_few_shot: 10 metrics: - type: acc_norm value: 73.11 name: normalized accuracy - task: type: text-generation name: Text Generation dataset: name: MMLU (5-Shot) type: cais/mmlu config: all split: test args: num_few_shot: 5 metrics: - type: acc value: 42.55 name: accuracy - task: type: text-generation name: Text Generation dataset: name: TruthfulQA (0-shot) type: truthful_qa config: multiple_choice split: validation args: num_few_shot: 0 metrics: - type: mc2 value: 42.64 - task: type: text-generation name: Text Generation dataset: name: Winogrande (5-shot) type: winogrande config: winogrande_xl split: validation args: num_few_shot: 5 metrics: - type: acc value: 64.4 name: accuracy - task: type: text-generation name: Text Generation dataset: name: GSM8k (5-shot) type: gsm8k config: main split: test args: num_few_shot: 5 metrics: - type: acc value: 19.94 name: accuracy
Model Card for Gemma 2B Zephyr DPO
We trained the google/gemma-2b with DPO and data from argilla/dpo-mix-7k
.
We carefully selected the hyper-parameters to achieve the best DPO performance.
Model description
- Model type: A 2.5B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
- Language(s) (NLP): Primarily English
- License: Gemma Terms of Use
- Finetuned from model: google/gemma-2b
License
This model has the same license as the original Gemma model collection
OpenLLM Leaderboard Performance
Models | Avg. | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8k |
---|---|---|---|---|---|---|---|
google/gemma-2b | 46.37 | 48.38 | 71.77 | 41.77 | 33.08 | 66.77 | 16.91 |
google/gemma-2b-it | 42.75 | 43.94 | 62.70 | 37.65 | 45.82 | 60.93 | 5.46 |
wandb/gemma-2b-zephyr-sft | 47.18 | 49.74 | 72.38 | 41.37 | 34.42 | 66.93 | 18.27 |
wandb/gemma-2b-zephyr-dpo | 46.92 | 49.66 | 72.23 | 41.13 | 34.47 | 66.54 | 17.51 |
Columbia-NLP/gemma-2b-zephyr-sft | 48.75 | 51.80 | 72.63 | 42.20 | 41.96 | 63.85 | 20.09 |
Columbia-NLP/gemma-2b-zephyr-dpo | 49.14 | 52.22 | 73.11 | 42.55 | 42.64 | 64.40 | 19.94 |
MT-Bench
We evaluate our model with GPT-4-0125-preview
as the judge.
Model | Total | Coding | Extraction | Humanities | Math | Reasoning | Roleplay | STEM | Writing |
---|---|---|---|---|---|---|---|---|---|
google/gemma-2b-it | 4.71 | 2.95 | 4.35 | 6.15 | 2.90 | 3.50 | 5.60 | 5.50 | 6.70 |
wandb/gemma-2b-zephyr-sft | 4.03 | 3.10 | 3.15 | 5.00 | 2.70 | 2.65 | 5.10 | 4.80 | 5.75 |
wandb/gemma-2b-zephyr-dpo | 4.06 | 2.80 | 2.90 | 5.55 | 2.65 | 2.70 | 5.20 | 4.80 | 5.85 |
anakin87_gemma-2b-orpo | 4.14 | 3.00 | 3.70 | 6.30 | 2.70 | 2.35 | 5.68 | 4.75 | 4.75 |
Columbia-NLP/gemma-2b-zephyr-sft | 4.34 | 3.10 | 3.70 | 6.25 | 2.65 | 2.70 | 5.55 | 5.25 | 5.50 |
Columbia-NLP/gemma-2b-zephyr-dpo | 4.75 | 3.50 | 4.05 | 6.75 | 3.30 | 3.70 | 5.85 | 5.40 | 5.53 |
- Downloads last month
- 3
Inference Providers
NEW
This model is not currently available via any of the supported Inference Providers.
The model cannot be deployed to the HF Inference API:
The model has no library tag.