Original model description:

license: other tags: - alignment-handbook - trl - dpo - generated_from_trainer datasets: - argilla/dpo-mix-7k license_name: gemma-terms-of-use license_link: https://ai.google.dev/gemma/terms base_model: Columbia-NLP/gemma-2b-zephyr-sft model-index: - name: gemma-2b-zephyr-dpo results: - task: type: text-generation name: Text Generation dataset: name: AI2 Reasoning Challenge (25-Shot) type: ai2_arc config: ARC-Challenge split: test args: num_few_shot: 25 metrics: - type: acc_norm value: 52.22 name: normalized accuracy - task: type: text-generation name: Text Generation dataset: name: HellaSwag (10-Shot) type: hellaswag split: validation args: num_few_shot: 10 metrics: - type: acc_norm value: 73.11 name: normalized accuracy - task: type: text-generation name: Text Generation dataset: name: MMLU (5-Shot) type: cais/mmlu config: all split: test args: num_few_shot: 5 metrics: - type: acc value: 42.55 name: accuracy - task: type: text-generation name: Text Generation dataset: name: TruthfulQA (0-shot) type: truthful_qa config: multiple_choice split: validation args: num_few_shot: 0 metrics: - type: mc2 value: 42.64 - task: type: text-generation name: Text Generation dataset: name: Winogrande (5-shot) type: winogrande config: winogrande_xl split: validation args: num_few_shot: 5 metrics: - type: acc value: 64.4 name: accuracy - task: type: text-generation name: Text Generation dataset: name: GSM8k (5-shot) type: gsm8k config: main split: test args: num_few_shot: 5 metrics: - type: acc value: 19.94 name: accuracy

Model Card for Gemma 2B Zephyr DPO

We trained the google/gemma-2b with DPO and data from argilla/dpo-mix-7k. We carefully selected the hyper-parameters to achieve the best DPO performance.

Model description

Model type: A 2.5B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
Language(s) (NLP): Primarily English
License: Gemma Terms of Use
Finetuned from model: google/gemma-2b

License

This model has the same license as the original Gemma model collection

OpenLLM Leaderboard Performance

Models	Avg.	ARC	HellaSwag	MMLU	TruthfulQA	Winogrande	GSM8k
google/gemma-2b	46.37	48.38	71.77	41.77	33.08	66.77	16.91
google/gemma-2b-it	42.75	43.94	62.70	37.65	45.82	60.93	5.46
wandb/gemma-2b-zephyr-sft	47.18	49.74	72.38	41.37	34.42	66.93	18.27
wandb/gemma-2b-zephyr-dpo	46.92	49.66	72.23	41.13	34.47	66.54	17.51
Columbia-NLP/gemma-2b-zephyr-sft	48.75	51.80	72.63	42.20	41.96	63.85	20.09
Columbia-NLP/gemma-2b-zephyr-dpo	49.14	52.22	73.11	42.55	42.64	64.40	19.94

MT-Bench

We evaluate our model with GPT-4-0125-preview as the judge.

Model	Total	Coding	Extraction	Humanities	Math	Reasoning	Roleplay	STEM	Writing
google/gemma-2b-it	4.71	2.95	4.35	6.15	2.90	3.50	5.60	5.50	6.70
wandb/gemma-2b-zephyr-sft	4.03	3.10	3.15	5.00	2.70	2.65	5.10	4.80	5.75
wandb/gemma-2b-zephyr-dpo	4.06	2.80	2.90	5.55	2.65	2.70	5.20	4.80	5.85
anakin87_gemma-2b-orpo	4.14	3.00	3.70	6.30	2.70	2.35	5.68	4.75	4.75
Columbia-NLP/gemma-2b-zephyr-sft	4.34	3.10	3.70	6.25	2.65	2.70	5.55	5.25	5.50
Columbia-NLP/gemma-2b-zephyr-dpo	4.75	3.50	4.05	6.75	3.30	3.70	5.85	5.40	5.53