---
language:
- en
license: llama2
model-index:
- name: recycled-wizardlm-7b-v2.0
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 54.95
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=umd-zhou-lab/recycled-wizardlm-7b-v2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 77.85
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=umd-zhou-lab/recycled-wizardlm-7b-v2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 45.79
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=umd-zhou-lab/recycled-wizardlm-7b-v2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 48.29
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=umd-zhou-lab/recycled-wizardlm-7b-v2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 71.51
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=umd-zhou-lab/recycled-wizardlm-7b-v2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 12.36
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=umd-zhou-lab/recycled-wizardlm-7b-v2.0
      name: Open LLM Leaderboard
---
# Model Card for umd-zhou-lab/recycled-wizardlm-7b-v2.0

This model is a fine-tune of Llama-2-7B on the V2 recycled WizardLM (70k) data.

## Model Details

### Model Description

- **Developed by:** UMD Tianyi Zhou Lab
- **Model type:** An auto-regressive language model based on the transformer architecture
- **License:** Llama 2 Community License Agreement
- **Finetuned from model:** [meta-llama/Llama-2-7b](https://huggingface.co/meta-llama/Llama-2-7b)

### Model Sources

- **GitHub:** [Reflection-Tuning](https://github.com/tianyi-lab/Reflection_Tuning)
- **Paper:** [Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning](https://arxiv.org/abs/2310.11716)
- **Data:** Coming soon

## Uses

This model is intended primarily for research on large language models and chatbots.
Its intended users are researchers and hobbyists in natural language processing, machine learning, and artificial intelligence.

## Training

We use the prompt format from [FastChat](https://github.com/lm-sys/FastChat):

```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Hi ASSISTANT: Hello.</s>USER: Who are you? ASSISTANT: I am ...</s>......
```
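
As a minimal sketch of how a conversation could be serialized into the format shown above, the helper below joins turns with the same separators that appear in the example (a space after the system prompt and after each user turn, `</s>` after each assistant reply). The names `SYSTEM_PROMPT`, `EOS`, and `build_prompt` are hypothetical and introduced only for illustration; they are not part of FastChat or of this model's training code.

```python
# Illustrative only: reproduces the prompt layout shown in this card.
# SYSTEM_PROMPT, EOS, and build_prompt are hypothetical names.
SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)
EOS = "</s>"  # end-of-sequence marker that closes each assistant reply


def build_prompt(turns, system=SYSTEM_PROMPT):
    """Serialize a conversation.

    turns: list of (user_message, assistant_reply) pairs. Use None as the
    reply of the final pair to leave the prompt open for generation.
    """
    prompt = system + " "
    for user_msg, assistant_msg in turns:
        prompt += f"USER: {user_msg} ASSISTANT:"
        if assistant_msg is not None:
            prompt += f" {assistant_msg}{EOS}"
    return prompt


if __name__ == "__main__":
    print(build_prompt([("Hi", "Hello."), ("Who are you?", None)]))
```

Passing `None` as the last reply leaves the string ending in `ASSISTANT:`, which is where the model would be expected to continue.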

| Hyperparameter | Global Batch Size | Learning rate | Epochs | Max length | Weight decay | Warmup Rate |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| Recycled Models (7B) | 128 | 2e-5 | 3 | 2048 | 0 | 0.03 |
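
To make the table concrete, the sketch below shows how a global batch size of 128 might be split across devices and how many warmup steps a rate of 0.03 would imply. It rests on several assumptions that this card does not state: that "Warmup Rate" is a fraction of total optimizer steps, that the dataset holds roughly 70,000 examples, and that training runs on 8 GPUs with 4 sequences per device. `TrainConfig`, `grad_accum_steps`, and `warmup_steps` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # Values copied from the hyperparameter table in this card.
    global_batch_size: int = 128
    learning_rate: float = 2e-5
    epochs: int = 3
    max_length: int = 2048
    weight_decay: float = 0.0
    warmup_rate: float = 0.03  # assumed: fraction of total optimizer steps


def grad_accum_steps(cfg, num_gpus, per_device_batch):
    """Accumulation steps needed so that
    num_gpus * per_device_batch * steps == global_batch_size."""
    per_step = num_gpus * per_device_batch
    if cfg.global_batch_size % per_step != 0:
        raise ValueError("global batch size is not divisible by num_gpus * per_device_batch")
    return cfg.global_batch_size // per_step


def warmup_steps(cfg, num_examples):
    """Warmup length in optimizer steps for a dataset of num_examples."""
    steps_per_epoch = math.ceil(num_examples / cfg.global_batch_size)
    total_steps = steps_per_epoch * cfg.epochs
    return math.ceil(total_steps * cfg.warmup_rate)


if __name__ == "__main__":
    cfg = TrainConfig()
    # Hypothetical hardware setup: 8 GPUs, 4 sequences per device.
    print(grad_accum_steps(cfg, num_gpus=8, per_device_batch=4))  # 4
    # Hypothetical dataset size of 70,000 examples.
    print(warmup_steps(cfg, num_examples=70_000))  # 50
```

With these assumed numbers, one epoch is 547 optimizer steps, three epochs are 1,641 steps, and warmup covers the first 50.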

## Performance

The following table compares our recycled models (V2) with baseline models on the AlpacaEval Leaderboard and the Hugging Face Open LLM Leaderboard. **Avg** is the mean of the four Open LLM Leaderboard scores shown (ARC, HellaSwag, MMLU, TruthfulQA).

The V2 recycled Alpaca and WizardLM data and the corresponding paper will be released soon.

| **Model** | **AlpacaEval** | **Avg** | **ARC** | **HellaSwag** | **MMLU** | **TruthfulQA** | **Checkpoint** |
|--------------------------|:--------------:|:-----------:|:-------:|:-------------:|:-------:|:--------------:|:-:|
| **Alpaca 7B** | 26.46 | 50.21 | 42.65 | 76.91 | 41.73 | 39.55 | / |
| **Recycled Alpaca 7B V2.0** | 79.58 | 56.05 | 54.01 | 78.07 | 46.69 | 45.41 | [[hf-Link]](https://huggingface.co/umd-zhou-lab/recycled-alpaca-7b-v2.0) |
| **WizardLM 7B** | 67.64 | 54.18 | 51.60 | 77.70 | 42.70 | 44.70 | / |
| **Recycled WizardLM 7B V2.0** | 83.48 | 56.79 | 54.78 | 77.86 | 45.63 | 48.91 | [[hf-Link]](https://huggingface.co/umd-zhou-lab/recycled-wizardlm-7b-v2.0) |

## Citation

Please consider citing our paper if you find our code, data, or models useful. Thank you!

```
@misc{li2023reflectiontuning,
    title={Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning},
    author={Ming Li and Lichang Chen and Jiuhai Chen and Shwai He and Heng Huang and Jiuxiang Gu and Tianyi Zhou},
    year={2023},
    eprint={2310.11716},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_umd-zhou-lab__recycled-wizardlm-7b-v2.0).

| Metric                          |Value|
|---------------------------------|----:|
|Avg.                             |51.79|
|AI2 Reasoning Challenge (25-Shot)|54.95|
|HellaSwag (10-Shot)              |77.85|
|MMLU (5-Shot)                    |45.79|
|TruthfulQA (0-shot)              |48.29|
|Winogrande (5-shot)              |71.51|
|GSM8k (5-shot)                   |12.36|
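
The "Avg." row is consistent with an unweighted mean of the six benchmark scores. The short check below recomputes it from the values in the table; the dictionary name `scores` is only illustrative.

```python
# Scores copied from the table above (Open LLM Leaderboard results).
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 54.95,
    "HellaSwag (10-Shot)": 77.85,
    "MMLU (5-Shot)": 45.79,
    "TruthfulQA (0-shot)": 48.29,
    "Winogrande (5-shot)": 71.51,
    "GSM8k (5-shot)": 12.36,
}

# Unweighted mean over the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 51.79
```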