Update README.md
README.md
CHANGED
@@ -14,7 +14,7 @@ license: llama2

## ChocoLlama-2-7B-base: Getting Started

-We here present **ChocoLlama-2-7B-base**, a language-adapted version of Meta's Llama-2-7b, fine-tuned on
+We here present **ChocoLlama-2-7B-base**, a language-adapted version of Meta's Llama-2-7b, fine-tuned on 32B Dutch Llama-2 tokens (104GB) using LoRa.
Note that this is a base model, not optimized for conversational behavior.
If this is desired for your use-case, we recommend finetuning this model on your own Dutch data or using the instruction-finetuned version of this model, [ChocoLlama-2-7B-instruct](https://huggingface.co/ChocoLlama/ChocoLlama-2-7B-instruct).

@@ -32,7 +32,7 @@ model = AutoModelForCausalLM.from_pretrained('ChocoLlama/ChocoLlama-2-7B-base')
ChocoLlama is a family of open LLM's specifically adapted to Dutch, contributing to the state-of-the-art of Dutch open LLM's in their weight class.

We provide 6 variants (of which 3 base and 3 instruction-tuned models):
-- **ChocoLlama-2-7B-base** ([link](https://huggingface.co/ChocoLlama/ChocoLlama-2-7B-base)): A language-adapted version of Meta's Llama-2-7b, fine-tuned on
+- **ChocoLlama-2-7B-base** ([link](https://huggingface.co/ChocoLlama/ChocoLlama-2-7B-base)): A language-adapted version of Meta's Llama-2-7b, fine-tuned on 32B Dutch Llama-2 tokens (104GB) using LoRa.
- **ChocoLlama-2-7B-instruct** ([link](https://huggingface.co/ChocoLlama/ChocoLlama-2-7B-instruct)): An instruction-tuned version of ChocoLlama-2-7B-base, fine-tuned on a collection of Dutch translations of instruction-tuning datasets, using SFT followed by DPO.
- **ChocoLlama-2-7B-tokentrans-base** ([link](https://huggingface.co/ChocoLlama/ChocoLlama-2-7B-tokentrans-base)): A language-adapted version of Meta's Llama-2-7b, using a Dutch RoBERTa-based tokenizer. The token embeddings of this model were reinitialized using the token translation algorithm proposed by [Remy et al.](https://arxiv.org/pdf/2310.03477). The model was subsequently fine-tuned on the same Dutch dataset as ChocoLlama-2-7B-base, again using LoRa.
- **ChocoLlama-2-7B-tokentrans-instruct** ([link](https://huggingface.co/ChocoLlama/ChocoLlama-2-7B-tokentrans-instruct)): An instruction-tuned version of ChocoLlama-2-7B-tokentrans-base, fine-tuned on the same dataset as ChocoLlama-2-7B-instruct, again using SFT followed by DPO.
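As the hunk header above shows, the README's getting-started section already loads the model via `AutoModelForCausalLM.from_pretrained('ChocoLlama/ChocoLlama-2-7B-base')`. For reference, a minimal end-to-end sketch of that usage is given below; only the `from_pretrained` identifier comes from the diff, while the precision, device placement, prompt, and sampling settings are illustrative assumptions, not values from the README.

```python
# Sketch: load ChocoLlama-2-7B-base with Hugging Face Transformers and generate a Dutch continuation.
# Only the from_pretrained() call mirrors the README diff; dtype, device_map, and the
# sampling settings are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('ChocoLlama/ChocoLlama-2-7B-base')
model = AutoModelForCausalLM.from_pretrained(
    'ChocoLlama/ChocoLlama-2-7B-base',
    torch_dtype=torch.bfloat16,  # assumed half precision so the 7B model fits on a single GPU
    device_map='auto',
)

# This is a base model: it continues Dutch text rather than following chat instructions.
prompt = "Antwerpen is een stad in"
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For conversational use, the README itself points to ChocoLlama-2-7B-instruct instead of this base model.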
@@ -44,7 +44,7 @@ For benchmark results for all models, including compared to their base models an
### Model Description

- **Developed by:** [Matthieu Meeus](https://huggingface.co/matthieumeeus97), [Anthony Rathé](https://huggingface.co/anthonyrathe)
-- **Funded by:** [Vlaams Supercomputer Centrum](https://www.vscentrum.be/), through a grant of apx. 40K GPU hours (NVIDIA
+- **Funded by:** [Vlaams Supercomputer Centrum](https://www.vscentrum.be/), through a grant of approximately 40K GPU hours (NVIDIA A100-80GB)
- **Language(s):** Dutch
- **License:** [Llama-2 Community License](https://ai.meta.com/llama/license/)
- **Finetuned from model:** [Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)
@@ -110,7 +110,7 @@ We collect a diverse set of Dutch natural language.

### Training Procedure

-This model was fine-tuned using low-rank (LoRa) adapatation with trainable embeddings, for a total of
+This model was fine-tuned using low-rank adaptation (LoRa) with trainable embeddings, for a total of 544M trainable parameters.

#### Training Hyperparameters

@@ -132,7 +132,6 @@ This model was fine-tuned using low-rank adaptation (LoRa) with trainable embed
- Parallelization factor: 8
- Weight decay: 0

-
## Evaluation

### Quantitative evaluation
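The training-procedure hunks above state only that LoRA with trainable embeddings was used, for a total of 544M trainable parameters. A minimal sketch of that kind of setup with the `peft` library is shown below; the rank, alpha, dropout, and target modules are assumed values for illustration, not the authors' actual configuration.

```python
# Sketch: LoRA adaptation of Llama-2-7b with fully trainable embeddings via peft.
# r, lora_alpha, lora_dropout, and target_modules are assumptions; the README only
# reports the resulting ~544M trainable parameters.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-hf')

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=64,                      # assumed LoRA rank
    lora_alpha=32,             # assumed scaling factor
    lora_dropout=0.05,         # assumed dropout
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],  # assumed attention projections
    modules_to_save=['embed_tokens', 'lm_head'],  # embeddings (and output head) trained in full
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # prints trainable vs. total parameter counts
```

Keeping `embed_tokens` (and `lm_head`) in `modules_to_save` is what makes the embeddings trainable alongside the low-rank adapters; with Llama-2's 32,000-token vocabulary and 4,096-dimensional hidden size, each of those matrices is roughly 131M parameters, so they account for a large share of the reported 544M trainable parameters.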
@@ -166,4 +165,4 @@ For details, we refer to the paper and to our benchmark [ChocoLlama-Bench](https

### Compute Infrastructure

-All ChocoLlama models have been trained on the compute cluster provided by the [Flemish Supercomputer Center (VSC)](https://www.vscentrum.be/). We used 8 to 16 NVIDIA
+All ChocoLlama models have been trained on the compute cluster provided by the [Flemish Supercomputer Center (VSC)](https://www.vscentrum.be/). We used 8 to 16 NVIDIA A100 GPUs with 80 GB of VRAM each.