Update README.md
README.md
In the image and corresponding table below, you can see the benchmark scores for the evaluated models.

| Model | truthful_qa_de | truthfulqa_mc | arc_challenge | arc_challenge_de | hellaswag | hellaswag_de | MMLU | MMLU-DE | mean |
|----------------------------------------------------|----------------|---------------|---------------|------------------|-------------|--------------|-------------|-------------|-------------|
| meta-llama/Meta-Llama-3-8B-Instruct | 0.47498 | 0.43923 | **0.59642** | 0.47952 | **0.82025** | 0.60008 | **0.66658** | 0.53541 | 0.57656 |
| DiscoResearch/Llama3-German-8B | 0.49499 | 0.44838 | 0.55802 | 0.49829 | 0.79924 | 0.65395 | 0.62240 | 0.54413 | 0.57743 |
| DiscoResearch/Llama3-German-8B-32k | 0.48920 | 0.45138 | 0.54437 | 0.49232 | 0.79078 | 0.64310 | 0.58774 | 0.47971 | 0.55982 |
| DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1 | **0.53042** | 0.52867 | 0.59556 | **0.53839** | 0.80721 | 0.66440 | 0.61898 | 0.56053 | **0.60552** |
| **DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1** | 0.52749 | **0.53245** | 0.58788 | 0.53754 | 0.80770 | **0.66709** | 0.62123 | **0.56238** | 0.60547 |

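The mean column is the unweighted average of the eight benchmark scores in each row. A minimal sketch that reproduces it for the meta-llama/Meta-Llama-3-8B-Instruct row:

```python
# Scores from the Meta-Llama-3-8B-Instruct row of the benchmark table.
scores = {
    "truthful_qa_de": 0.47498,
    "truthfulqa_mc": 0.43923,
    "arc_challenge": 0.59642,
    "arc_challenge_de": 0.47952,
    "hellaswag": 0.82025,
    "hellaswag_de": 0.60008,
    "MMLU": 0.66658,
    "MMLU-DE": 0.53541,
}

# The "mean" column is the plain arithmetic mean over the eight benchmarks.
mean = sum(scores.values()) / len(scores)
print(round(mean, 5))  # 0.57656, matching the table
```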
## Model Configurations
We release DiscoLeo-8B in the following configurations:
1. [Base model with continued pretraining](https://huggingface.co/DiscoResearch/Llama3-German_8B)
2. [Long-context version (32k context length)](https://huggingface.co/DiscoResearch/Llama3_German_8B_32k)
3. [Instruction-tuned version of the base model](https://huggingface.co/DiscoResearch/Llama3_DiscoLeo_Instruct_8B_v0.1)
4. [Instruction-tuned version of the long-context model](https://huggingface.co/DiscoResearch/Llama3_DiscoLeo_Instruct_8B_32k_v0.1) (This model)

Here's how to use the model with transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1")

prompt = "Schreibe ein Essay über die Bedeutung der Energiewende für Deutschlands Wirtschaft"
messages = [
```
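The snippet above is cut off at `messages = [`. As a hedged sketch of how it typically continues (the single user turn here is illustrative, not taken from the original README), the transformers chat format expects a list of role/content dictionaries that `tokenizer.apply_chat_template` can consume:

```python
# Illustrative continuation: chat messages are role/content dicts.
# Prompt translation: "Write an essay on the importance of the energy
# transition for Germany's economy".
prompt = "Schreibe ein Essay über die Bedeutung der Energiewende für Deutschlands Wirtschaft"
messages = [
    {"role": "user", "content": prompt},  # the user turn carries the German prompt
]

# Typical next steps, assuming model/tokenizer are loaded as above:
# input_ids = tokenizer.apply_chat_template(
#     messages, add_generation_prompt=True, return_tensors="pt"
# ).to(model.device)
# outputs = model.generate(input_ids, max_new_tokens=512)
```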
|