Update README.md
README.md
In the image and corresponding table below, you can see the benchmark scores for the evaluated models.

| Model | truthful_qa_de | truthfulqa_mc | arc_challenge | arc_challenge_de | hellaswag | hellaswag_de | MMLU | MMLU-DE | mean |
|----------------------------------------------------|----------------|---------------|---------------|------------------|-------------|--------------|-------------|-------------|-------------|
| meta-llama/Meta-Llama-3-8B-Instruct | 0.47498 | 0.43923 | **0.59642** | 0.47952 | **0.82025** | 0.60008 | **0.66658** | 0.53541 | 0.57656 |
| DiscoResearch/Llama3-German-8B | 0.49499 | 0.44838 | 0.55802 | 0.49829 | 0.79924 | 0.65395 | 0.62240 | 0.54413 | 0.57743 |
| DiscoResearch/Llama3-German-8B-32k | 0.48920 | 0.45138 | 0.54437 | 0.49232 | 0.79078 | 0.64310 | 0.58774 | 0.47971 | 0.55982 |
| DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1 | **0.53042** | 0.52867 | 0.59556 | **0.53839** | 0.80721 | 0.66440 | 0.61898 | 0.56053 | **0.60552** |
| **DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1** | 0.52749 | **0.53245** | 0.58788 | 0.53754 | 0.80770 | **0.66709** | 0.62123 | **0.56238** | 0.60547 |

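The mean column is the unweighted average of the eight benchmark scores in each row. A minimal sketch that reproduces it for the meta-llama/Meta-Llama-3-8B-Instruct row:

```python
# Scores from the Meta-Llama-3-8B-Instruct row of the benchmark table.
scores = {
    "truthful_qa_de": 0.47498,
    "truthfulqa_mc": 0.43923,
    "arc_challenge": 0.59642,
    "arc_challenge_de": 0.47952,
    "hellaswag": 0.82025,
    "hellaswag_de": 0.60008,
    "MMLU": 0.66658,
    "MMLU-DE": 0.53541,
}

# The "mean" column is the plain arithmetic mean over the eight benchmarks.
mean = sum(scores.values()) / len(scores)
print(round(mean, 5))  # 0.57656, matching the table
```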
## Model Configurations
We release DiscoLeo-8B in the following configurations:
1. [Base model with continued pretraining](https://huggingface.co/DiscoResearch/Llama3-German_8B)
2. [Long-context version (32k context length)](https://huggingface.co/DiscoResearch/Llama3_German_8B_32k)
3. [Instruction-tuned version of the base model](https://huggingface.co/DiscoResearch/Llama3_DiscoLeo_Instruct_8B_v0.1)
4. [Instruction-tuned version of the long-context model](https://huggingface.co/DiscoResearch/Llama3_DiscoLeo_Instruct_8B_32k_v0.1) (This model)

Here's how to use the model with transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1")

prompt = "Schreibe ein Essay über die Bedeutung der Energiewende für Deutschlands Wirtschaft"
messages = [
```
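The snippet above is cut off at `messages = [`. As a hedged sketch of how it typically continues (the single user turn here is illustrative, not taken from the original README), the transformers chat format expects a list of role/content dictionaries that `tokenizer.apply_chat_template` can consume:

```python
# Illustrative continuation: chat messages are role/content dicts.
# Prompt translation: "Write an essay on the importance of the energy
# transition for Germany's economy".
prompt = "Schreibe ein Essay über die Bedeutung der Energiewende für Deutschlands Wirtschaft"
messages = [
    {"role": "user", "content": prompt},  # the user turn carries the German prompt
]

# Typical next steps, assuming model/tokenizer are loaded as above:
# input_ids = tokenizer.apply_chat_template(
#     messages, add_generation_prompt=True, return_tensors="pt"
# ).to(model.device)
# outputs = model.generate(input_ids, max_new_tokens=512)
```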
|