---
license: apache-2.0
language:
  - en
  - de
  - es
  - fr
  - it
  - pt
  - pl
  - nl
  - tr
  - sv
  - cs
  - el
  - hu
  - ro
  - fi
  - uk
  - sl
  - sk
  - da
  - lt
  - lv
  - et
  - bg
  - 'no'
  - ca
  - hr
  - ga
  - mt
  - gl
  - zh
  - ru
  - ko
  - ja
  - ar
  - hi
---
 
	
	
		
	
	
# Model Card for EuroLLM-1.7B
	
This is the model card for the first pre-trained model of the EuroLLM series: EuroLLM-1.7B. You can also check the instruction-tuned version: EuroLLM-1.7B-Instruct.
- Developed by: Unbabel, Instituto Superior Técnico, University of Edinburgh, Aveni, University of Paris-Saclay, University of Amsterdam, Naver Labs, Sorbonne Université, University of Turku, University of Oslo
 
- Funded by: European Union.
 
- Model type: A 1.7B parameter multilingual transformer LLM.
 
- Language(s) (NLP): Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Arabic, Catalan, Chinese, Galician, Hindi, Japanese, Korean, Norwegian, Russian, Turkish, and Ukrainian. 
 
- License: Apache License 2.0.
 
	
		
	
	
## Model Details
	
The EuroLLM project has the goal of creating a suite of LLMs capable of understanding and generating text in all European Union languages as well as some additional relevant languages.
EuroLLM-1.7B is a 1.7B parameter model trained on 4 trillion tokens divided across the considered languages and several data sources: Web data, parallel data (en-xx and xx-en), and high-quality datasets.
EuroLLM-1.7B-Instruct was further instruction-tuned on EuroBlocks, an instruction-tuning dataset predominantly focused on machine translation.
	
		
	
	
### Model Description
	
EuroLLM uses a standard, dense Transformer architecture:
- We use grouped query attention (GQA) with 8 key-value heads, since it has been shown to increase speed at inference time while maintaining downstream performance.
 
- We perform pre-layer normalization, since it improves training stability, and use RMSNorm, which is faster.
 
- We use the SwiGLU activation function, since it has been shown to lead to good results on downstream tasks.
 
- We use rotary positional embeddings (RoPE) in every layer, since these have been shown to lead to good performance while allowing the extension of the context length.
 
For pre-training, we use 256 Nvidia H100 GPUs of the Marenostrum 5 supercomputer, training the model with a constant batch size of 3,072 sequences, which corresponds to approximately 12 million tokens, using the Adam optimizer, and BF16 precision.
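To make the architecture choices above concrete, here is a hedged sketch that expresses them as a Hugging Face configuration. It assumes the checkpoint follows a Llama-style `LlamaConfig` layout, and the vocabulary size is inferred from the embedding parameter count in the hyper-parameter table below rather than taken from the repository's `config.json`; treat it as an illustration, not the authoritative configuration.

```python
# Hedged sketch: the architecture bullets and the hyper-parameter table below
# expressed as a Llama-style config. The config class and the vocabulary size
# are assumptions; the authoritative values live in the model repo's config.json.
from transformers import LlamaConfig

config = LlamaConfig(
    vocab_size=128_000,          # assumed: ~0.262B embedding params / 2,048 hidden size
    hidden_size=2_048,           # "Embedding Size"
    intermediate_size=5_632,     # "FFN Hidden Size" (SwiGLU gate/up/down projections)
    num_hidden_layers=24,
    num_attention_heads=16,
    num_key_value_heads=8,       # grouped query attention (GQA)
    hidden_act="silu",           # SiLU-gated MLP, i.e. SwiGLU
    max_position_embeddings=4_096,
    rope_theta=10_000.0,         # rotary positional embeddings base
    tie_word_embeddings=False,   # embeddings and LM head are not tied
)

# Rough parameter-count check against the table: untied embedding + LM head
# (2 * vocab * hidden) plus the reported 1.133B non-embedding parameters.
embedding_params = config.vocab_size * config.hidden_size   # ~0.262B
total = 2 * embedding_params + 1.133e9                      # ~1.657B
print(f"{embedding_params / 1e9:.3f}B embedding, {total / 1e9:.3f}B total (approx.)")
```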
Here is a summary of the model hyper-parameters:
	
		
| Hyper-parameter | Value |
| --- | --- |
| Sequence Length | 4,096 |
| Number of Layers | 24 |
| Embedding Size | 2,048 |
| FFN Hidden Size | 5,632 |
| Number of Heads | 16 |
| Number of KV Heads (GQA) | 8 |
| Activation Function | SwiGLU |
| Position Encodings | RoPE (Θ = 10,000) |
| Layer Norm | RMSNorm |
| Tied Embeddings | No |
| Embedding Parameters | 0.262B |
| LM Head Parameters | 0.262B |
| Non-embedding Parameters | 1.133B |
| Total Parameters | 1.657B |
	
 
	
		
	
	
## Run the model
	
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "utter-project/EuroLLM-1.7B"

# Load the tokenizer and the base (pre-trained, non-instruct) model.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Example prompt: continue an English-to-Portuguese translation.
text = "English: My name is EuroLLM. Portuguese:"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
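If you want to run the same prompt on a GPU, the variant below is a minimal sketch: the bfloat16 dtype and `device_map="auto"` placement (which requires the `accelerate` package) are illustrative assumptions, not settings recommended on this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "utter-project/EuroLLM-1.7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Assumed settings for GPU inference: bfloat16 weights and automatic device
# placement (needs accelerate installed); adjust to your hardware.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

text = "English: My name is EuroLLM. Portuguese:"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```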
	
		
	
	
## Results
	
	
		
	
	
### Machine Translation
	
We evaluate EuroLLM-1.7B-Instruct on several machine translation benchmarks: FLORES-200, WMT-23, and WMT-24, comparing it with Gemma-2B and Gemma-7B (both also instruction-tuned on EuroBlocks).
The results show that EuroLLM-1.7B is substantially better than Gemma-2B at machine translation and competitive with Gemma-7B.
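The card does not state which metric produced the scores in the tables below; they appear to be per-direction averages of a neural MT metric. As one way to obtain comparable numbers, here is a hedged sketch using Unbabel's `unbabel-comet` package with the `Unbabel/wmt22-comet-da` checkpoint; both the package usage and the choice of checkpoint are assumptions about the setup, not a statement of the authors' evaluation pipeline.

```python
# Hedged sketch: score system translations with COMET (pip install unbabel-comet).
# The checkpoint choice and the example data are illustrative assumptions.
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt22-comet-da")
comet_model = load_from_checkpoint(model_path)

# Each item pairs a source sentence, the system translation, and a reference.
data = [
    {
        "src": "My name is EuroLLM.",
        "mt": "O meu nome é EuroLLM.",
        "ref": "O meu nome é EuroLLM.",
    },
]

output = comet_model.predict(data, batch_size=8, gpus=0)
print(output.system_score)  # corpus-level score; per-segment scores are in output.scores
```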
	
		
	
	
#### Flores-200
	
	
		
| Model | AVG | AVG en-xx | AVG xx-en | en-ar | en-bg | en-ca | en-cs | en-da | en-de | en-el | en-es-latam | en-et | en-fi | en-fr | en-ga | en-gl | en-hi | en-hr | en-hu | en-it | en-ja | en-ko | en-lt | en-lv | en-mt | en-nl | en-no | en-pl | en-pt-br | en-ro | en-ru | en-sk | en-sl | en-sv | en-tr | en-uk | en-zh-cn | ar-en | bg-en | ca-en | cs-en | da-en | de-en | el-en | es-latam-en | et-en | fi-en | fr-en | ga-en | gl-en | hi-en | hr-en | hu-en | it-en | ja-en | ko-en | lt-en | lv-en | mt-en | nl-en | no-en | pl-en | pt-br-en | ro-en | ru-en | sk-en | sl-en | sv-en | tr-en | uk-en | zh-cn-en |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EuroLLM-1.7B-Instruct | 86.10 | 85.53 | 86.67 | 83.87 | 88.36 | 84.42 | 88.34 | 88.77 | 86.63 | 86.71 | 85.99 | 86.98 | 87.13 | 87.21 | 72.25 | 85.97 | 74.78 | 82.96 | 85.51 | 87.77 | 89.26 | 86.27 | 86.31 | 86.22 | 67.38 | 86.95 | 88.68 | 87.38 | 89.13 | 88.39 | 87.47 | 87.51 | 85.32 | 89.20 | 86.24 | 86.33 | 86.17 | 85.80 | 87.20 | 87.53 | 87.53 | 89.26 | 88.71 | 86.49 | 86.55 | 87.60 | 88.17 | 88.90 | 79.89 | 87.59 | 87.53 | 86.10 | 86.34 | 87.54 | 86.25 | 86.08 | 85.03 | 85.60 | 78.16 | 86.80 | 89.96 | 85.24 | 88.85 | 88.42 | 85.86 | 87.17 | 86.36 | 89.48 | 86.76 | 86.06 | 85.88 |
| Gemma-2B-EuroBlocks | 81.56 | 78.93 | 84.18 | 75.25 | 82.46 | 83.17 | 82.17 | 84.40 | 83.20 | 79.63 | 84.15 | 72.63 | 81.00 | 85.12 | 38.79 | 82.00 | 67.00 | 81.18 | 78.24 | 84.80 | 87.08 | 82.04 | 73.02 | 68.41 | 56.67 | 83.30 | 86.69 | 83.07 | 86.82 | 84.00 | 84.55 | 77.93 | 76.19 | 80.77 | 79.76 | 84.19 | 84.10 | 83.67 | 85.73 | 86.89 | 86.38 | 88.39 | 88.11 | 84.68 | 86.11 | 83.45 | 86.45 | 88.22 | 50.88 | 86.44 | 85.87 | 85.33 | 85.16 | 86.75 | 85.62 | 85.00 | 81.55 | 81.45 | 67.90 | 85.95 | 89.05 | 84.18 | 88.27 | 87.38 | 85.13 | 85.22 | 83.86 | 87.83 | 84.96 | 85.15 | 85.10 |
| Gemma-7B-EuroBlocks | 86.16 | 85.49 | 86.82 | 83.39 | 88.32 | 85.82 | 88.88 | 89.01 | 86.96 | 86.62 | 86.31 | 84.42 | 88.11 | 87.46 | 61.85 | 86.10 | 77.91 | 87.01 | 85.81 | 87.57 | 89.88 | 87.24 | 84.47 | 83.15 | 67.13 | 86.50 | 90.44 | 87.57 | 89.22 | 89.13 | 88.58 | 86.73 | 84.68 | 88.16 | 86.87 | 88.40 | 87.11 | 86.65 | 87.25 | 88.17 | 87.47 | 89.59 | 88.44 | 86.76 | 86.66 | 87.55 | 88.88 | 88.86 | 73.46 | 87.63 | 88.43 | 87.12 | 87.31 | 87.49 | 87.20 | 87.15 | 85.16 | 85.96 | 78.39 | 86.73 | 90.52 | 85.38 | 89.17 | 88.75 | 86.35 | 86.82 | 86.21 | 89.39 | 88.20 | 86.45 | 86.28 |
	
 
	
		
	
	
#### WMT-23
	
	
		
| Model | AVG | AVG en-xx | AVG xx-en | AVG xx-xx | en-de | en-cs | en-uk | en-ru | en-zh-cn | de-en | uk-en | ru-en | zh-cn-en | cs-uk |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EuroLLM-1.7B-Instruct | 82.56 | 82.30 | 82.07 | 85.81 | 80.99 | 84.42 | 80.74 | 81.94 | 83.42 | 83.74 | 85.06 | 81.00 | 78.49 | 85.81 |
| Gemma-2B-EuroBlocks | 79.86 | 78.35 | 81.32 | 81.56 | 76.54 | 76.35 | 77.62 | 78.88 | 82.36 | 82.85 | 83.83 | 80.17 | 78.42 | 81.56 |
| Gemma-7B-EuroBlocks | 83.90 | 83.70 | 83.21 | 87.61 | 82.15 | 84.68 | 83.05 | 83.85 | 84.79 | 84.40 | 85.86 | 82.55 | 80.01 | 87.61 |
	
 
	
		
	
	
#### WMT-24
	
	
		
| Model | AVG | AVG en-xx | AVG xx-xx | en-es-latam | en-cs | en-ru | en-uk | en-ja | en-zh-cn | en-hi | cs-uk | ja-zh-cn |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EuroLLM-1.7B-Instruct | 78.45 | 78.65 | 77.67 | 79.05 | 80.93 | 80.33 | 78.05 | 78.72 | 81.87 | 80.15 | 70.10 | 82.65 |
| Gemma-2B-EuroBlocks | 74.71 | 74.25 | 76.57 | 75.21 | 78.84 | 70.40 | 74.44 | 75.55 | 78.32 | 78.70 | 62.51 | 79.97 |
| Gemma-7B-EuroBlocks | 80.88 | 80.45 | 82.60 | 80.43 | 81.91 | 80.14 | 80.32 | 82.17 | 84.08 | 81.86 | 72.71 | 85.55 |
	
 
	
		
	
	
### General Benchmarks
	
We also compare EuroLLM-1.7B with TinyLlama-1.1-3T and Gemma-2B on three general benchmarks: Arc Challenge, Hellaswag, and MMLU.
For the non-English languages, we use the Okapi datasets.
The results show that EuroLLM-1.7B is superior to TinyLlama-1.1-3T and similar to Gemma-2B on Hellaswag, but worse on Arc Challenge and MMLU. This may be due to EuroLLM-1.7B's lower number of non-embedding parameters (1.133B versus Gemma-2B's 1.981B).
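The card does not say which harness produced these accuracies. As one possible way to run a comparable English-only evaluation, the sketch below uses EleutherAI's lm-evaluation-harness (v0.4-style `lm_eval.simple_evaluate` API); this is an assumption about tooling, and the task names for the multilingual Okapi variants are not listed here and would need to be looked up in the harness.

```python
# Hedged sketch: evaluate EuroLLM-1.7B on the English versions of the three
# benchmarks with EleutherAI's lm-evaluation-harness (pip install lm_eval).
# The harness choice and settings are assumptions; the Okapi multilingual
# task names are not specified on this card.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=utter-project/EuroLLM-1.7B,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "mmlu"],
    batch_size=8,
)
print(results["results"])  # per-task accuracy metrics
```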
	
		
	
	
#### Arc Challenge
	
	
		
| Model | Average | English | German | Spanish | French | Italian | Portuguese | Chinese | Russian | Dutch | Arabic | Swedish | Hindi | Hungarian | Romanian | Ukrainian | Danish | Catalan |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EuroLLM-1.7B | 0.3130 | 0.4215 | 0.3148 | 0.3376 | 0.3259 | 0.3396 | 0.3410 | 0.3068 | 0.2626 | 0.3037 | 0.2652 | 0.3279 | 0.2688 | 0.3039 | 0.3085 | 0.2943 | 0.2956 | 0.3027 |
| TinyLlama-1.1-3T | 0.2621 | 0.3473 | 0.2541 | 0.2726 | 0.2797 | 0.2643 | 0.2829 | 0.2573 | 0.2421 | 0.2404 | 0.2335 | 0.2661 | 0.2337 | 0.244 | 0.2536 | 0.2626 | 0.2476 | 0.2736 |
| Gemma-2B | 0.3617 | 0.4846 | 0.3755 | 0.3940 | 0.4080 | 0.3687 | 0.3872 | 0.3726 | 0.3456 | 0.3328 | 0.3122 | 0.3519 | 0.2851 | 0.3039 | 0.3590 | 0.3601 | 0.3565 | 0.3516 |
	
 
	
		
	
	
#### Hellaswag
	
	
		
| Model | Average | English | German | Spanish | French | Italian | Portuguese | Russian | Dutch | Arabic | Swedish | Hindi | Hungarian | Romanian | Ukrainian | Danish | Catalan |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EuroLLM-1.7B | 0.4653 | 0.6199 | 0.4653 | 0.5187 | 0.5173 | 0.5024 | 0.5116 | 0.4582 | 0.4821 | 0.3939 | 0.4722 | 0.3505 | 0.3970 | 0.4441 | 0.4224 | 0.4556 | 0.4329 |
| TinyLlama-1.1-3T | 0.3710 | 0.6027 | 0.3652 | 0.4136 | 0.4104 | 0.3780 | 0.4008 | 0.3544 | 0.3637 | 0.2981 | 0.3569 | 0.2904 | 0.3147 | 0.3337 | 0.3440 | 0.3464 | 0.3628 |
| Gemma-2B | 0.4666 | 0.7165 | 0.4756 | 0.5414 | 0.5180 | 0.4841 | 0.5081 | 0.4664 | 0.4655 | 0.3868 | 0.4383 | 0.3413 | 0.3710 | 0.4316 | 0.4291 | 0.4471 | 0.4448 |
	
 
	
		
	
	
#### MMLU
	
	
		
| Model | Average | English | German | Spanish | French | Italian | Portuguese | Chinese | Russian | Dutch | Arabic | Swedish | Hindi | Hungarian | Romanian | Ukrainian | Danish | Catalan |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EuroLLM-1.7B | 0.2631 | 0.2553 | 0.2626 | 0.2653 | 0.2589 | 0.2628 | 0.2634 | 0.2546 | 0.2626 | 0.2677 | 0.2608 | 0.2656 | 0.2690 | 0.2551 | 0.2677 | 0.2655 | 0.2675 | 0.2689 |
| TinyLlama-1.1-3T | 0.2546 | 0.2604 | 0.2498 | 0.2528 | 0.2535 | 0.2531 | 0.2511 | 0.2629 | 0.2541 | 0.2521 | 0.2591 | 0.2528 | 0.2550 | 0.2566 | 0.2548 | 0.2651 | 0.2419 | 0.2528 |
| Gemma-2B | 0.3356 | 0.4168 | 0.3519 | 0.3475 | 0.3463 | 0.3433 | 0.3383 | 0.3345 | 0.3261 | 0.3429 | 0.3158 | 0.3318 | 0.2842 | 0.3185 | 0.3243 | 0.3152 | 0.3377 | 0.3307 |