Llama.cpp imatrix quantizations of LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct

Using llama.cpp commit 5783575 for quantization.

All quants were made using the imatrix option and Bartowski's calibration file.

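Perplexity table (lower is better). Size (%) is each file's size relative to F16, and Accuracy (%) is the F16 perplexity divided by the quant's perplexity.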

Quant	Size (MB)	PPL	Size (%)	Accuracy (%)	PPL error rate
IQ1_S	693	80.4634	13.61	12.16	1.33
IQ1_M	734	39.7732	14.41	24.60	0.61
IQ2_XXS	803	20.3081	15.77	48.18	0.30
IQ2_XS	864	15.7232	16.97	62.23	0.23
IQ2_S	921	14.1473	18.09	69.16	0.21
IQ2_M	976	12.5527	19.17	77.95	0.18
Q2_K_S	985	13.7018	19.34	71.41	0.20
Q2_K	1045	12.5399	20.52	78.03	0.19
IQ3_XXS	1068	11.1884	20.97	87.45	0.16
IQ3_XS	1152	10.8551	22.62	90.14	0.16
Q3_K_S	1195	11.0653	23.47	88.43	0.16
IQ3_S	1201	10.6916	23.59	91.51	0.15
IQ3_M	1233	10.6124	24.21	92.20	0.15
Q3_K_M	1298	10.3392	25.49	94.63	0.15
Q3_K_L	1391	10.2274	27.32	95.67	0.15
IQ4_XS	1435	10.0262	28.18	97.59	0.15
Q4_0	1503	10.1964	29.52	95.96	0.15
IQ4_NL	1505	9.9962	29.56	97.88	0.15
Q4_K_S	1507	10.0445	29.59	97.41	0.15
Q4_K_M	1568	10.0122	30.79	97.72	0.15
Q4_1	1643	10.0464	32.27	97.39	0.15
Q5_K_S	1786	9.8232	35.07	99.61	0.14
Q5_0	1789	9.8700	35.13	99.13	0.14
Q5_K_M	1822	9.8565	35.78	99.27	0.14
Q5_1	1929	9.8203	37.88	99.64	0.14
Q6_K	2091	9.8229	41.06	99.61	0.14
Q8_0	2707	9.7928	53.16	99.92	0.14
F16	5092	9.7845	100	100	0.14
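
The Size (%) and Accuracy (%) columns can be recomputed from the raw numbers, since both are relative to the F16 row. Below is a minimal Python sketch using a subset of rows copied from the table. The helper `smallest_quant_above` is a hypothetical convenience for picking a quant by accuracy target. It is not part of llama.cpp.

```python
from typing import Optional

# Reference row (F16) and a subset of quant rows copied from the table above.
F16_SIZE_MB = 5092
F16_PPL = 9.7845

ROWS = {  # name: (size in MB, perplexity)
    "IQ2_M": (976, 12.5527),
    "IQ3_M": (1233, 10.6124),
    "IQ4_XS": (1435, 10.0262),
    "Q4_K_M": (1568, 10.0122),
    "Q5_K_S": (1786, 9.8232),
    "Q6_K": (2091, 9.8229),
    "Q8_0": (2707, 9.7928),
}


def size_pct(size_mb: float) -> float:
    """File size as a percentage of the F16 file."""
    return 100.0 * size_mb / F16_SIZE_MB


def accuracy_pct(ppl: float) -> float:
    """F16 perplexity divided by the quant's perplexity, as a percentage."""
    return 100.0 * F16_PPL / ppl


def smallest_quant_above(min_accuracy: float) -> Optional[str]:
    """Hypothetical helper: the smallest quant in ROWS whose accuracy is at
    least min_accuracy, or None if no listed quant qualifies."""
    candidates = [
        (size, name)
        for name, (size, ppl) in ROWS.items()
        if accuracy_pct(ppl) >= min_accuracy
    ]
    return min(candidates)[1] if candidates else None


for name, (size, ppl) in ROWS.items():
    print(f"{name:7s} size={size_pct(size):6.2f}%  accuracy={accuracy_pct(ppl):6.2f}%")
```

With this subset, `smallest_quant_above(97.0)` returns `"IQ4_XS"` and `smallest_quant_above(99.0)` returns `"Q5_K_S"`. Recomputed percentages may differ from the table in the last digit because of rounding.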

EXAONE-3.5-2.4B-Instruct

Introduction

We introduce EXAONE 3.5, a collection of instruction-tuned bilingual (English and Korean) generative models ranging from 2.4B to 32B parameters, developed and released by LG AI Research. The EXAONE 3.5 language models include: 1) a 2.4B model optimized for deployment on small or resource-constrained devices, 2) a 7.8B model that matches the size of its predecessor but offers improved performance, and 3) a 32B model that delivers powerful performance. All models support long-context processing of up to 32K tokens. Each model demonstrates state-of-the-art performance in real-world use cases and long-context understanding, while remaining competitive in general domains compared to recently released models of similar sizes.

For more details, please refer to our technical report, blog and GitHub.

This repository contains the instruction-tuned 2.4B language model with the following features:

Number of Parameters (without embeddings): 2.14B
Number of Layers: 30
Number of Attention Heads: GQA with 32 Q-heads and 8 KV-heads
Vocab Size: 102,400
Context Length: 32,768 tokens
Tie Word Embeddings: True (unlike 7.8B and 32B models)
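
With GQA, the 32 query heads share 8 key/value heads. Each KV head therefore serves 32 / 8 = 4 query heads, and the KV cache is four times smaller than it would be with one KV head per query head. The sketch below illustrates this under two assumptions. First, the contiguous head grouping follows the convention of `repeat_kv` in Hugging Face's Llama implementation. Second, `HEAD_DIM = 80` is an assumed value that this card does not state, so check the model's config.json before relying on the absolute numbers.

```python
# Figures from the feature list above.
N_LAYERS = 30
N_Q_HEADS = 32
N_KV_HEADS = 8
CONTEXT_LENGTH = 32_768

# Assumption: not stated in this card; verify against config.json.
HEAD_DIM = 80

GROUP_SIZE = N_Q_HEADS // N_KV_HEADS  # query heads per KV head (4)


def kv_head_for_query(q_head: int) -> int:
    """KV head read by a given query head, assuming contiguous grouping
    (query heads 0-3 -> KV head 0, 4-7 -> KV head 1, ...)."""
    if not 0 <= q_head < N_Q_HEADS:
        raise ValueError(f"query head must be in [0, {N_Q_HEADS})")
    return q_head // GROUP_SIZE


def kv_cache_bytes(n_tokens: int, n_kv_heads: int = N_KV_HEADS,
                   head_dim: int = HEAD_DIM, bytes_per_elem: int = 2) -> int:
    """Rough KV-cache size: one K and one V vector per layer, KV head and token.
    bytes_per_elem=2 corresponds to an FP16/BF16 cache."""
    return 2 * N_LAYERS * n_kv_heads * head_dim * n_tokens * bytes_per_elem


gqa = kv_cache_bytes(CONTEXT_LENGTH)
mha = kv_cache_bytes(CONTEXT_LENGTH, n_kv_heads=N_Q_HEADS)
print(f"GQA cache at {CONTEXT_LENGTH} tokens: {gqa / 2**30:.2f} GiB")
print(f"Same model with 32 KV heads: {mha / 2**30:.2f} GiB")
```

The ratio between the two estimates depends only on the head counts, so it holds whatever the real head dimension is. The absolute sizes scale linearly with `HEAD_DIM` and with the number of tokens kept in context.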

Quickstart

We recommend using transformers v4.43 or later.

Here is a code snippet for running conversational inference with the model:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Choose your prompt (uncomment the Korean example to use it instead)
prompt = "Explain how wonderful you are"  # English example
# prompt = "스스로를 자랑해 봐"  # Korean example ("Brag about yourself")

messages = [
    {"role": "system", 
     "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": prompt}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

output = model.generate(
    input_ids.to(model.device),  # follow the device chosen by device_map="auto"
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128,
    do_sample=False,
)
print(tokenizer.decode(output[0]))

Note

The EXAONE 3.5 instruction-tuned language models were trained to use the system prompt, so we strongly recommend using the system prompt provided in the code snippet above.

Evaluation

The following table shows evaluation results on real-world use cases. The full evaluation results can be found in the technical report.

Models	MT-Bench	LiveBench	Arena-Hard	AlpacaEval	IFEval	KoMT-Bench[1]	LogicKor
EXAONE 3.5 2.4B	7.81	33.0	48.2	37.1	73.6	7.24	8.51
Qwen 2.5 3B	7.21	25.7	26.4	17.4	60.8	5.68	5.21
Qwen 2.5 1.5B	5.72	19.2	10.6	8.4	40.7	3.87	3.60
Llama 3.2 3B	6.94	24.0	14.2	18.7	70.1	3.16	2.86
Gemma 2 2B	7.20	20.0	19.1	29.1	50.5	4.83	5.29

[1] KoMT-Bench is a dataset created by translating MT-Bench into Korean; see the README for more details.

Deployment

Inference with EXAONE 3.5 models is supported in various frameworks, such as:

TensorRT-LLM
vLLM
SGLang
llama.cpp
Ollama

Please refer to our EXAONE 3.5 GitHub for more details about the inference frameworks.

Quantization

We provide pre-quantized EXAONE 3.5 models in AWQ and in several GGUF quantization types. Please refer to our EXAONE 3.5 collection to find the corresponding quantized models.
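
One way to run a GGUF file locally from Python is the third-party llama-cpp-python bindings (`pip install llama-cpp-python`). The following is a minimal sketch that rests on three assumptions. The file name `EXAONE-3.5-2.4B-Instruct-Q4_K_M.gguf` is hypothetical, so substitute the quant you downloaded. The installed llama-cpp-python build must bundle a llama.cpp version with EXAONE support. Finally, `create_chat_completion` is assumed to apply the chat template embedded in the GGUF. The system prompt is the one recommended in the Quickstart section.

```python
from typing import Dict, List

# System prompt recommended in the Quickstart section.
SYSTEM_PROMPT = "You are EXAONE model from LG AI Research, a helpful assistant."

# Hypothetical file name: point this at the GGUF quant you actually downloaded.
MODEL_PATH = "EXAONE-3.5-2.4B-Instruct-Q4_K_M.gguf"


def build_messages(prompt: str) -> List[Dict[str, str]]:
    """Chat messages with the recommended system prompt prepended."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": prompt},
    ]


def chat(prompt: str, model_path: str = MODEL_PATH, n_ctx: int = 4096) -> str:
    """Run one chat turn with llama-cpp-python (imported lazily)."""
    from llama_cpp import Llama  # third-party: pip install llama-cpp-python

    # n_ctx may be raised toward the 32,768-token limit at the cost of memory.
    llm = Llama(model_path=model_path, n_ctx=n_ctx, verbose=False)
    result = llm.create_chat_completion(
        messages=build_messages(prompt),
        max_tokens=128,
        temperature=0.0,  # greedy decoding, similar to do_sample=False in the Quickstart
    )
    return result["choices"][0]["message"]["content"]
```

Usage: `print(chat("Explain how wonderful you are"))`. Importing `llama_cpp` inside `chat` keeps `build_messages` usable on its own, for example with another OpenAI-style chat backend.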

Limitation

The EXAONE language model has certain limitations and may occasionally generate inappropriate responses. The language model generates responses based on token output probabilities, which are learned from the training data. While we have made every effort to exclude personal, harmful, and biased information from the training data, some problematic content may still be included, potentially leading to undesirable responses. Please note that the text generated by the EXAONE language model does not reflect the views of LG AI Research.

Inappropriate answers containing personal, harmful, or other inappropriate information may be generated.
Biased responses associated with age, gender, race, and so on may be generated.
The generated responses rely heavily on statistics from the training data, which can result in semantically or syntactically incorrect sentences.
Since the model does not reflect the latest information, its responses may be false or contradictory.

LG AI Research strives to reduce potential risks that may arise from EXAONE language models. Users are not allowed to engage in any malicious activities (e.g., keying in illegal information) that may induce the creation of inappropriate outputs violating LG AI’s ethical principles when using EXAONE language models.

License

The model is licensed under the EXAONE AI Model License Agreement 1.1 - NC.

Citation

@article{exaone-3.5,
  title={EXAONE 3.5: Series of Large Language Models for Real-world Use Cases},
  author={LG AI Research},
  journal={arXiv preprint arXiv:2412.04862},
  year={2024}
}

Contact

LG AI Research Technical Support

ThomasBaruzier/EXAONE-3.5-2.4B-Instruct-GGUF