Llama.cpp imatrix quantizations of LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct

Using llama.cpp commit 5783575 for quantization.

All quants were made using the imatrix option and Bartowski's calibration file.
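As a rough illustration of what an imatrix quantization run involves, the sketch below builds the two llama.cpp command lines (importance-matrix computation, then quantization) as argument lists, without executing them. The file names and the helper `build_imatrix_quant_commands` are hypothetical, and the exact commands used for this repository are not stated here.

```python
import shlex
from pathlib import Path


def build_imatrix_quant_commands(f16_gguf, calibration_txt, quant_type, out_dir="."):
    """Return (imatrix_cmd, quantize_cmd) as argv lists for llama.cpp tools.

    All paths are placeholders chosen for illustration only.
    """
    out_dir = Path(out_dir)
    imatrix_file = out_dir / "imatrix.dat"
    quant_file = out_dir / f"{Path(f16_gguf).stem}-{quant_type}.gguf"

    # Step 1: compute the importance matrix from the calibration text.
    imatrix_cmd = [
        "llama-imatrix",
        "-m", str(f16_gguf),
        "-f", str(calibration_txt),
        "-o", str(imatrix_file),
    ]
    # Step 2: quantize the F16 model, guided by the importance matrix.
    quantize_cmd = [
        "llama-quantize",
        "--imatrix", str(imatrix_file),
        str(f16_gguf),
        str(quant_file),
        quant_type,
    ]
    return imatrix_cmd, quantize_cmd


# Hypothetical file names, for illustration only.
imatrix_cmd, quantize_cmd = build_imatrix_quant_commands(
    "model-F16.gguf", "calibration.txt", "Q4_K_M"
)
print(shlex.join(imatrix_cmd))
print(shlex.join(quantize_cmd))
```

The commands could then be passed to `subprocess.run(cmd, check=True)` once the llama.cpp binaries are on the PATH.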


Perplexity table (the lower the better)

Quant     Size (MB)  PPL      Size (%)  Accuracy (%)  PPL error rate
IQ1_S     693        80.4634  13.61     12.16         1.33
IQ1_M     734        39.7732  14.41     24.60         0.61
IQ2_XXS   803        20.3081  15.77     48.18         0.30
IQ2_XS    864        15.7232  16.97     62.23         0.23
IQ2_S     921        14.1473  18.09     69.16         0.21
IQ2_M     976        12.5527  19.17     77.95         0.18
Q2_K_S    985        13.7018  19.34     71.41         0.20
Q2_K      1045       12.5399  20.52     78.03         0.19
IQ3_XXS   1068       11.1884  20.97     87.45         0.16
IQ3_XS    1152       10.8551  22.62     90.14         0.16
Q3_K_S    1195       11.0653  23.47     88.43         0.16
IQ3_S     1201       10.6916  23.59     91.51         0.15
IQ3_M     1233       10.6124  24.21     92.20         0.15
Q3_K_M    1298       10.3392  25.49     94.63         0.15
Q3_K_L    1391       10.2274  27.32     95.67         0.15
IQ4_XS    1435       10.0262  28.18     97.59         0.15
Q4_0      1503       10.1964  29.52     95.96         0.15
IQ4_NL    1505       9.9962   29.56     97.88         0.15
Q4_K_S    1507       10.0445  29.59     97.41         0.15
Q4_K_M    1568       10.0122  30.79     97.72         0.15
Q4_1      1643       10.0464  32.27     97.39         0.15
Q5_K_S    1786       9.8232   35.07     99.61         0.14
Q5_0      1789       9.8700   35.13     99.13         0.14
Q5_K_M    1822       9.8565   35.78     99.27         0.14
Q5_1      1929       9.8203   37.88     99.64         0.14
Q6_K      2091       9.8229   41.06     99.61         0.14
Q8_0      2707       9.7928   53.16     99.92         0.14
F16       5092       9.7845   100       100           0.14
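
The two percentage columns appear to be derived from the F16 row: Size (%) as the quant's size relative to F16, and Accuracy (%) as the F16 perplexity divided by the quant's perplexity. The sketch below reproduces that arithmetic under this assumption; the function names are illustrative, and results may differ from the table in the last digit, presumably because the table was computed from unrounded values.

```python
def size_percent(size_mb, f16_size_mb):
    """Quant file size as a percentage of the F16 file size."""
    return 100.0 * size_mb / f16_size_mb


def accuracy_percent(ppl, f16_ppl):
    """F16 perplexity relative to the quant's perplexity, in percent.

    Assumed definition: lower perplexity than F16 would give > 100.
    """
    return 100.0 * f16_ppl / ppl


# Reference values taken from the F16 row of the table.
F16_SIZE_MB = 5092
F16_PPL = 9.7845

# A few (size in MB, perplexity) pairs copied from the table.
rows = {
    "IQ1_S": (693, 80.4634),
    "Q4_K_M": (1568, 10.0122),
    "Q8_0": (2707, 9.7928),
}

for name, (size_mb, ppl) in rows.items():
    print(
        f"{name}: size {size_percent(size_mb, F16_SIZE_MB):.2f}% of F16, "
        f"accuracy {accuracy_percent(ppl, F16_PPL):.2f}%"
    )
```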


EXAONE-3.5-2.4B-Instruct

Introduction

We introduce EXAONE 3.5, a collection of instruction-tuned bilingual (English and Korean) generative models ranging from 2.4B to 32B parameters, developed and released by LG AI Research. The EXAONE 3.5 language models include: 1) a 2.4B model optimized for deployment on small or resource-constrained devices, 2) a 7.8B model that matches the size of its predecessor but offers improved performance, and 3) a 32B model delivering powerful performance. All models support long-context processing of up to 32K tokens. Each model demonstrates state-of-the-art performance in real-world use cases and long-context understanding, while remaining competitive in general domains compared to recently released models of similar sizes.

For more details, please refer to our technical report, blog and GitHub.

This repository contains the instruction-tuned 2.4B language model with the following features:

  • Number of Parameters (without embeddings): 2.14B
  • Number of Layers: 30
  • Number of Attention Heads: GQA with 32 Q-heads and 8 KV-heads
  • Vocab Size: 102,400
  • Context Length: 32,768 tokens
  • Tie Word Embeddings: True (unlike 7.8B and 32B models)
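
To make the GQA figures above concrete, the sketch below estimates the key/value cache size for one full-length sequence and compares it with a hypothetical configuration where every query head has its own KV head. The per-head dimension is not stated in this card; the value `HEAD_DIM = 80` is an assumption and should be checked against the model's configuration file.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for a single sequence.

    The leading factor 2 accounts for storing both K and V;
    bytes_per_value=2 corresponds to a 16-bit cache.
    """
    return 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_value


# Values listed in the feature list above.
NUM_LAYERS = 30
NUM_Q_HEADS = 32
NUM_KV_HEADS = 8
CONTEXT_LEN = 32_768

# ASSUMPTION: per-head dimension, not given in this card.
HEAD_DIM = 80

# With GQA, each KV head is shared by a group of query heads.
q_heads_per_kv_head = NUM_Q_HEADS // NUM_KV_HEADS

gqa_bytes = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, CONTEXT_LEN)
mha_bytes = kv_cache_bytes(NUM_LAYERS, NUM_Q_HEADS, HEAD_DIM, CONTEXT_LEN)

print(f"Query heads per KV head: {q_heads_per_kv_head}")
print(f"KV cache with 8 KV heads:  {gqa_bytes / 2**30:.2f} GiB")
print(f"KV cache with 32 KV heads: {mha_bytes / 2**30:.2f} GiB")
```

Under these assumptions the cache shrinks by the same factor as the head ratio, which is the main memory benefit of GQA at long context lengths.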

Quickstart

We recommend using transformers v4.43 or later.

Here is the code snippet to run conversational inference with the model:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Choose your prompt
prompt = "Explain how wonderful you are"  # English example
prompt = "스스로를 자랑해 봐"       # Korean example

messages = [
    {"role": "system", 
     "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": prompt}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

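# Move the inputs to wherever device_map="auto" placed the model,
# instead of hard-coding "cuda".
output = model.generate(
    input_ids.to(model.device),
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128,
    do_sample=False,  # greedy decoding
)
print(tokenizer.decode(output[0]))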

Note

The EXAONE 3.5 instruction-tuned language models were trained to utilize the system prompt, so we highly recommend using the system prompts provided in the code snippet above.
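
One way to make sure the system prompt is always present is a small helper that builds the message list. This is a minimal sketch: `build_messages` and its `history` parameter are hypothetical names, and only the system prompt text comes from the snippet above.

```python
# System prompt copied from the Quickstart snippet.
DEFAULT_SYSTEM_PROMPT = "You are EXAONE model from LG AI Research, a helpful assistant."


def build_messages(user_prompt, system_prompt=DEFAULT_SYSTEM_PROMPT, history=None):
    """Return a chat message list that always starts with a system prompt.

    history is an optional list of earlier {"role", "content"} turns.
    """
    messages = [{"role": "system", "content": system_prompt}]
    if history:
        messages.extend(history)
    messages.append({"role": "user", "content": user_prompt})
    return messages


messages = build_messages("Explain how wonderful you are")
for message in messages:
    print(f'{message["role"]}: {message["content"]}')
```

The resulting list can be passed to `tokenizer.apply_chat_template` exactly as in the Quickstart.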

Evaluation

The following table shows the evaluation results of real-world use cases. The full evaluation results can be found in the technical report.

Models           MT-Bench  LiveBench  Arena-Hard  AlpacaEval  IFEval  KoMT-Bench[1]  LogicKor
EXAONE 3.5 2.4B  7.81      33.0       48.2        37.1        73.6    7.24           8.51
Qwen 2.5 3B      7.21      25.7       26.4        17.4        60.8    5.68           5.21
Qwen 2.5 1.5B    5.72      19.2       10.6        8.4         40.7    3.87           3.60
Llama 3.2 3B     6.94      24.0       14.2        18.7        70.1    3.16           2.86
Gemma 2 2B       7.20      20.0       19.1        29.1        50.5    4.83           5.29
  • [1] KoMT-Bench is a dataset created by translating MT-Bench into Korean; see README for more details.

Deployment

EXAONE 3.5 models can be run for inference in various frameworks, such as:

  • TensorRT-LLM
  • vLLM
  • SGLang
  • llama.cpp
  • Ollama

Please refer to our EXAONE 3.5 GitHub for more details about the inference frameworks.

Quantization

We provide pre-quantized EXAONE 3.5 models in AWQ and in several GGUF quantization types. Please refer to our EXAONE 3.5 collection to find the corresponding quantized models.

Limitation

The EXAONE language model has certain limitations and may occasionally generate inappropriate responses. The language model generates responses based on the output probabilities of tokens, which are determined during training on the training data. While we have made every effort to exclude personal, harmful, and biased information from the training data, some problematic content may still be included, potentially leading to undesirable responses. Please note that the text generated by the EXAONE language model does not reflect the views of LG AI Research.

  • Inappropriate answers may be generated, which contain personal, harmful or other inappropriate information.
  • Biased responses may be generated, which are associated with age, gender, race, and so on.
  • The generated responses rely heavily on statistics from the training data, which can result in the generation of semantically or syntactically incorrect sentences.
  • Since the model does not reflect the latest information, the responses may be false or contradictory.

LG AI Research strives to reduce potential risks that may arise from EXAONE language models. Users are not allowed to engage in any malicious activities (e.g., keying in illegal information) that may induce the creation of inappropriate outputs violating LG AI’s ethical principles when using EXAONE language models.

License

The model is licensed under the EXAONE AI Model License Agreement 1.1 - NC.

Citation

@article{exaone-3.5,
  title={EXAONE 3.5: Series of Large Language Models for Real-world Use Cases},
  author={LG AI Research},
  journal={arXiv preprint arXiv:https://arxiv.org/abs/2412.04862},
  year={2024}
}

Contact

LG AI Research Technical Support: [email protected]
