Chocolatine-2-14B-Instruct-v2.0.3

DPO fine-tuning of the merged model jpacifico/Chocolatine-2-merged-qwen25arch (Qwen-2.5-14B architecture)
using the jpacifico/french-orca-dpo-pairs-revised RLHF dataset.
Training on French data also improves the model's overall capabilities.
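
For reference, a minimal sketch of this kind of DPO setup with the TRL library is shown below. The hyperparameters, column handling, and TRL API details are illustrative assumptions, not the exact recipe used for Chocolatine-2.

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "jpacifico/Chocolatine-2-merged-qwen25arch"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# French preference pairs; DPOTrainer expects "prompt", "chosen" and "rejected"
# columns, so rename/map the dataset fields if they differ.
dataset = load_dataset("jpacifico/french-orca-dpo-pairs-revised", split="train")

config = DPOConfig(
    output_dir="chocolatine-2-dpo",
    beta=0.1,                        # illustrative KL-penalty strength, not the actual value used
    per_device_train_batch_size=2,   # illustrative
    learning_rate=5e-6,              # illustrative
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,      # `tokenizer=tokenizer` on older TRL versions
)
trainer.train()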

Context window: up to 128K tokens

OpenLLM Leaderboard

Chocolatine-2 is the best-performing 14B fine-tuned model (tied, with an average score of 41.08) on the OpenLLM Leaderboard
[Updated 2025-02-12]

Metric        Value
Avg.          41.08
IFEval        70.37
BBH           50.63
MATH Lvl 5    40.56
GPQA          17.23
MuSR          19.07
MMLU-PRO      48.60

LLM Leaderboard FR

Top 3 in all categories on the French Government Leaderboard LLM FR


[Updated 2025-02-15]

MT-Bench-French

Chocolatine-2 outperforms its previous versions and its Qwen-2.5 base model on MT-Bench-French, evaluated with multilingual-mt-bench and GPT-4-Turbo as the LLM judge.
My goal was to achieve GPT-4o-mini's performance in French; according to this benchmark, this version comes close to the OpenAI model.

########## First turn ##########
                                             score
model                                 turn        
gpt-4o-mini                           1     9.287500
Chocolatine-2-14B-Instruct-v2.0.3     1     9.112500
Qwen2.5-14B-Instruct                  1     8.887500
Chocolatine-14B-Instruct-DPO-v1.2     1     8.612500
Phi-3.5-mini-instruct                 1     8.525000
Chocolatine-3B-Instruct-DPO-v1.2      1     8.375000
DeepSeek-R1-Distill-Qwen-14B          1     8.375000
phi-4                                 1     8.300000
Phi-3-medium-4k-instruct              1     8.225000
gpt-3.5-turbo                         1     8.137500
Chocolatine-3B-Instruct-DPO-Revised   1     7.987500
Meta-Llama-3.1-8B-Instruct            1     7.050000
vigostral-7b-chat                     1     6.787500
Mistral-7B-Instruct-v0.3              1     6.750000
gemma-2-2b-it                         1     6.450000

########## Second turn ##########
                                               score
model                                 turn
Chocolatine-2-14B-Instruct-v2.0.3     2     9.050000         
gpt-4o-mini                           2     8.912500
Qwen2.5-14B-Instruct                  2     8.912500
Chocolatine-14B-Instruct-DPO-v1.2     2     8.337500
DeepSeek-R1-Distill-Qwen-14B          2     8.200000
phi-4                                 2     8.131250
Chocolatine-3B-Instruct-DPO-Revised   2     7.937500
Chocolatine-3B-Instruct-DPO-v1.2      2     7.862500
Phi-3-medium-4k-instruct              2     7.750000
gpt-3.5-turbo                         2     7.679167
Phi-3.5-mini-instruct                 2     7.575000
Meta-Llama-3.1-8B-Instruct            2     6.787500
Mistral-7B-Instruct-v0.3              2     6.500000
vigostral-7b-chat                     2     6.162500
gemma-2-2b-it                         2     6.100000

########## Average ##########
                                          score
model                                          
gpt-4o-mini                            9.100000
Chocolatine-2-14B-Instruct-v2.0.3      9.081250
Qwen2.5-14B-Instruct                   8.900000
Chocolatine-14B-Instruct-DPO-v1.2      8.475000
DeepSeek-R1-Distill-Qwen-14B           8.287500
phi-4                                  8.215625
Chocolatine-3B-Instruct-DPO-v1.2       8.118750
Phi-3.5-mini-instruct                  8.050000
Phi-3-medium-4k-instruct               7.987500
Chocolatine-3B-Instruct-DPO-Revised    7.962500
gpt-3.5-turbo                          7.908333
Meta-Llama-3.1-8B-Instruct             6.918750
Mistral-7B-Instruct-v0.3               6.625000
vigostral-7b-chat                      6.475000
gemma-2-2b-it                          6.275000

Usage

You can run this model using my Colab notebook

You can also run Chocolatine-2 using the following code:

import transformers
from transformers import AutoTokenizer

model_name = "jpacifico/Chocolatine-2-14B-Instruct-v2.0.3"

# Format prompt with the model's chat template
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(model_name)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=model_name,
    tokenizer=tokenizer
)

# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,  # counts prompt + generated tokens
)
print(sequences[0]['generated_text'])
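
If you prefer to skip the pipeline helper, a minimal sketch with model.generate is shown below, assuming the checkpoint fits on your GPU in FP16 (device_map="auto" requires the accelerate package); the French prompt is just an example.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "jpacifico/Chocolatine-2-14B-Instruct-v2.0.3"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Example French prompt (the model is tuned for French and English)
messages = [
    {"role": "system", "content": "Tu es un assistant utile."},
    {"role": "user", "content": "Qu'est-ce qu'un grand modèle de langage ?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    max_new_tokens=200,
)
# Decode only the newly generated tokens
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))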

Limitations

The Chocolatine-2 model series is a quick demonstration that a base model can be easily fine-tuned to achieve compelling performance.
It does not have any moderation mechanism.

  • Developed by: Jonathan Pacifico, 2025
  • Model type: LLM
  • Language(s) (NLP): French, English
  • License: Apache-2.0

Made with ❤️ in France
