krutrim-admin committed
Commit 5f656f6 · verified · 1 Parent(s): 496e2ae

added note under En benchmarks

Files changed (1)
  1. README.md +1 -0
README.md CHANGED
@@ -71,6 +71,7 @@ After fine-tuning, the model underwent Direct Preference Optimization (DPO) to e
 ## Evaluation Results
 
 ### English/Code/Math Benchmarks
+We use the LM Evaluation Harness to evaluate our model on the En benchmark tasks. Please note that at the time of writing this report, we were unable to use the evaluation framework for llama-3.3-70B, Gemini-1.5-flash and GPT-4o. We currently report the available published numbers for these models. We realise that the prompt templates and few-shot settings might vary and are working to make these evaluations consistent.
 
 | Benchmark | Krutrim-1-7B | MN-12B-Instruct| Krutrim-2-12B | llama-3.3-70B | Gemini-1.5 Flash | GPT-4o |
 |-------------------------------------------|--------------|----------------|--------------------|----------------------|------------------------|-----------------------|