added note under En benchmarks
README.md
CHANGED
@@ -71,6 +71,7 @@ After fine-tuning, the model underwent Direct Preference Optimization (DPO) to e
## Evaluation Results

### English/Code/Math Benchmarks

+ We use the LM Evaluation Harness to evaluate our model on the English benchmark tasks. Please note that, at the time of writing this report, we were unable to use the evaluation framework for llama-3.3-70B, Gemini-1.5 Flash, and GPT-4o, so we currently report the available published numbers for these models. We realise that the prompt templates and few-shot settings might vary and are working to make these evaluations consistent.

| Benchmark | Krutrim-1-7B | MN-12B-Instruct | Krutrim-2-12B | llama-3.3-70B | Gemini-1.5 Flash | GPT-4o |
|-----------|--------------|-----------------|---------------|---------------|------------------|--------|
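For reference, the harness mentioned in the added note is EleutherAI's lm-evaluation-harness. Below is a minimal sketch of how such a run could look via its Python API; the model repo id, task list, and few-shot setting are illustrative assumptions, not the exact configuration behind the reported numbers.

```python
# Sketch of an LM Evaluation Harness run (pip install lm-eval).
# The pretrained repo id, tasks, and num_fewshot here are assumptions
# for illustration only.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=krutrim-ai-labs/Krutrim-2-instruct,dtype=bfloat16",  # assumed repo id
    tasks=["mmlu", "gsm8k", "hellaswag"],  # assumed English/math benchmark tasks
    num_fewshot=5,  # in practice the few-shot setting varies per benchmark
    batch_size=8,
)

# Print the aggregated metrics for each evaluated task.
for task, metrics in results["results"].items():
    print(task, metrics)
```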