added note under En benchmarks
README.md
CHANGED
@@ -71,6 +71,7 @@ After fine-tuning, the model underwent Direct Preference Optimization (DPO) to e
## Evaluation Results

### English/Code/Math Benchmarks

+ We use the LM Evaluation Harness to evaluate our model on the English benchmark tasks. Please note that, at the time of writing this report, we were unable to use the evaluation framework for llama-3.3-70B, Gemini-1.5 Flash, and GPT-4o, so we currently report the available published numbers for these models. We realise that the prompt templates and few-shot settings might vary and are working to make these evaluations consistent.

| Benchmark | Krutrim-1-7B | MN-12B-Instruct | Krutrim-2-12B | llama-3.3-70B | Gemini-1.5 Flash | GPT-4o |
|-----------|--------------|-----------------|---------------|---------------|------------------|--------|
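For reference, the harness mentioned in the added note is EleutherAI's lm-evaluation-harness. Below is a minimal sketch of how such a run could look via its Python API; the model repo id, task list, and few-shot setting are illustrative assumptions, not the exact configuration behind the reported numbers.

```python
# Sketch of an LM Evaluation Harness run (pip install lm-eval).
# The pretrained repo id, tasks, and num_fewshot here are assumptions
# for illustration only.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=krutrim-ai-labs/Krutrim-2-instruct,dtype=bfloat16",  # assumed repo id
    tasks=["mmlu", "gsm8k", "hellaswag"],  # assumed English/math benchmark tasks
    num_fewshot=5,  # in practice the few-shot setting varies per benchmark
    batch_size=8,
)

# Print the aggregated metrics for each evaluated task.
for task, metrics in results["results"].items():
    print(task, metrics)
```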