PyTorch · mistral · Krutrim · language-model
krutrim-admin committed · verified
Commit ea47124 · Parent(s): eabc72d

Update README.md

Files changed (1): README.md (+15 -15)
README.md CHANGED
@@ -73,21 +73,21 @@ After fine-tuning, the model underwent Direct Preference Optimization (DPO) to e
 
 ### English/Code/Math Benchmarks
 
-| Benchmark                         | Krutrim-1 7B | MN-12B-Instruct | Krutrim-2 12B | llama-3.3-70B      | Gemini-1.5 Flash | GPT-4o         |
-|-----------------------------------|--------------|-----------------|---------------|--------------------|------------------|----------------|
-| Hellaswag (0-shot) - Accuracy     | 0.74         | 0.82            | 0.83          | 0.95               | 0.87 (10-shot)   | 0.95 (10-shot) |
-| Winogrande (0-shot) - Accuracy    | 0.67         | 0.74            | 0.77          | 0.85 (5-shot)      | -                | 0.88 (5-shot)  |
-| OpenBookQA (0-shot) - Accuracy    | 0.45         | 0.46            | 0.49          | -                  | -                | -              |
-| CommonSenseQA (0-shot) - Accuracy | 0.74         | 0.70            | 0.74          | -                  | -                | 0.85           |
-| TruthfulQA (0-shot) - Accuracy    | 0.49         | 0.54            | 0.59          | -                  | -                | 0.59           |
-| MMLU (5-shot) - Accuracy          | 0.47         | 0.68            | 0.63          | 0.82               | 0.79             | 0.86           |
-| TriviaQA (5-shot) - EM            | 0.44         | 0.72            | 0.62          | -                  | -                | -              |
-| NaturalQuestions (5-shot) - EM    | 0.15         | 0.28            | 0.26          | -                  | -                | -              |
-| GSM8K (0-shot) - EM               | 0.07         | 0.74            | 0.71          | 0.93 (8-shot, CoT) | 0.86 (11-shot)   | 0.89           |
-| ARC_Challenge (0-shot) - Accuracy | 0.48         | 0.59            | 0.60          | 0.93 (25-shot)     | -                | 0.50           |
-| ARC_Easy (0-shot) - Accuracy      | 0.73         | 0.80            | 0.82          | -                  | -                | -              |
-| HumanEval - Pass@10               | 0.00         | 0.23            | 0.80          | 0.88               | 0.74 (0-shot)    | 0.90           |
-| IF_Eval (0-shot) - Accuracy       | 0.16         | -               | 0.56          | 0.92               | -                | 0.84           |
+| Benchmark                         | Krutrim-1 7B | MN-12B-Instruct | Krutrim-2-base | Krutrim-2-instruct | llama-3.3-70B      | Gemini-1.5 Flash | GPT-4o         |
+|-----------------------------------|--------------|-----------------|----------------|--------------------|--------------------|------------------|----------------|
+| Hellaswag (0-shot) - Accuracy     | 0.74         | 0.82            | 0.80           | 0.83               | 0.95               | 0.87 (10-shot)   | 0.95 (10-shot) |
+| Winogrande (0-shot) - Accuracy    | 0.67         | 0.74            | 0.73           | 0.77               | 0.85 (5-shot)      | -                | 0.88 (5-shot)  |
+| OpenBookQA (0-shot) - Accuracy    | 0.45         | 0.46            | 0.47           | 0.49               | -                  | -                | -              |
+| CommonSenseQA (0-shot) - Accuracy | 0.74         | 0.70            | 0.66           | 0.74               | -                  | -                | 0.85           |
+| TruthfulQA (0-shot) - Accuracy    | 0.49         | 0.54            | 0.48           | 0.59               | -                  | -                | 0.59           |
+| MMLU (5-shot) - Accuracy          | 0.47         | 0.68            | 0.64           | 0.63               | 0.82               | 0.79             | 0.86           |
+| TriviaQA (5-shot) - EM            | 0.44         | 0.72            | 0.66           | 0.62               | -                  | -                | -              |
+| NaturalQuestions (5-shot) - EM    | 0.15         | 0.28            | 0.27           | 0.26               | -                  | -                | -              |
+| GSM8K (0-shot) - EM               | 0.07         | 0.74            | 0.55           | 0.71               | 0.93 (8-shot, CoT) | 0.86 (11-shot)   | 0.89           |
+| ARC_Challenge (0-shot) - Accuracy | 0.48         | 0.59            | 0.55           | 0.60               | 0.93 (25-shot)     | -                | 0.50           |
+| ARC_Easy (0-shot) - Accuracy      | 0.73         | 0.80            | 0.79           | 0.82               | -                  | -                | -              |
+| HumanEval - Pass@10               | 0.00         | 0.23            | 0.59           | 0.80               | 0.88               | 0.74 (0-shot)    | 0.90           |
+| IF_Eval (0-shot) - Accuracy       | 0.16         | -               | -              | 0.56               | 0.92               | -                | 0.84           |
 
 ### Indic Benchmarks
 
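
One metric in the table that often needs unpacking is HumanEval's Pass@10. The model card does not state which evaluation harness produced these numbers, but Pass@k is conventionally computed with the unbiased estimator from Chen et al. (2021). A minimal sketch of that estimator follows; the function name and the 20-samples-per-task figure in the example are illustrative assumptions, not the authors' setup:

```python
import math

# Unbiased pass@k estimator from Chen et al. (2021), the standard way
# HumanEval Pass@k scores are computed (here, k = 10).
#   n: completions sampled per problem
#   c: completions that pass the unit tests
#   k: the k in pass@k
def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0  # fewer than k failures exist, so every k-subset contains a pass
    # Stable product form of 1 - C(n-c, k) / C(n, k)
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))

# Hypothetical example: 20 samples per task, 2 pass -> pass@10 ~= 0.76
print(round(pass_at_k(n=20, c=2, k=10), 2))
```

The product form computes the same value as the direct `1 - C(n-c, k) / C(n, k)` formula while avoiding large binomial coefficients; the benchmark score is this quantity averaged over all problems.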