Added qualitative results
README.md
CHANGED
@@ -15,6 +15,13 @@ license: unknown
 tags:
 - Krutrim
 - language-model
+widget:
+- text: "Category-wise evaluation results"
+  output:
+    url: "images/cumulative_score_category.png"
+- text: "Language-wise evaluation results"
+  output:
+    url: "images/cumulative_score_langauge.png"
 ---
 # Krutrim-2
 
@@ -93,6 +100,11 @@ After fine-tuning, the model underwent Direct Preference Optimization (DPO) with
 | FloresIN (1-shot, xx-en) (chrf++) | 50% | 54% | 58% |
 | FloresIN (1-shot, en-xx) (chrf++) | 34% | 41% | 46% |
 
+### Qualitative Results
+Below are the results from a manual evaluation of prompt-response pairs across languages and task categories. Scores range from 1 to 5 (higher is better). Model names were anonymised during the evaluation.
+
+<Gallery />
+
 ## Usage
 To use the model, you can load it with `AutoModelForCausalLM` as follows:
 
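On the Hugging Face Hub, the `<Gallery />` tag added in the second hunk renders the images declared under the new `widget:` entries in the front matter, which is how the category-wise and language-wise score plots get displayed on the card. The usage snippet that the final context line introduces is not included in this diff; below is a minimal sketch of the standard `transformers` loading pattern it refers to, assuming a repository id of `krutrim-ai-labs/Krutrim-2-instruct` and an illustrative prompt (both are assumptions, not taken from the diff).

```python
# Minimal sketch of loading the model with AutoModelForCausalLM, as the README's
# Usage section describes. The repository id and prompt are assumptions, not part
# of this diff; consult the full README for the exact snippet.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "krutrim-ai-labs/Krutrim-2-instruct"  # assumed checkpoint id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to reduce memory use
    device_map="auto",
)

prompt = "Write two sentences about the monsoon in Hindi."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```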