Update README.md
README.md (CHANGED)
@@ -33,7 +33,7 @@ After fine-tuning, the model underwent Direct Preference Optimization (DPO) to e
 ## Key Features
 - 12B parameter dense transformer model leading to better generalization compared to Krutrim-1 7B;
 - Supports context up to 128K tokens making it suitable for long multi-turn conversations, long-form generations, document translations and others;
--
+- Delivers competitive performance on most English benchmarks and HumanEval coding task;
 - Natively multilingual delivering best-in-class performance on Indic benchmarks;
 - Matches or exceeds performance of models much larger (x6) on multilingual Indic generation tasks including creative writing, summarization, and translation;
 - Stronger Indian cultural context relevance - scored the highest in manual evaluation with multiple models in an anonymised setting;
@@ -50,8 +50,8 @@ After fine-tuning, the model underwent Direct Preference Optimization (DPO) to e

 | Model Name | Release Date |Release Note | Reference|
 |------------|-------------|-------------|-------------|
-| Krutrim-2-Base | 2024-01-31 |
-| Krutrim-2-Instruct | 2024-01-31 | Finetuned and
+| Krutrim-2-Base | 2024-01-31 | Trained with MN12B architecture | [Here](https://huggingface.co/krutrim-ai-labs/Krutrim-2-base)|
+| Krutrim-2-Instruct | 2024-01-31 | Finetuned and aligned version of Krutrim-2-Base |[Here](https://huggingface.co/krutrim-ai-labs/Krutrim-2-instruct)|


 ## Data Freshness
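The updated rows point to the released checkpoints on Hugging Face. As a reference, here is a minimal loading sketch for the Instruct checkpoint; it mirrors the `AutoModelForCausalLM` / `AutoTokenizer` usage that appears later in this README, and the dtype/device settings are assumptions rather than upstream guidance.

```python
# Minimal sketch: load the Instruct checkpoint listed in the release table above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "krutrim-ai-labs/Krutrim-2-instruct"  # repo id taken from the table

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # assumption: use the dtype stored in the checkpoint
    device_map="auto",    # assumption: requires `accelerate`; drop for plain CPU/GPU loading
)
```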
@@ -87,7 +87,7 @@ After fine-tuning, the model underwent Direct Preference Optimization (DPO) to e
 | ARC_Challenge (0-shot) - Accuracy | 0.48 | 0.59 |0.55 | 0.60 | 0.93 (25-shot) | - | 0.50 |
 | ARC_Easy (0-shot) - Accuracy | 0.73 | 0.80 |0.79 | 0.82 | - | - | - |
 | HumanEval - Pass@10 | 0.00 | 0.23 |0.59 | 0.80 | 0.88 | 0.74 (0-shot) | 0.90 |
-| IF_Eval (0-shot) - Accuracy | 0.16 |
+| IF_Eval (0-shot) - Accuracy | 0.16 | 0.46 |- | 0.56 | 0.92 | - | 0.84 |

 ### Indic Benchmarks

@@ -137,10 +137,6 @@ model = AutoModelForCausalLM.from_pretrained(model_id)
 tokenizer = AutoTokenizer.from_pretrained(model_id)

 # Add custom chat template
-tokenizer.chat_template = """{% for message in messages %}{% if message['role'] == 'system' %}{{ '<|system|>\n' + message['content'] + '\n' }}{% elif message['role'] == 'user' %}{{ '<|user|>\n' + message['content'] + '\n' }}{% elif message['role'] == 'assistant' %}{% if not loop.last %}{{ '<|assistant|>\n' + message['content'] + eos_token + '\n' }}{% else %}{{ '<|assistant|>\n' + message['content'] + eos_token }}{% endif %}{% endif %}{% if loop.last and add_generation_prompt %}{{ '<|assistant|>\n' }}{% endif %}{% endfor %}"""
-
-print(tokenizer.get_chat_template())
-
 prompt_dict = [{"role":'system','content':"You are an AI assistant."},{"role":'user','content':"Who are you?"}]
 prompt = tokenizer.apply_chat_template(prompt_dict, add_generation_prompt=True, tokenize=False)
 inputs = tokenizer(prompt, return_tensors='pt')
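Since the manual `tokenizer.chat_template` assignment is removed and the note further down describes the bundled template as the default, a quick way to confirm the prompt layout is to render a conversation and inspect it. This is an illustrative check, assuming `tokenizer` has been loaded as above; it is not part of the upstream README.

```python
# Illustrative check: the tokenizer's built-in template should produce the
# <|system|> / <|user|> / <|assistant|> layout that the removed template spelled out.
messages = [
    {"role": "system", "content": "You are an AI assistant."},
    {"role": "user", "content": "Who are you?"},
]

rendered = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
print(rendered)                       # inspect the rendered prompt
print(tokenizer.get_chat_template())  # inspect the template itself, as the old README did
```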
@@ -150,7 +146,7 @@ inputs.pop("token_type_ids", None)
 outputs = model.generate(
     **inputs,
     max_length=4096,
-    temperature=0.
+    temperature=0.3,
     top_k=50,
     top_p=0.9,
     repetition_penalty=1.2,
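Putting the corrected value in context, the sketch below shows the full generation call with the sampling settings used in this README, plus one way to separate the response from the prompt by token count instead of splitting the decoded string. Enabling `do_sample` and the token-slicing decode are editorial assumptions, not upstream instructions.

```python
# Sketch: generation with the corrected sampling settings.
# Assumes `model`, `tokenizer`, and `inputs` are prepared as in the README above.
outputs = model.generate(
    **inputs,
    max_length=4096,
    do_sample=True,          # assumption: sampling is needed for temperature/top_k/top_p to take effect
    temperature=0.3,         # value recommended by the updated README
    top_k=50,
    top_p=0.9,
    repetition_penalty=1.2,
)

# Decode only the newly generated tokens rather than splitting on the prompt text.
prompt_len = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(response)
```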
@@ -161,7 +157,8 @@ outputs = model.generate(

 response_list = [tokenizer.decode(output).split(prompt)[1] for output in outputs]
 ```
-Note: The provided chat template, which is the default chat template, helps generate the best response by structuring conversations optimally for the model.
+Note: The provided chat template, which is the default chat template, helps generate the best response by structuring conversations optimally for the model.
+We recommend using `temperature=0.3` for the best performance

 ## Limitations
 The model was trained on a dataset that includes content from the internet, which may contain toxic language, biases, and unsafe content. As a result, the model may: