PyTorch · mistral · Krutrim · language-model
krutrim-admin committed
Commit 0d1d15a · verified · 1 Parent(s): 643156c

Update README.md

Files changed (1): README.md +7 -10
README.md CHANGED
````diff
@@ -33,7 +33,7 @@ After fine-tuning, the model underwent Direct Preference Optimization (DPO) to e
 ## Key Features
 - 12B parameter dense transformer model leading to better generalization compared to Krutrim-1 7B;
 - Supports context up to 128K tokens making it suitable for long multi-turn conversations, long-form generations, document translations and others;
-- Retains the original performance of MN-12B on most En benchmarks with x3.5 improvement on HumanEval coding task;
+- Delivers competitive performance on most English benchmarks and HumanEval coding task;
 - Natively multilingual delivering best-in-class performance on Indic benchmarks;
 - Matches or exceeds performance of models much larger (x6) on multilingual Indic generation tasks including creative writing, summarization, and translation;
 - Stronger Indian cultural context relevance - scored the highest in manual evaluation with multiple models in an anonymised setting;
@@ -50,8 +50,8 @@ After fine-tuning, the model underwent Direct Preference Optimization (DPO) to e
 
 | Model Name | Release Date |Release Note | Reference|
 |------------|-------------|-------------|-------------|
-| Krutrim-2-Base | 2024-01-31 | Continually Pre-trained on MN12B base | [Here](https://huggingface.co/krutrim-ai-labs/Krutrim-2-base)|
-| Krutrim-2-Instruct | 2024-01-31 | Finetuned and DPOed version of Krutrim-2-Base |[Here](https://huggingface.co/krutrim-ai-labs/Krutrim-2-instruct)|
+| Krutrim-2-Base | 2024-01-31 | Trained with MN12B architecture | [Here](https://huggingface.co/krutrim-ai-labs/Krutrim-2-base)|
+| Krutrim-2-Instruct | 2024-01-31 | Finetuned and aligned version of Krutrim-2-Base |[Here](https://huggingface.co/krutrim-ai-labs/Krutrim-2-instruct)|
 
 
 ## Data Freshness
@@ -87,7 +87,7 @@ After fine-tuning, the model underwent Direct Preference Optimization (DPO) to e
 | ARC_Challenge (0-shot) - Accuracy | 0.48 | 0.59 |0.55 | 0.60 | 0.93 (25-shot) | - | 0.50 |
 | ARC_Easy (0-shot) - Accuracy | 0.73 | 0.80 |0.79 | 0.82 | - | - | - |
 | HumanEval - Pass@10 | 0.00 | 0.23 |0.59 | 0.80 | 0.88 | 0.74 (0-shot) | 0.90 |
-| IF_Eval (0-shot) - Accuracy | 0.16 | - |- | 0.56 | 0.92 | - | 0.84 |
+| IF_Eval (0-shot) - Accuracy | 0.16 | 0.46 |- | 0.56 | 0.92 | - | 0.84 |
 
 ### Indic Benchmarks
 
@@ -137,10 +137,6 @@ model = AutoModelForCausalLM.from_pretrained(model_id)
 tokenizer = AutoTokenizer.from_pretrained(model_id)
 
 # Add custom chat template
-tokenizer.chat_template = """{% for message in messages %}{% if message['role'] == 'system' %}{{ '<|system|>\n' + message['content'] + '\n' }}{% elif message['role'] == 'user' %}{{ '<|user|>\n' + message['content'] + '\n' }}{% elif message['role'] == 'assistant' %}{% if not loop.last %}{{ '<|assistant|>\n' + message['content'] + eos_token + '\n' }}{% else %}{{ '<|assistant|>\n' + message['content'] + eos_token }}{% endif %}{% endif %}{% if loop.last and add_generation_prompt %}{{ '<|assistant|>\n' }}{% endif %}{% endfor %}"""
-
-print(tokenizer.get_chat_template())
-
 prompt_dict = [{"role":'system','content':"You are an AI assistant."},{"role":'user','content':"Who are you?"}]
 prompt = tokenizer.apply_chat_template(prompt_dict, add_generation_prompt=True, tokenize=False)
 inputs = tokenizer(prompt, return_tensors='pt')
@@ -150,7 +146,7 @@ inputs.pop("token_type_ids", None)
 outputs = model.generate(
     **inputs,
     max_length=4096,
-    temperature=0.5,
+    temperature=0.3,
     top_k=50,
     top_p=0.9,
     repetition_penalty=1.2,
@@ -161,7 +157,8 @@ outputs = model.generate(
 
 response_list = [tokenizer.decode(output).split(prompt)[1] for output in outputs]
 ```
-Note: The provided chat template, which is the default chat template, helps generate the best response by structuring conversations optimally for the model.
+Note: The provided chat template, which is the default chat template, helps generate the best response by structuring conversations optimally for the model.
+We recommend using `temperature=0.3` for the best performance
 
 ## Limitations
 The model was trained on a dataset that includes content from the internet, which may contain toxic language, biases, and unsafe content. As a result, the model may:
````
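For reference, here is a minimal, self-contained sketch of the usage example as it reads after this commit: the custom `tokenizer.chat_template` assignment and the `print(tokenizer.get_chat_template())` call are dropped (assuming, as the commit's note implies, that the instruct checkpoint ships its default chat template in the tokenizer config), and generation runs at the newly recommended `temperature=0.3`. The diff elides the tail of the `generate(...)` call, so `do_sample=True` below is an assumption; sampling knobs like `temperature`, `top_k`, and `top_p` have no effect unless sampling is enabled.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "krutrim-ai-labs/Krutrim-2-instruct"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build the prompt with the tokenizer's bundled default chat template.
prompt_dict = [
    {"role": "system", "content": "You are an AI assistant."},
    {"role": "user", "content": "Who are you?"},
]
prompt = tokenizer.apply_chat_template(prompt_dict, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(prompt, return_tensors="pt")
inputs.pop("token_type_ids", None)  # generate() rejects this key if the tokenizer emits it

outputs = model.generate(
    **inputs,
    max_length=4096,
    temperature=0.3,        # recommended by this commit (was 0.5)
    top_k=50,
    top_p=0.9,
    repetition_penalty=1.2,
    do_sample=True,         # assumption: elided in the diff; required for the sampling params above
)

# Strip the echoed prompt from each decoded sequence, as in the model card.
response_list = [tokenizer.decode(output).split(prompt)[1] for output in outputs]
```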