Update README.md
README.md (CHANGED)
@@ -33,7 +33,7 @@ After fine-tuning, the model underwent Direct Preference Optimization (DPO) to e
 ## Key Features
 - 12B parameter dense transformer model leading to better generalization compared to Krutrim-1 7B;
 - Supports context up to 128K tokens making it suitable for long multi-turn conversations, long-form generations, document translations and others;
--
+- Delivers competitive performance on most English benchmarks and HumanEval coding task;
 - Natively multilingual delivering best-in-class performance on Indic benchmarks;
 - Matches or exceeds performance of models much larger (x6) on multilingual Indic generation tasks including creative writing, summarization, and translation;
 - Stronger Indian cultural context relevance - scored the highest in manual evaluation with multiple models in an anonymised setting;
@@ -50,8 +50,8 @@ After fine-tuning, the model underwent Direct Preference Optimization (DPO) to e

 | Model Name | Release Date |Release Note | Reference|
 |------------|-------------|-------------|-------------|
-| Krutrim-2-Base | 2024-01-31 |
-| Krutrim-2-Instruct | 2024-01-31 | Finetuned and
+| Krutrim-2-Base | 2024-01-31 | Trained with MN12B architecture | [Here](https://huggingface.co/krutrim-ai-labs/Krutrim-2-base)|
+| Krutrim-2-Instruct | 2024-01-31 | Finetuned and aligned version of Krutrim-2-Base |[Here](https://huggingface.co/krutrim-ai-labs/Krutrim-2-instruct)|


 ## Data Freshness
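The updated rows point to the released checkpoints on Hugging Face. As a reference, here is a minimal loading sketch for the Instruct checkpoint; it mirrors the `AutoModelForCausalLM` / `AutoTokenizer` usage that appears later in this README, and the dtype/device settings are assumptions rather than upstream guidance.

```python
# Minimal sketch: load the Instruct checkpoint listed in the release table above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "krutrim-ai-labs/Krutrim-2-instruct"  # repo id taken from the table

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # assumption: use the dtype stored in the checkpoint
    device_map="auto",    # assumption: requires `accelerate`; drop for plain CPU/GPU loading
)
```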
@@ -87,7 +87,7 @@ After fine-tuning, the model underwent Direct Preference Optimization (DPO) to e
 | ARC_Challenge (0-shot) - Accuracy | 0.48 | 0.59 |0.55 | 0.60 | 0.93 (25-shot) | - | 0.50 |
 | ARC_Easy (0-shot) - Accuracy | 0.73 | 0.80 |0.79 | 0.82 | - | - | - |
 | HumanEval - Pass@10 | 0.00 | 0.23 |0.59 | 0.80 | 0.88 | 0.74 (0-shot) | 0.90 |
-| IF_Eval (0-shot) - Accuracy | 0.16 |
+| IF_Eval (0-shot) - Accuracy | 0.16 | 0.46 |- | 0.56 | 0.92 | - | 0.84 |

 ### Indic Benchmarks

@@ -137,10 +137,6 @@ model = AutoModelForCausalLM.from_pretrained(model_id)
 tokenizer = AutoTokenizer.from_pretrained(model_id)

 # Add custom chat template
-tokenizer.chat_template = """{% for message in messages %}{% if message['role'] == 'system' %}{{ '<|system|>\n' + message['content'] + '\n' }}{% elif message['role'] == 'user' %}{{ '<|user|>\n' + message['content'] + '\n' }}{% elif message['role'] == 'assistant' %}{% if not loop.last %}{{ '<|assistant|>\n' + message['content'] + eos_token + '\n' }}{% else %}{{ '<|assistant|>\n' + message['content'] + eos_token }}{% endif %}{% endif %}{% if loop.last and add_generation_prompt %}{{ '<|assistant|>\n' }}{% endif %}{% endfor %}"""
-
-print(tokenizer.get_chat_template())
-
 prompt_dict = [{"role":'system','content':"You are an AI assistant."},{"role":'user','content':"Who are you?"}]
 prompt = tokenizer.apply_chat_template(prompt_dict, add_generation_prompt=True, tokenize=False)
 inputs = tokenizer(prompt, return_tensors='pt')
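Since the manual `tokenizer.chat_template` assignment is removed and the note further down describes the bundled template as the default, a quick way to confirm the prompt layout is to render a conversation and inspect it. This is an illustrative check, assuming `tokenizer` has been loaded as above; it is not part of the upstream README.

```python
# Illustrative check: the tokenizer's built-in template should produce the
# <|system|> / <|user|> / <|assistant|> layout that the removed template spelled out.
messages = [
    {"role": "system", "content": "You are an AI assistant."},
    {"role": "user", "content": "Who are you?"},
]

rendered = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
print(rendered)                       # inspect the rendered prompt
print(tokenizer.get_chat_template())  # inspect the template itself, as the old README did
```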
@@ -150,7 +146,7 @@ inputs.pop("token_type_ids", None)
 outputs = model.generate(
     **inputs,
     max_length=4096,
-    temperature=0.
+    temperature=0.3,
     top_k=50,
     top_p=0.9,
     repetition_penalty=1.2,
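Putting the corrected value in context, the sketch below shows the full generation call with the sampling settings used in this README, plus one way to separate the response from the prompt by token count instead of splitting the decoded string. Enabling `do_sample` and the token-slicing decode are editorial assumptions, not upstream instructions.

```python
# Sketch: generation with the corrected sampling settings.
# Assumes `model`, `tokenizer`, and `inputs` are prepared as in the README above.
outputs = model.generate(
    **inputs,
    max_length=4096,
    do_sample=True,          # assumption: sampling is needed for temperature/top_k/top_p to take effect
    temperature=0.3,         # value recommended by the updated README
    top_k=50,
    top_p=0.9,
    repetition_penalty=1.2,
)

# Decode only the newly generated tokens rather than splitting on the prompt text.
prompt_len = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(response)
```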
@@ -161,7 +157,8 @@ outputs = model.generate(

 response_list = [tokenizer.decode(output).split(prompt)[1] for output in outputs]
 ```
-Note: The provided chat template, which is the default chat template, helps generate the best response by structuring conversations optimally for the model.
+Note: The provided chat template, which is the default chat template, helps generate the best response by structuring conversations optimally for the model.
+We recommend using `temperature=0.3` for the best performance

 ## Limitations
 The model was trained on a dataset that includes content from the internet, which may contain toxic language, biases, and unsafe content. As a result, the model may: