tenyx/TenyxChat-7B-v1


sarath-shekkizhar committed on Jan 10, 2024

Commit 1edad18 · verified · 1 Parent(s): c3c7ee0

Update README.md

Fixing broken link

Files changed (1)

README.md +1 -1

@@ -128,7 +128,7 @@ These benchmarks test reasoning and knowledge in various tasks in few-shot setti
 | Mistral-7B | 62.4 | 74.0 | 38.1 | 57.2 | 62.8 | 37.8 | 55.38 |
 | OpenLLM Leader-7B  | 64.3 | 78.7 | 73.3 | 66.6 | 68.4 | 58.5 | 68.3 |
-**Note:** While the Open LLM Leaderboard indicates that these chat models perform less effectively compared to the leading 7B model, it's important to note that the leading model struggles in the multi-turn chat setting of MT-Bench (as demonstrated in our evaluation [above](https://www.notion.so/TenyxChat-Language-Model-Alignment-using-Tenyx-Fine-tuning-30e60a53d17a46b0a4755c74f0f8b222?pvs=21)). In contrast, TenyxChat-7B-v1 demonstrates robustness against common fine-tuning challenges, such as *catastrophic forgetting*. This unique feature enables TenyxChat-7B-v1 to excel not only in chat benchmarks like MT-Bench, but also in a wider range of general reasoning benchmarks on the Open LLM Leaderboard.
+**Note:** While the Open LLM Leaderboard indicates that these chat models perform less effectively compared to the leading 7B model, it's important to note that the leading model struggles in the multi-turn chat setting of MT-Bench (as demonstrated in our evaluation [above](#comparison-with-additional-open-llm-leaderboard-models)). In contrast, TenyxChat-7B-v1 demonstrates robustness against common fine-tuning challenges, such as *catastrophic forgetting*. This unique feature enables TenyxChat-7B-v1 to excel not only in chat benchmarks like MT-Bench, but also in a wider range of general reasoning benchmarks on the Open LLM Leaderboard.
 # Limitations