Adding the Open Portuguese LLM Leaderboard Evaluation Results
This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard.
The purpose of this PR is to add evaluation results from the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard) to your model card.
If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions
README.md (CHANGED)

```diff
@@ -13,9 +13,9 @@ tags:
 - preference
 - ultrafeedback
 - moe
+base_model: mistralai/Mixtral-8x7B-Instruct-v0.1
 datasets:
 - argilla/ultrafeedback-binarized-preferences-cleaned
-base_model: mistralai/Mixtral-8x7B-Instruct-v0.1
 pipeline_tag: text-generation
 model-index:
 - name: notux-8x7b-v1
@@ -108,3 +108,21 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 |Winogrande (5-shot)       |81.61|
 |GSM8k (5-shot)            |61.64|
 
+
+# Open Portuguese LLM Leaderboard Evaluation Results
+
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/argilla/notux-8x7b-v1) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+| Metric                   | Value  |
+|--------------------------|--------|
+|Average                   |**73.1**|
+|ENEM Challenge (No Images)| 70.96|
+|BLUEX (No Images)         | 60.22|
+|OAB Exams                 | 49.52|
+|Assin2 RTE                | 92.66|
+|Assin2 STS                | 82.40|
+|FaQuAD NLI                | 79.85|
+|HateBR Binary             | 77.91|
+|PT Hate Speech Binary     | 73.30|
+|tweetSentBR               | 71.08|
+
```
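The leaderboard's "Average" appears to be the unweighted arithmetic mean of the nine task scores. A minimal sanity check (an illustrative sketch using the values from the table above, not part of the leaderboard tooling):

```python
# Task scores as reported on the Open Portuguese LLM Leaderboard for notux-8x7b-v1
scores = {
    "ENEM Challenge (No Images)": 70.96,
    "BLUEX (No Images)": 60.22,
    "OAB Exams": 49.52,
    "Assin2 RTE": 92.66,
    "Assin2 STS": 82.40,
    "FaQuAD NLI": 79.85,
    "HateBR Binary": 77.91,
    "PT Hate Speech Binary": 73.30,
    "tweetSentBR": 71.08,
}

# Unweighted mean over the nine tasks, rounded to one decimal place
average = round(sum(scores.values()) / len(scores), 1)
print(average)  # 73.1, matching the "Average" row in the table
```

This reproduces the reported 73.1 exactly, confirming the Average row is consistent with the individual task scores.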