Update content.py
content.py +1 -0
content.py
CHANGED
@@ -19,6 +19,7 @@ Here, you can compare models on tasks in the Czech language or submit your own m
 - On the submission page, __you can view your model's results on the leaderboard without publishing them__.
 - The first step is "pre-submission." After this is complete (significance tests may take up to 2 hours), you can choose to submit the results if you wish.
 - NEWS:
+- 19.02.2025: We added an performance-size plot under the Table for better overview! Scroll down to find out, which model works the best for it's size!
 - 23.12.2024: We released [a preprint](http://arxiv.org/abs/2412.17933) detailing our work.
 - 7.11.2024: We acknowledge that one of the Qwen2.5 models correctly predicted our (& Bigbench's) canary string. This confirms the contamination, it was trained on benchmark data. Other [studies](https://arxiv.org/pdf/2409.01790) also suggest the contamination issues of the Qwen family.
 - 1.10.2024: Find out more about 🇨🇿 BenCzechMark in our [Huggingface blogpost](https://huggingface.co/blog/benczechmark)!