Doubts about the reliability of the IFEval-fr dataset.
#2 by LeMoussel - opened
Looking at the results on this OpenLLMFrenchleaderboard, I am somewhat surprised by the low scores for the IFEval-fr metric: none of the models scores above 17 points. This raises questions. Is the data of poor quality? Could there be a bug in the evaluation code?