Doubts about the reliability of the IFEval-fr dataset.

#2
by LeMoussel - opened

Looking at the results on this OpenLLMFrenchleaderboard, I am somewhat surprised by the low scores on the IFEval-fr metric: no model scores above 17 points. This raises questions. Is the data of poor quality? Could there be a bug in the evaluation code?
