OpenLLMFrenchLeaderboard

Doubts about the reliability of the IFEval-fr dataset.

by LeMoussel - opened Mar 21

When looking at the results on the OpenLLMFrenchLeaderboard, I am somewhat surprised by the low scores for the IFEval-fr metric. None of the models scores above 17 points. This raises questions: is the data of poor quality? Could there be a bug in the evaluation code?

malhajar

le-leadboard org Jul 27

Thanks for raising this. After a period of discontinuation, I will be improving the benchmarks very soon. Don't hesitate to let me know if this is something you would like to participate in. I agree that IFEval must be re-examined (the problem might even come from the design of the original set).

malhajar changed discussion status to closed Jul 27
