Complete AI energy score with LLM performance/efficiency

#2
by davidaparicio - opened

Hello AI Energy team, I'm really enjoying the incredible work being done!!
To strengthen the analysis in this project, would it be possible to also take into account how effectively an LLM handles questions, for example by using its Elo score? To illustrate: a fairly small language model (SLM) may not answer my question correctly on the first try. An LLM judge will likely reject its answer and ask for another attempt (or the human will simply rephrase and ask again), so the small model has to go through several iterations before producing the expected result. A more “powerful” model, by contrast, is likely to answer the question correctly right away.
How could the AI Energy Score take this into account to provide decision support for LLM selection? Perhaps by adding weights, or a second row of stars? A rough sketch of what I mean is below.
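
To make the weighting idea concrete, here is a minimal sketch of how quality (via Elo) and retry-adjusted energy could be folded into one selection score. Everything here is hypothetical: the function names, the reference energy budget, the Elo range, and the example numbers are my own assumptions, not part of AI Energy Score.

```python
# Hypothetical sketch (not part of AI Energy Score): fold answer quality and
# retry cost into the energy comparison. All names and numbers are assumptions.

def effective_energy(energy_wh_per_query: float, expected_attempts: float) -> float:
    """Energy actually spent per *accepted* answer, counting retries."""
    return energy_wh_per_query * expected_attempts

def composite_score(energy_wh_per_query: float,
                    expected_attempts: float,
                    elo: float,
                    quality_weight: float = 0.5,
                    energy_ref_wh: float = 2.0,   # assumed reference energy budget
                    elo_min: float = 800.0,
                    elo_max: float = 1400.0) -> float:
    """0-1 score: weighted mix of Elo-based quality and energy efficiency."""
    e = effective_energy(energy_wh_per_query, expected_attempts)
    # Lower effective energy -> higher efficiency score, capped at 1.0.
    efficiency = min(1.0, energy_ref_wh / max(e, 1e-9))
    # Normalize Elo onto 0-1 over an assumed rating range.
    quality = max(0.0, min(1.0, (elo - elo_min) / (elo_max - elo_min)))
    return quality_weight * quality + (1.0 - quality_weight) * efficiency

# A small model that is cheap per attempt but needs ~3 tries, vs. a larger
# model that usually answers correctly on the first attempt.
small = composite_score(energy_wh_per_query=1.0, expected_attempts=3.0, elo=950.0)
large = composite_score(energy_wh_per_query=4.0, expected_attempts=1.1, elo=1280.0)
print(f"small model: {small:.2f}  large model: {large:.2f}")
```

The weight (or a second row of stars derived from the quality term) would let users decide how much answer quality should count against raw energy consumption.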

Kind regards,
David Aparicio, Senior DevSecOps @ Sopht Lyon
