Upload index.html
index.html (+2 -2)
@@ -140,9 +140,9 @@
 Similar to the <a href="https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard" target="_blank">🤗 Open LLM Leaderboard</a>,
 we selected two common benchmarks for evaluating Code LLMs on multiple programming languages:</p> -->
 <ul>
-  <li><a href="https://huggingface.co/datasets/openai_humaneval" target="_blank">HumanEval</a>
+  <li><a href="https://huggingface.co/datasets/openai_humaneval" target="_blank">HumanEval</a>: Used to measure the functional correctness of programs generated from docstrings. It includes 164 Python programming problems.
   </li>
-  <li><a href="https://github.com/YihongDong/CodeGenEvaluation" target="_blank">HumanEval-ET</a>
+  <li><a href="https://github.com/YihongDong/CodeGenEvaluation" target="_blank">HumanEval-ET</a>: The extended version of the HumanEval benchmark, where each task includes more than 100 test cases.
   </li>
 </ul>
 <p>
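For context on the benchmark the new description points to, here is a minimal sketch of how the HumanEval tasks can be inspected, assuming the Hugging Face `datasets` library is installed (field names follow the openai_humaneval dataset card):

# Minimal sketch: load the openai_humaneval dataset linked above and
# look at one of its 164 Python problems.
from datasets import load_dataset

humaneval = load_dataset("openai_humaneval", split="test")
print(len(humaneval))            # 164 problems
task = humaneval[0]
print(task["task_id"])           # e.g. "HumanEval/0"
print(task["entry_point"])       # name of the function to implement
print(task["prompt"])            # function signature plus docstring
print(task["test"])              # unit tests used to check functional correctness

Each problem is graded by running the generated completion against the provided tests, which is what "functional correctness" means in the description above; HumanEval-ET keeps the same problems but attaches a larger test suite (more than 100 cases) per task.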