JetBrains-Research/long-code-arena
galtimur committed on Jun 5, 2024

Commit 84dc4f2 (verified) · 1 Parent(s): 8314e15

Update src/tasks_content.py

Files changed (1)

src/tasks_content.py

@@ -29,7 +29,8 @@ TASKS_DESCRIPTIONS = {
         The benchmark clones the repo to the local directory, the model fixes the issue according to logs and the local repo state,
         and then the benchmark pushes the repo to GitHub and requests the result of the GitHub CI.
-        We use the `Pass@1` rate metric to measure CI repair, indicating the ratio of data points, for which the build passed successfully after the generated fix.
+        We use the `Pass@1` rate metric to measure CI repair, indicating the ratio of data points, for which the build passed successfully after the generated fix.
         Models can be evaluated in three settings:
         * `full` – **no** ground truth diffs are used for model evaluation;
         * `oracle: files` – ground truth diffs are used to select files that should be corrected to fix the issue;
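
For reference, the `Pass@1` rate in the changed line can be read as the share of data points whose CI build passed after the generated fix. The sketch below is a minimal illustration under that reading. The function name `pass_at_1` and the list of booleans are hypothetical and are not taken from the repository.

```python
from typing import Sequence


def pass_at_1(ci_passed: Sequence[bool]) -> float:
    """Return the fraction of data points whose CI build passed.

    Each element is assumed to be True when the GitHub CI build
    succeeded after the single generated fix for that data point.
    """
    if not ci_passed:
        raise ValueError("need at least one data point")
    # True counts as 1 and False as 0, so sum() gives the number of passes.
    return sum(ci_passed) / len(ci_passed)


# Hypothetical outcomes for four data points: three builds passed, one failed.
example_outcomes = [True, False, True, True]
print(pass_at_1(example_outcomes))  # 0.75
```

Because each data point gets exactly one generated fix, the rate here is a plain ratio and needs no sampling-based estimator.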