Update src/tasks_content.py
src/tasks_content.py (+6 -2)
@@ -24,9 +24,13 @@ TASKS_DESCRIPTIONS = {
 
     "ci_builds_repair": """# CI builds repair\n
 
-Our CI
+Our CI Builds Repair benchmark 🤗 [JetBrains-Research/lca-ci-builds-repair](https://huggingface.co/datasets/JetBrains-Research/lca-ci-builds-repair) includes 77 data points.
 
-We use
+We use Pass@1 metric for CI repair.
+Models can be evaluated in three task types:
+* `full` — *no* ground truth diffs are used for model evaluation;
+* `oracle: files` — ground truth diffs are used to select files that should be corrected to fix the issue;
+* `oracle: files, lines` — ground truth diffs are used to select files and code blocks that should be corrected to fix the issue;
 
 For further details on the dataset and the baselines from the 🏟️ Long Code Arena team, refer to the `ci-builds-repair` directory in [our baselines repository](https://github.com/JetBrains-Research/lca-baselines).
 """,
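The description also cites Pass@1. A minimal sketch of how that metric is commonly computed, using the unbiased pass@k estimator from Chen et al. (2021); the `results` list below is a hypothetical illustration, not data from this benchmark:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k (Chen et al., 2021): n samples per problem,
    c of which pass the CI check, evaluated at budget k."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Pass@1 over the benchmark is the mean of per-problem pass@1 values.
# Hypothetical (n, c) pairs, one per data point:
results = [(5, 3), (5, 0), (5, 5)]
pass_at_1 = sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
print(f"Pass@1 = {pass_at_1:.3f}")  # -> Pass@1 = 0.533
```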