Areyde committed
Commit 5557fe1 · verified
1 Parent(s): bf01543

Update src/tasks_content.py

Files changed (1)
  1. src/tasks_content.py +7 -7
src/tasks_content.py CHANGED
@@ -38,7 +38,7 @@ TASKS_DESCRIPTIONS = {
 As a context, we pass a prefix of the list of APIs available in the target library.
 We select the prefix based on their BM-25 similarity with the provided instruction.
 
-For further details on the dataset and the baselines from the Long Code Arena team, refer to the `library_based_code_generation` directory in [our baselines repository](https://anonymous.4open.science/r/icml-benchname-2025/).
+For further details on the dataset and the baselines from the Long Code Arena team, refer to the `library_based_code_generation` directory in [our baselines repository](https://github.com/JetBrains-Research/lca-baselines/).
 
 **Terms of use**. As this dataset is collected from GitHub, researchers may use it for research purposes only if any publications resulting from that research are open access (see [GitHub Acceptable Use Policies](https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies#7-information-usage-restrictions)).
 """,
@@ -57,7 +57,7 @@ TASKS_DESCRIPTIONS = {
 * `oracle: files` – ground truth diffs are used to select files that should be corrected to fix the issue;
 * `oracle: files, lines` – ground truth diffs are used to select files and code blocks that should be corrected to fix the issue;
 
-For further details on the dataset and the baselines from the Long Code Arena team, refer to the `ci-builds-repair` directory in [our baselines repository](https://anonymous.4open.science/r/icml-benchname-2025/).
+For further details on the dataset and the baselines from the Long Code Arena team, refer to the `ci-builds-repair` directory in [our baselines repository](https://github.com/JetBrains-Research/lca-baselines/).
 
 **Terms of use**. As this dataset is collected from GitHub, researchers may use it for research purposes only if any publications resulting from that research are open access (see [GitHub Acceptable Use Policies](https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies#7-information-usage-restrictions)).
 """,
@@ -82,7 +82,7 @@ TASKS_DESCRIPTIONS = {
 * *non-informative* – short/long lines, import/print lines, or comment lines;
 * *random* – lines that don't fit any of the previous categories.
 
-For further details on the dataset and the baselines from the Long Code Arena team, refer to the `project_level_code_completion` directory in [our baselines repository](https://anonymous.4open.science/r/icml-benchname-2025/).
+For further details on the dataset and the baselines from the Long Code Arena team, refer to the `project_level_code_completion` directory in [our baselines repository](https://github.com/JetBrains-Research/lca-baselines/).
 
 **Terms of use**. As this dataset is collected from GitHub, researchers may use it for research purposes only if any publications resulting from that research are open access (see [GitHub Acceptable Use Policies](https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies#7-information-usage-restrictions)).
 """,
@@ -97,7 +97,7 @@ TASKS_DESCRIPTIONS = {
 * [ChrF](https://huggingface.co/spaces/evaluate-metric/chrf)
 * [BERTScore](https://huggingface.co/spaces/evaluate-metric/bertscore)
 
-For further details on the dataset and the baselines from the Long Code Arena team, refer to the `commit_message_generation` directory in [our baselines repository](https://anonymous.4open.science/r/icml-benchname-2025/).
+For further details on the dataset and the baselines from the Long Code Arena team, refer to the `commit_message_generation` directory in [our baselines repository](https://github.com/JetBrains-Research/lca-baselines/).
 
 **Note.** The leaderboard is sorted by the `ROUGE-1` metric by default.
 
@@ -119,7 +119,7 @@ TASKS_DESCRIPTIONS = {
 * **All incorrect** - percentage of cases where all buggy files were incorrectly identified
 * **# Output** - average number of buggy files detected, to further assess performance, particularly concerning high **FPR**.
 
-For further details on the dataset and the baselines from the Long Code Arena team, refer to the `bug_localization` directory in [our baselines repository](https://anonymous.4open.science/r/icml-benchname-2025/).
+For further details on the dataset and the baselines from the Long Code Arena team, refer to the `bug_localization` directory in [our baselines repository](https://github.com/JetBrains-Research/lca-baselines/).
 
 **Terms of use**. As this dataset is collected from GitHub, researchers may use it for research purposes only if any publications resulting from that research are open access (see [GitHub Acceptable Use Policies](https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies#7-information-usage-restrictions)).
 """,
@@ -129,9 +129,9 @@ TASKS_DESCRIPTIONS = {
 The model is required to generate such description, given the relevant context code and the intent behind the documentation.
 
 We use a novel metric for evaluation:
-* `CompScore`: the new metric based on LLM as an assessor proposed for this task. Our approach involves feeding the LLM with relevant code and two versions of documentation: the ground truth and the model-generated text. More details on how it is calculated can be found in [our baselines repository](https://anonymous.4open.science/r/icml-benchname-2025/module_summarization/README.md).
+* `CompScore`: the new metric based on LLM as an assessor proposed for this task. Our approach involves feeding the LLM with relevant code and two versions of documentation: the ground truth and the model-generated text. More details on how it is calculated can be found in [our baselines repository](https://github.com/JetBrains-Research/lca-baselines/tree/main/module_summarization).
 
-For further details on the dataset and the baselines from the Long Code Arena team, refer to the `module_summarization` directory in [our baselines repository](https://anonymous.4open.science/r/icml-benchname-2025/).
+For further details on the dataset and the baselines from the Long Code Arena team, refer to the `module_summarization` directory in [our baselines repository](https://github.com/JetBrains-Research/lca-baselines/).
 
 **Terms of use**. As this dataset is collected from GitHub, researchers may use it for research purposes only if any publications resulting from that research are open access (see [GitHub Acceptable Use Policies](https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies#7-information-usage-restrictions)).
 """,
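
The first hunk above mentions selecting a prefix of the library's API list by its BM-25 similarity with the instruction. As an illustration only — this is not the repository's actual implementation, and `select_api_prefix` and the plain Okapi BM25 scoring below are hypothetical — a minimal pure-Python sketch of that ranking step:

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document against the query."""
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency of each term (how many docs contain it).
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            )
        scores.append(score)
    return scores


def select_api_prefix(instruction, api_entries, top_k=3):
    """Keep the top_k API entries most similar to the instruction."""
    query = instruction.lower().split()
    docs = [entry.lower().split() for entry in api_entries]
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(api_entries)), key=lambda i: scores[i], reverse=True)
    return [api_entries[i] for i in ranked[:top_k]]
```

In practice a real tokenizer and a tuned BM25 implementation would be used; the sketch only shows how the instruction acts as the query and the API entries as the ranked corpus.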