galtimur committed
Commit 3514265 · 1 Parent(s): 49c07f7

Replaced benchname with LCA in README and other files.

Files changed (3)
  1. src/content.py +13 -13
  2. src/get_results_for_task.py +2 -2
  3. src/tasks_content.py +15 -15
src/content.py CHANGED
@@ -3,19 +3,19 @@ from .formatting import styled_warning
 # ================================
 # = ABOUT =
 # ================================
-INTRODUCTION_TITLE = """<h1 align="center">🏟️ BenchName </h1>"""
+INTRODUCTION_TITLE = """<h1 align="center">🏟️ Long Code Arena </h1>"""
 
-INTRODUCTION_TEXT = """🏟️ **BenchName** is a suite of benchmarks for code-related tasks with large contexts, up to a whole code repository.
+INTRODUCTION_TEXT = """🏟️ **Long Code Arena** is a suite of benchmarks for code-related tasks with large contexts, up to a whole code repository.
 It currently spans six different tasks and contains six datasets:
 
-* 🤗 [Library-based code generation](https://huggingface.co/datasets/icmlbenchname/library-based-code-generation)
-* 🤗 [CI builds repair](https://huggingface.co/datasets/icmlbenchname/ci-builds-repair)
-* 🤗 [Project-level code completion](https://huggingface.co/datasets/icmlbenchname/project-level-code-completion)
-* 🤗 [Commit message generation](https://huggingface.co/datasets/icmlbenchname/commit-message-generation)
-* 🤗 [Bug localization](https://huggingface.co/datasets/icmlbenchname/bug-localization)
-* 🤗 [Module summarization](https://huggingface.co/datasets/icmlbenchname/module-summarization)
+* 🤗 [Library-based code generation](https://huggingface.co/datasets/JetBrains-Research/lca-library-based-code-generation)
+* 🤗 [CI builds repair](https://huggingface.co/datasets/JetBrains-Research/lca-ci-builds-repair)
+* 🤗 [Project-level code completion](https://huggingface.co/datasets/JetBrains-Research/lca-project-level-code-completion)
+* 🤗 [Commit message generation](https://huggingface.co/datasets/JetBrains-Research/lca-commit-message-generation)
+* 🤗 [Bug localization](https://huggingface.co/datasets/JetBrains-Research/lca-bug-localization)
+* 🤗 [Module summarization](https://huggingface.co/datasets/JetBrains-Research/lca-module-summarization)
 
-We are excited to invite you to participate in solving our benchmarks! To submit your results, please send the following materials to our 📩 email (icmlbenchname@gmail.com):
+We are excited to invite you to participate in solving our benchmarks! To submit your results, please send the following materials to our 📩 email (lca@jetbrains.com):
 
 * **Results**: Include the summary of your benchmark outcomes.
 * **Reproduction Package**: To ensure the integrity and reproducibility of your results, please include the code for context collection (if any), generation of predictions, and evaluating. You can follow [baselines](https://anonymous.4open.science/r/icml-benchname-2025/README.md) as a reference.
@@ -30,23 +30,23 @@ We look forward to reviewing your innovative solutions!
 # ================================
 LEADERBOARD_TITLE = '<h2 align="center">🏅Leaderboard</h2>'
 
-LEADERBOARD_TEXT = """The raw results from the leaderboard are available in 🤗 [icmlbenchname/results](https://huggingface.co/datasets/icmlbenchname/results)."""
+LEADERBOARD_TEXT = """The raw results from the leaderboard are available in 🤗 [JetBrains-Research/lca-results](https://huggingface.co/datasets/JetBrains-Research/lca-results)."""
 
 # ================================
 # = SUBMISSION =
 # ================================
 SUBMISSION_TITLE = '<h2 align="center">📩 Make A Submission</h2>'
 
-SUBMISSION_TEXT_INTRO = """Use the form below to submit new results to 🏟️ BenchName. If any problems arise, don't hesitate to contact us by email `TODO` or open a discussion 💛"""
+SUBMISSION_TEXT_INTRO = """Use the form below to submit new results to 🏟️ Long Code Arena. If any problems arise, don't hesitate to contact us by email `TODO` or open a discussion 💛"""
 
 SUBMISSION_TEXT_TASK = """1. Select a task you want to submit results for."""
 
 SUBMISSION_TEXT_METADATA = """2. Fill in some metadata about your submission."""
 
 SUBMISSION_TEXT_FILES = """3. Attach one or more files with your model's predictions.
-* If several files are attached, they will be treated as separate runs of the submitted model (e.g., with different seeds), and the metrics will be averaged across runs. For baselines provided by 🏟️ BenchName Team, the results are averaged across 3 runs.
+* If several files are attached, they will be treated as separate runs of the submitted model (e.g., with different seeds), and the metrics will be averaged across runs. For baselines provided by 🏟️ Long Code Arena Team, the results are averaged across 3 runs.
 """
 
-SUBMISSION_TEXT_SUBMIT = """All set! A new PR to 🤗 [icmlbenchname/results](https://huggingface.co/datasets/icmlbenchname/results) should be opened when you press "Submit" button. 🏟️ BenchName Team will review it shortly, and the results will appear in the leaderboard.
+SUBMISSION_TEXT_SUBMIT = """All set! A new PR to 🤗 [JetBrains-Research/lca-results](https://huggingface.co/datasets/JetBrains-Research/lca-results) should be opened when you press "Submit" button. 🏟️ Long Code Arena Team will review it shortly, and the results will appear in the leaderboard.
 
 ⏳ **Note:** It might take some time (up to 40 minutes) for PR to get created, since it involves computing metrics for your submission."""
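The `SUBMISSION_TEXT_FILES` string above states that several attached prediction files are treated as separate runs of the same model and that metrics are averaged across runs. As a rough illustration of that flow (not the leaderboard's actual code; the file names, the `score_run` helper, and the dummy exact-match metric are hypothetical), each attachment is a JSON Lines file with one record per example, and per-run scores are simply averaged. The `"prediction"`/`"reference"` field names follow the submission instructions in `src/tasks_content.py` below.

```python
import json
from pathlib import Path
from statistics import mean


def score_run(path: Path) -> float:
    """Hypothetical per-run scorer: read a JSONL predictions file and return
    a single metric value (here, a dummy exact-match rate)."""
    records = [json.loads(line) for line in path.read_text().splitlines() if line.strip()]
    return mean(r["prediction"].strip() == r["reference"].strip() for r in records)


# Several attached files = several runs of the submitted model (e.g., different seeds).
run_files = [Path("run_seed0.jsonl"), Path("run_seed1.jsonl"), Path("run_seed2.jsonl")]
averaged_metric = mean(score_run(f) for f in run_files)
print(f"Metric averaged across {len(run_files)} runs: {averaged_metric:.3f}")
```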
src/get_results_for_task.py CHANGED
@@ -37,7 +37,7 @@ def _get_results_stub() -> pd.DataFrame:
             "ChrF": "X",
             "BERTScore": "X",
             "BERTScore (Normalized)": "X",
-            "Submitted By": "BenchName Team",
+            "Submitted By": "LCA Team",
             "Resources": "",
         },
         {
@@ -49,7 +49,7 @@ def _get_results_stub() -> pd.DataFrame:
             "ChrF": "X",
             "BERTScore": "X",
             "BERTScore (Normalized)": "X",
-            "Submitted By": "BenchName Team",
+            "Submitted By": "LCA Team",
             "Resources": "",
         },
     ]
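The two hunks above only change the `"Submitted By"` value inside placeholder rows of `_get_results_stub()`. For orientation, here is a minimal sketch of how such a stub could assemble placeholder rows into a `pandas` DataFrame; the surrounding code, the `"Model"` column, and the made-up model names are assumptions, not the repository's actual implementation.

```python
import pandas as pd


def _get_results_stub() -> pd.DataFrame:
    # Placeholder rows shown while real results are unavailable; "X" marks
    # metrics that have not been computed yet. Model names are made up here.
    stub_rows = [
        {
            "Model": "baseline-model-a",  # assumption: not taken from the diff
            "ChrF": "X",
            "BERTScore": "X",
            "BERTScore (Normalized)": "X",
            "Submitted By": "LCA Team",
            "Resources": "",
        },
        {
            "Model": "baseline-model-b",  # assumption: not taken from the diff
            "ChrF": "X",
            "BERTScore": "X",
            "BERTScore (Normalized)": "X",
            "Submitted By": "LCA Team",
            "Resources": "",
        },
    ]
    return pd.DataFrame(stub_rows)


print(_get_results_stub().head())
```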
src/tasks_content.py CHANGED
@@ -14,7 +14,7 @@ TASKS_PRETTY_REVERSE = {value: key for key, value in TASKS_PRETTY.items()}
 TASKS_DESCRIPTIONS = {
     "aggregated": """# Aggregated Results\n
 
-Here, we present the aggregated results across all the tasks in BenchName (except for Project-level code completion, where its specifics required a different selection of models). To get more details about each task, visit the corresponding tab.
+Here, we present the aggregated results across all the tasks in Long Code Arena (except for Project-level code completion, where its specifics required a different selection of models). To get more details about each task, visit the corresponding tab.
 
 To obtain aggregated results, we first select only one metric from metric suite for each task:
 * Library-based code generation: `API Recall`
@@ -25,11 +25,11 @@ TASKS_DESCRIPTIONS = {
 
 Then, to ensure a fair comparison across tasks with different score ranges, we normalize all scores to a 0-1 scale, where zero corresponds to the worst-performing model, and 1 to the best one. Note that for mean rank, rather than using strict rankings, we implemented a ranking system with a 10% margin to account for models with similar performance.
 
-We report mean rank (with std) and mean score across the tasks from BenchName, and the scores for each task in the table below.
+We report mean rank (with std) and mean score across the tasks from Long Code Arena, and the scores for each task in the table below.
 """,
     "library_based_code_generation": """# Library-based code generation\n
 
-Our Library-based code generation benchmark 🤗 [icmlbenchname/library-based-code-generation](https://huggingface.co/datasets/icmlbenchname/library-based-code-generation) includes 150 manually curated instructions asking a model to generate Python code using a particular library. Samples come from 62 Python repositories. All the samples in the dataset are based on reference example programs written by authors of the respective libraries.
+Our Library-based code generation benchmark 🤗 [JetBrains-Research/lca-library-based-code-generation](https://huggingface.co/datasets/JetBrains-Research/lca-library-based-code-generation) includes 150 manually curated instructions asking a model to generate Python code using a particular library. Samples come from 62 Python repositories. All the samples in the dataset are based on reference example programs written by authors of the respective libraries.
 
 For evaluation, we use two metrics:
 * `ChrF`: textual similarity between the generated code and the reference program.
@@ -38,14 +38,14 @@ TASKS_DESCRIPTIONS = {
 As a context, we pass a prefix of the list of APIs available in the target library.
 We select the prefix based on their BM-25 similarity with the provided instruction.
 
-For further details on the dataset and the baselines from the BenchName team, refer to the `library_based_code_generation` directory in [our baselines repository](https://anonymous.4open.science/r/icml-benchname-2025/).
+For further details on the dataset and the baselines from the Long Code Arena team, refer to the `library_based_code_generation` directory in [our baselines repository](https://anonymous.4open.science/r/icml-benchname-2025/).
 
 **Terms of use**. As this dataset is collected from GitHub, researchers may use it for research purposes only if any publications resulting from that research are open access (see [GitHub Acceptable Use Policies](https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies#7-information-usage-restrictions)).
 """,
 
     "ci_builds_repair": """# CI builds repair\n
 
-Our CI builds repair benchmark 🤗 [icmlbenchname/ci-builds-repair](https://huggingface.co/datasets/icmlbenchname/ci-builds-repair)
+Our CI builds repair benchmark 🤗 [JetBrains-Research/lca-ci-builds-repair](https://huggingface.co/datasets/JetBrains-Research/lca-ci-builds-repair)
 includes 77 manually curated and assessed data points coming from 32 Python repositories, which are used to make a model fix a failed build.
 
 The benchmark clones the repo to the local directory, the model fixes the issue according to logs and the local repo state,
@@ -57,14 +57,14 @@ TASKS_DESCRIPTIONS = {
 * `oracle: files` – ground truth diffs are used to select files that should be corrected to fix the issue;
 * `oracle: files, lines` – ground truth diffs are used to select files and code blocks that should be corrected to fix the issue;
 
-For further details on the dataset and the baselines from the BenchName team, refer to the `ci-builds-repair` directory in [our baselines repository](https://anonymous.4open.science/r/icml-benchname-2025/).
+For further details on the dataset and the baselines from the Long Code Arena team, refer to the `ci-builds-repair` directory in [our baselines repository](https://anonymous.4open.science/r/icml-benchname-2025/).
 
 **Terms of use**. As this dataset is collected from GitHub, researchers may use it for research purposes only if any publications resulting from that research are open access (see [GitHub Acceptable Use Policies](https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies#7-information-usage-restrictions)).
 """,
 
     "project_code_completion": """# Project-level code completion\n
 
-Our Project-level code completion benchmark 🤗 [icmlbenchname/project-level-code-completion](https://huggingface.co/datasets/icmlbenchname/project-level-code-completion) includes four sets of samples:
+Our Project-level code completion benchmark 🤗 [JetBrains-Research/lca-project-level-code-completion](https://huggingface.co/datasets/JetBrains-Research/lca-project-level-code-completion) includes four sets of samples:
 * `small-context`: 144 data points,
 * `medium-context`: 224 data points,
 * `large-context`: 270 data points,
@@ -82,14 +82,14 @@ TASKS_DESCRIPTIONS = {
 * *non-informative* – short/long lines, import/print lines, or comment lines;
 * *random* – lines that don't fit any of the previous categories.
 
-For further details on the dataset and the baselines from the BenchName team, refer to the `project_level_code_completion` directory in [our baselines repository](https://anonymous.4open.science/r/icml-benchname-2025/).
+For further details on the dataset and the baselines from the Long Code Arena team, refer to the `project_level_code_completion` directory in [our baselines repository](https://anonymous.4open.science/r/icml-benchname-2025/).
 
 **Terms of use**. As this dataset is collected from GitHub, researchers may use it for research purposes only if any publications resulting from that research are open access (see [GitHub Acceptable Use Policies](https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies#7-information-usage-restrictions)).
 """,
 
     "commit_message_generation": """# Commit message generation\n
 
-Our Commit message generation benchmark 🤗 [icmlbenchname/commit-message-generation](https://huggingface.co/datasets/icmlbenchname/commit-message-generation) includes 163 manually curated commits with large diffs from 34 Python projects, which the model needs to generate commit messages for.
+Our Commit message generation benchmark 🤗 [JetBrains-Research/lca-commit-message-generation](https://huggingface.co/datasets/JetBrains-Research/lca-commit-message-generation) includes 163 manually curated commits with large diffs from 34 Python projects, which the model needs to generate commit messages for.
 
 We use the following metrics for evaluation:
 * [BLEU](https://huggingface.co/spaces/evaluate-metric/sacrebleu)
@@ -97,7 +97,7 @@ TASKS_DESCRIPTIONS = {
 * [ChrF](https://huggingface.co/spaces/evaluate-metric/chrf)
 * [BERTScore](https://huggingface.co/spaces/evaluate-metric/bertscore)
 
-For further details on the dataset and the baselines from the BenchName team, refer to the `commit_message_generation` directory in [our baselines repository](https://anonymous.4open.science/r/icml-benchname-2025/).
+For further details on the dataset and the baselines from the Long Code Arena team, refer to the `commit_message_generation` directory in [our baselines repository](https://anonymous.4open.science/r/icml-benchname-2025/).
 
 **Note.** The leaderboard is sorted by the `ROUGE-1` metric by default.
 
@@ -107,7 +107,7 @@ TASKS_DESCRIPTIONS = {
 
     "bug_localization": """# Bug localization\n
 
-Our Bug localization benchmark 🤗 [icmlbenchname/bug-localization](https://huggingface.co/datasets/icmlbenchname/bug-localization) includes 150 manually verified bug issue descriptions with information about pull request that fix them for Python, Java, and Kotlin projects.
+Our Bug localization benchmark 🤗 [JetBrains-Research/lca-bug-localization](https://huggingface.co/datasets/JetBrains-Research/lca-bug-localization) includes 150 manually verified bug issue descriptions with information about pull request that fix them for Python, Java, and Kotlin projects.
 The model needs to identify the files within the repository that need to be modified to address the reported bug.
 
 To evaluate baseline performance, we use the following classification metrics:
@@ -119,19 +119,19 @@ TASKS_DESCRIPTIONS = {
 * **All incorrect** - percentage of cases where all buggy files were incorrectly identified
 * **# Output** - average number of buggy files detected, to further assess performance, particularly concerning high **FPR**.
 
-For further details on the dataset and the baselines from the BenchName team, refer to the `bug_localization` directory in [our baselines repository](https://anonymous.4open.science/r/icml-benchname-2025/).
+For further details on the dataset and the baselines from the Long Code Arena team, refer to the `bug_localization` directory in [our baselines repository](https://anonymous.4open.science/r/icml-benchname-2025/).
 
 **Terms of use**. As this dataset is collected from GitHub, researchers may use it for research purposes only if any publications resulting from that research are open access (see [GitHub Acceptable Use Policies](https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies#7-information-usage-restrictions)).
 """,
 
     "module_summarization": """# Module summarization\n
-Our Module summarization benchmark 🤗 [icmlbenchname/module-summarization](https://huggingface.co/datasets/icmlbenchname/module-summarization) includes 216 manually curated text files describing different documentation of open-source permissive Python projects.
+Our Module summarization benchmark 🤗 [JetBrains-Research/lca-module-summarization](https://huggingface.co/datasets/JetBrains-Research/lca-module-summarization) includes 216 manually curated text files describing different documentation of open-source permissive Python projects.
 The model is required to generate such description, given the relevant context code and the intent behind the documentation.
 
 We use a novel metric for evaluation:
 * `CompScore`: the new metric based on LLM as an assessor proposed for this task. Our approach involves feeding the LLM with relevant code and two versions of documentation: the ground truth and the model-generated text. More details on how it is calculated can be found in [our baselines repository](https://anonymous.4open.science/r/icml-benchname-2025/module_summarization/README.md).
 
-For further details on the dataset and the baselines from the BenchName team, refer to the `module_summarization` directory in [our baselines repository](https://anonymous.4open.science/r/icml-benchname-2025/).
+For further details on the dataset and the baselines from the Long Code Arena team, refer to the `module_summarization` directory in [our baselines repository](https://anonymous.4open.science/r/icml-benchname-2025/).
 
 **Terms of use**. As this dataset is collected from GitHub, researchers may use it for research purposes only if any publications resulting from that research are open access (see [GitHub Acceptable Use Policies](https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies#7-information-usage-restrictions)).
 """,
@@ -145,6 +145,6 @@ def get_submission_text_files_for_task(task_pretty: Optional[str]) -> str:
     task_id = TASKS_PRETTY_REVERSE[task_pretty]
 
     if task_id == "commit_message_generation":
-        return f"""**{task_pretty} Instructions:**\n\n* Please, attach files in [JSONLines format](https://jsonlines.org/). For an example, check the predictions provided by BenchName Team in 🤗 [icmlbenchname/results](https://huggingface.co/datasets/icmlbenchname/results/tree/main/commit_message_generation/predictions). Make sure to include `"prediction"` and `"reference"` fields for each example, the rest are optional."""
+        return f"""**{task_pretty} Instructions:**\n\n* Please, attach files in [JSONLines format](https://jsonlines.org/). For an example, check the predictions provided by Long Code Arena Team in 🤗 [JetBrains-Research/lca-results](https://huggingface.co/datasets/JetBrains-Research/lca-results/tree/main/commit_message_generation/predictions). Make sure to include `"prediction"` and `"reference"` fields for each example, the rest are optional."""
 
     return f"**{task_pretty} Instructions:**\n\n* 🚧 There are no instructions for the current task yet."