|
--- |
|
pretty_name: "ComBack" |
|
language: |
|
- code |
|
tags: |
|
- C++/C Code |
|
- Compiler Backend |
|
license: "cc-by-4.0" |
|
--- |
|
|
|
|
|
# ComBack: A Versatile Dataset for Enhancing Compiler Backend Development Efficiency |
|
|
|
ComBack is a large-scale multi-platform compiler backend code dataset. |
|
This repository contains all fine-tuned models and the scripts for reproducing the experimental results.
|
|
|
|
|
## Dataset Information |
|
|
|
Details can be found at https://huggingface.co/datasets/docz-ict/ComBack |
|
|
|
## Task Example |
|
|
|
- Statement-Level Completion: complete the current statement.
|
```c++ |
|
//Inputs: |
|
... |
|
adjustReg(MBB,LastFrameDestroy, DL, SPReg, FPReg, -StackSize+RVFI->getVarArgsSaveSize() |
|
//Ground Truth: |
|
MachineInstr::FrameDestroy); |
|
``` |
|
|
|
- Next-Statement Suggestion: predict the next statement. |
|
|
|
```c++ |
|
//Inputs: |
|
... |
|
maxCallFrameSize = (maxCallFrameSize + AlignMask) & ~AlignMask; |
|
//Ground Truth: |
|
MFI -> setMaxCallFrameSize(maxCallFrameSize); |
|
``` |
|
|
|
|
|
- Code Generation: generate a function from its natural-language description and the target-specific values it uses.
|
|
|
```c++ |
|
//Inputs: |
|
getPointerRegClass: Returns a TargetRegisterClass used for pointer values. |
|
Target-Specific Value: Sparc, SP::I64RegsRegClass, SP::IntRegsRegClass. |
|
//Ground Truth: |
|
TargetRegisterClass *SparcRegisterInfo::getPointerRegClass(MachineFunction &MF ,unsigned Kind) { |
|
return Subtarget.is64Bit() ? &SP::I64RegsRegClass : &SP::IntRegsRegClass; |
|
} |
|
``` |
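The Next-Statement Suggestion task can be pictured as turning a sequence of statements into (input, ground truth) pairs. The sketch below is a hypothetical illustration, not ComBack's extraction pipeline: `NextStmtSample` and `make_next_stmt_samples` are invented names, and using every preceding statement as context is an assumption (the real dataset may truncate context or tokenize differently).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sample type: the code context shown to the model and the
// statement it is expected to predict.
struct NextStmtSample {
  std::string input;         // all preceding statements, joined by '\n'
  std::string ground_truth;  // the statement that follows the input
};

// Builds one sample per statement position, starting at the second
// statement, using every earlier statement as context (assumption).
std::vector<NextStmtSample> make_next_stmt_samples(
    const std::vector<std::string>& stmts) {
  std::vector<NextStmtSample> samples;
  std::string context;
  for (std::size_t i = 0; i < stmts.size(); ++i) {
    if (i > 0) {
      samples.push_back({context, stmts[i]});
      context += '\n';
    }
    context += stmts[i];
  }
  return samples;
}
```

For a function body with three statements this yields two samples: the first statement predicting the second, and the first two predicting the third.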
|
|
|
|
|
|
|
## 1. Dependencies

- Python 3.8.1

- `pip install -r requirements.txt`
|
|
|
## 2. Fine-Tuning |
|
We fine-tuned six pre-trained code language models on 8 Tesla V100 GPUs, each with 16 GB of memory.
|
You can fine-tune each model on our datasets by running: |
|
|
|
```shell |
|
# Model Type Options: CodeBert, GraphCodeBert, UnixCoder, CodeT5, NatGen, CodeT5+ |
|
# Task Options: code-generation, code-completion, new-target-completion(Only for CodeT5+), new-target-generation(Only for CodeT5+) |
|
bash ./Script/Model/{Model Type}/{Task}/run_fine_tuning*.sh |
|
``` |
|
|
|
|
|
|
|
## 3. Reproducing Results in Table 2
|
|
|
### Dataset split scheme |
|
The data of all 178 backends is split into train/valid/test sets in an 80%:10%:10% ratio.
|
|
|
- Dataset Info |
|
|
|
| Task | Train | Valid | Test |
| ---- | ---- | ---- | ---- |
| Statement-Level Comp. | 128,899 (11.36M tokens) | 16,112 (1.43M tokens) | 16,113 (1.43M tokens) |
| Next-Statement Sugg. | 173,052 (15.69M tokens) | 21,631 (1.99M tokens) | 21,632 (1.98M tokens) |
| Code Generation | 36,236 (5.10M tokens) | 4,530 (0.64M tokens) | 4,530 (0.64M tokens) |
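The sample counts above are consistent with one simple rounding rule: train = floor(80% of the total), with the remainder divided between valid and test and test taking the odd sample. The sketch below encodes that rule plus a seeded shuffle; the rule, the use of `std::mt19937`, and the function names are assumptions for illustration, and the repository's scripts may split per backend or round differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

struct SplitSizes {
  std::size_t train, valid, test;
};

// 80%:10%:10% sizes, assuming train = floor(0.8 * n) and the remainder
// split between valid and test (test receives any leftover sample).
SplitSizes split_sizes_80_10_10(std::size_t n) {
  std::size_t train = n * 8 / 10;
  std::size_t rest = n - train;
  std::size_t valid = rest / 2;
  return {train, valid, rest - valid};
}

struct SplitIndices {
  std::vector<std::size_t> train, valid, test;
};

// Shuffles sample indices with a fixed seed, then cuts them into three sets.
SplitIndices random_split(std::size_t n, unsigned seed) {
  std::vector<std::size_t> idx(n);
  std::iota(idx.begin(), idx.end(), std::size_t{0});
  std::mt19937 rng(seed);
  std::shuffle(idx.begin(), idx.end(), rng);
  const SplitSizes s = split_sizes_80_10_10(n);
  assert(s.train + s.valid + s.test == n);
  const auto t = idx.begin() + static_cast<std::ptrdiff_t>(s.train);
  const auto v = t + static_cast<std::ptrdiff_t>(s.valid);
  return {std::vector<std::size_t>(idx.begin(), t),
          std::vector<std::size_t>(t, v),
          std::vector<std::size_t>(v, idx.end())};
}
```

For example, `split_sizes_80_10_10(161124)` gives 128,899 / 16,112 / 16,113, matching the Statement-Level Completion row.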
|
|
|
### Reproducing results in Table 2 by running:
|
|
|
```shell |
|
# Model Type Options: CodeBert, GraphCodeBert, UnixCoder, CodeT5, NatGen, CodeT5+ |
|
# Task Options: code-generation, code-completion |
|
bash ./Script/Model/{Model Type}/{Task}/run_test.sh |
|
``` |
|
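### Results

EM: exact match; ED: edit distance; BLEU4: BLEU score over n-grams up to length 4. Higher is better for all metrics.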
|
|
|
- Without Fine-Tuning |
|
| | Stmt. Comp. | Stmt. Comp. | Next Sugg. | Next Sugg. | Code Gen. | Code Gen. |
|-------------|:-----------------:|:-----------------:|:----------------:|:----------------:|:----------:|:----------:|
| **Model** | EM | ED | EM | ED | BLEU4 | ED |
| CodeBert-c | 0.00 | 0.97 | 0.00 | 1.31 | 0.00 | 0.44 |
| GraphCodeBert-c | 0.00 | 0.35 | 0.00 | 0.54 | 0.00 | 2.41 |
| UnixCoder-base-nine | 0.07 | 27.56 | 15.93 | 29.11 | 0.00 | 31.81 |
| CodeT5-base | 0.65 | 21.45 | 7.23 | 23.50 | 0.00 | 13.57 |
| NatGen | 0.00 | 13.52 | 0.02 | 15.95 | 0.01 | 28.76 |
| CodeT5+-220m | 0.02 | 7.24 | 0.12 | 9.87 | 0.00 | 12.33 |
|
|
|
- Fine-Tuned |
|
| | Stmt. Comp. | Stmt. Comp. | Next Sugg. | Next Sugg. | Code Gen. | Code Gen. |
|-------------|:-----------------:|:-----------------:|:----------------:|:----------------:|:----------:|:----------:|
| **Model** | EM | ED | EM | ED | BLEU4 | ED |
| CodeBert-c | 53.84 | 77.44 | 52.67 | 70.82 | 23.54 | 54.63 |
| GraphCodeBert-c | 43.00 | 71.89 | 47.10 | 61.31 | 20.73 | 48.83 |
| UnixCoder-base-nine | **67.84** | **85.06** | 58.51 | 75.31 | 56.24 | 73.45 |
| CodeT5-base | 66.38 | 84.34 | 58.52 | 76.03 | 70.87 | 80.45 |
| NatGen | 67.47 | 84.83 | **60.30** | **76.84** | 71.73 | 81.39 |
| CodeT5+-220m | 66.93 | 84.45 | 59.57 | 76.41 | **75.28** | **82.95** |
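To make the EM and ED columns concrete, here is a minimal sketch under stated assumptions: EM compares whitespace-normalized strings, and ED is a character-level Levenshtein similarity scaled to 0..100 so that higher is better, as in the tables. These function names are hypothetical, and the repository's test scripts may normalize, tokenize, or compute edit similarity with a different library; BLEU4 is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Collapses runs of whitespace into single spaces (assumed normalization).
std::string normalize_ws(const std::string& s) {
  std::istringstream in(s);
  std::string tok, out;
  while (in >> tok) {
    if (!out.empty()) out += ' ';
    out += tok;
  }
  return out;
}

// Exact match after whitespace normalization.
bool exact_match(const std::string& pred, const std::string& ref) {
  return normalize_ws(pred) == normalize_ws(ref);
}

// Character-level Levenshtein distance using a single rolling row.
std::size_t levenshtein(const std::string& a, const std::string& b) {
  std::vector<std::size_t> row(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) row[j] = j;
  for (std::size_t i = 1; i <= a.size(); ++i) {
    std::size_t diag = row[0];  // D[i-1][j-1] for j = 1
    row[0] = i;
    for (std::size_t j = 1; j <= b.size(); ++j) {
      std::size_t up = row[j];  // D[i-1][j], saved before overwrite
      std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
      row[j] = std::min({row[j] + 1, row[j - 1] + 1, diag + cost});
      diag = up;
    }
  }
  return row[b.size()];
}

// Edit similarity in [0, 100]: 100 * (1 - distance / length of longer string).
double edit_similarity(const std::string& pred, const std::string& ref) {
  std::size_t longest = std::max(pred.size(), ref.size());
  if (longest == 0) return 100.0;
  return 100.0 *
         (1.0 - static_cast<double>(levenshtein(pred, ref)) / longest);
}
```

Averaging `exact_match` (as 0 or 100) and `edit_similarity` over a test set would give numbers on the same scale as the tables, under these assumptions.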
|
|
|
|
|
|
|
|
|
## 4. Reproducing Results in Table 3
|
|
|
|
|
|
|
|
|
|
|
### Dataset split scheme |
|
|
|
The data of RISC-V, ARC, and NVPTX in both GCC and LLVM is used as the test set. The data of the remaining CPU, MPU, and GPU targets, excluding RI5CY (a customized variant of RISC-V), is split into train/valid sets in an 85%:15% ratio.
|
|
|
|
|
- Dataset Info
|
|
|
|
|
| Task | Train | Valid | Test |
| ---- | ---- | ---- | ---- |
| Statement-Level Comp. | 114,016 (10.20M tokens) | 20,121 (1.81M tokens) | 6,645 (0.58M tokens) |
| Next-Statement Sugg. | 152,114 (14.10M tokens) | 26,844 (2.49M tokens) | 9,313 (0.83M tokens) |
| Code Generation | 30,633 (4.44M tokens) | 5,406 (0.79M tokens) | 2,819 (0.37M tokens) |
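The split above holds out whole targets rather than random samples. A minimal sketch of that scheme follows, assuming each sample carries a target name and that the 85%:15% cut uses floor rounding (which reproduces the train counts in the table). `Sample`, `split_by_target`, `train_count_85`, and the target strings are hypothetical; the actual scripts may shuffle before cutting or identify targets differently.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical record: the backend a sample came from plus its code.
struct Sample {
  std::string target;  // e.g. "RISCV", "ARC", "NVPTX", "RI5CY" (illustrative)
  std::string code;
};

struct TargetSplit {
  std::vector<Sample> train, valid, test;
};

// Training-set size for an 85%:15% split, assuming floor rounding.
std::size_t train_count_85(std::size_t n) { return n * 85 / 100; }

// Held-out targets form the test set, excluded targets are dropped, and the
// remaining samples are cut 85%:15% in their given order (shuffle first for
// a random split).
TargetSplit split_by_target(const std::vector<Sample>& samples,
                            const std::set<std::string>& held_out,
                            const std::set<std::string>& excluded) {
  TargetSplit out;
  std::vector<Sample> pool;
  for (const Sample& s : samples) {
    if (excluded.count(s.target)) continue;
    if (held_out.count(s.target)) out.test.push_back(s);
    else pool.push_back(s);
  }
  const auto cut =
      pool.begin() + static_cast<std::ptrdiff_t>(train_count_85(pool.size()));
  out.train.assign(pool.begin(), cut);
  out.valid.assign(cut, pool.end());
  assert(out.train.size() + out.valid.size() == pool.size());
  return out;
}
```

With the counts above, `train_count_85(114016 + 20121)` returns 114,016, matching the Statement-Level Completion train set.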
|
|
|
|
|
|
|
### Input examples for ChatGPT-3.5-Turbo and Code-LLaMA-34B-Instruct |
|
**Statement-Level Completion** |
|
```cpp |
|
//Prompt: Complete the last statement of this code snippet: |
|
... |
|
adjustReg(MBB,LastFrameDestroy, DL, SPReg, FPReg, -StackSize+RVFI->getVarArgsSaveSize() |
|
``` |
|
**Next-Statement Suggestion** |
|
```cpp |
|
//Prompt: Predict the next statement of this code snippet: |
|
... |
|
maxCallFrameSize = (maxCallFrameSize + AlignMask) & ~AlignMask; |
|
``` |
|
**Code Generation** |
|
```cpp |
|
//Prompt: Create a function named "getPointerRegClass" for "Sparc" backend of LLVM Compiler. |
|
//The description of this function is "Returns a TargetRegisterClass used for pointer values". |
|
//It contains "Sparc", "SP::I64RegsRegClass", "SP::IntRegsRegClass" as target specific values. |
|
``` |
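The three prompts above follow fixed templates. The sketch below fills them from their parts; the template wording is copied from the examples, while the newline placement between the instruction and the code (and between the code-generation sentences) and the function names are assumptions. The scripts under `./Script/Exp_Script/` may assemble prompts differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Statement-level completion prompt: instruction line, then the code context.
std::string completion_prompt(const std::string& code) {
  return "Complete the last statement of this code snippet:\n" + code;
}

// Next-statement suggestion prompt.
std::string next_statement_prompt(const std::string& code) {
  return "Predict the next statement of this code snippet:\n" + code;
}

// Code generation prompt built from the function name, backend, compiler,
// description, and target-specific values.
std::string generation_prompt(const std::string& func,
                              const std::string& backend,
                              const std::string& compiler,
                              const std::string& desc,
                              const std::vector<std::string>& target_values) {
  assert(!target_values.empty());
  std::string values;
  for (std::size_t i = 0; i < target_values.size(); ++i) {
    if (i > 0) values += ", ";
    values += "\"" + target_values[i] + "\"";
  }
  return "Create a function named \"" + func + "\" for \"" + backend +
         "\" backend of " + compiler + " Compiler.\n" +
         "The description of this function is \"" + desc + "\".\n" +
         "It contains " + values + " as target specific values.";
}
```

Calling `generation_prompt("getPointerRegClass", "Sparc", "LLVM", ...)` with the Sparc values reproduces the Code Generation prompt shown above, one sentence per line.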
|
|
|
|
|
### Reproducing results in Table 3 by running:
|
```shell |
|
# Task Options: new-target-completion, new-target-generation |
|
bash ./Script/Model/CodeT5+/{Task}/run_test_existing_type.sh |
|
|
|
# ChatGPT |
|
bash ./Script/Exp_Script/ChatGPT/run_chatgpt.sh |
|
|
|
# Code-LLaMA |
|
bash ./Script/Exp_Script/ChatGPT/run_codellama.sh |
|
``` |
|
|
|
|
|
|
|
### Results |
|
|
|
|
|
- GCC |
|
|
|
| | Stmt. Comp. | Stmt. Comp. | Stmt. Comp. | Stmt. Comp. | Stmt. Comp. | Stmt. Comp. | Next Sugg. | Next Sugg. | Next Sugg. | Next Sugg. | Next Sugg. | Next Sugg. | Code Gen. | Code Gen. | Code Gen. | Code Gen. | Code Gen. | Code Gen. |
|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| | RISC-V | RISC-V | ARC | ARC | NVPTX | NVPTX | RISC-V | RISC-V | ARC | ARC | NVPTX | NVPTX | RISC-V | RISC-V | ARC | ARC | NVPTX | NVPTX |
| Model | EM | ED | EM | ED | EM | ED | EM | ED | EM | ED | EM | ED | BLEU4 | ED | BLEU4 | ED | BLEU4 | ED |
| ChatGPT-3.5-Turbo | 10.34 | 38.41 | 15.35 | 42.94 | 12.01 | 41.47 | 6.44 | 12.90 | 9.75 | 20.79 | 7.97 | 17.79 | 1.37 | 24.12 | 1.67 | 28.26 | 1.57 | 26.97 |
| Code-LLaMA-34B | 0.41 | 19.07 | 0.85 | 16.77 | 0.56 | 18.22 | 1.58 | 13.54 | 2.66 | 17.95 | 2.47 | 16.59 | 1.67 | 27.89 | 1.71 | 30.49 | 1.57 | 27.65 |
| CodeT5+-220m | **51.16** | **75.32** | **52.45** | **74.57** | **50.56** | **75.52** | **49.11** | **67.84** | **38.26** | **59.21** | **38.33** | **56.31** | **32.56** | **58.67** | **19.94** | **50.27** | **25.47** | **52.60** |
|
|
|
|
|
- LLVM |
|
|
|
| | Stmt. Comp. | Stmt. Comp. | Stmt. Comp. | Stmt. Comp. | Stmt. Comp. | Stmt. Comp. | Next Sugg. | Next Sugg. | Next Sugg. | Next Sugg. | Next Sugg. | Next Sugg. | Code Gen. | Code Gen. | Code Gen. | Code Gen. | Code Gen. | Code Gen. |
|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| | RISC-V | RISC-V | ARC | ARC | NVPTX | NVPTX | RISC-V | RISC-V | ARC | ARC | NVPTX | NVPTX | RISC-V | RISC-V | ARC | ARC | NVPTX | NVPTX |
| Model | EM | ED | EM | ED | EM | ED | EM | ED | EM | ED | EM | ED | BLEU4 | ED | BLEU4 | ED | BLEU4 | ED |
| ChatGPT-3.5-Turbo | 12.08 | 41.39 | 16.77 | 42.02 | 14.73 | 43.72 | 9.80 | 21.86 | 10.81 | 20.66 | 11.39 | 22.82 | 1.23 | 25.12 | 1.30 | 27.19 | 1.43 | 25.45 |
| Code-LLaMA-34B | 0.45 | 17.61 | 0.61 | 17.21 | 0.99 | 17.23 | 1.75 | 15.04 | 0.42 | 11.27 | 2.42 | 16.25 | 1.43 | 27.24 | 1.61 | 32.12 | 1.59 | 28.08 |
| CodeT5+-220m | **62.68** | **82.02** | **71.34** | **85.98** | **64.45** | **81.53** | **48.71** | **68.95** | **58.68** | **74.57** | **47.81** | **65.50** | **50.34** | **72.98** | **55.38** | **74.41** | **44.33** | **66.36** |
|
|
|
|
|
|
|
|
|
|
|
## 5. Reproducing Results in Figure 6
|
|
|
### Reproducing results in Figure 6 by running:
|
```shell |
|
# Task Options: new-target-completion, new-target-generation |
|
bash ./Script/Model/CodeT5+/{Task}/run_test_existing_type.sh |
|
|
|
# Fork-Flow |
|
bash ./Script/Exp_Script/ForkFlow/run_forkflow.sh |
|
``` |
|
|
|
|
|
### Results |
|
|
|
|
|
- GCC |
|
|
|
| | RISC-V | RISC-V | ARC | ARC | NVPTX | NVPTX |
|--------------|-------|-------|-------|-------|-------|-------|
| Method | BLEU4 | ED | BLEU4 | ED | BLEU4 | ED |
| ForkFlow Avg | 3.48 | 5.79 | 1.77 | 3.73 | 4.70 | 3.81 |
| ForkFlow Max | 28.77 | 34.80 | 4.94 | 8.85 | 4.70 | 3.81 |
| CodeT5+ | 32.56 | 58.67 | 19.94 | 50.27 | 25.47 | 52.60 |
|
|
|
|
|
- LLVM |
|
|
|
| | RISC-V | RISC-V | ARC | ARC | NVPTX | NVPTX |
|--------------|-------|-------|-------|-------|-------|-------|
| Method | BLEU4 | ED | BLEU4 | ED | BLEU4 | ED |
| ForkFlow Avg | 12.45 | 22.18 | 19.98 | 33.43 | 15.06 | 28.73 |
| ForkFlow Max | 27.32 | 46.47 | 41.80 | 60.62 | 18.81 | 39.04 |
| CodeT5+ | 50.34 | 72.98 | 55.38 | 74.41 | 44.33 | 66.36 |
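The ForkFlow rows report an average and a maximum. Assuming these summarize one score per candidate existing backend that is forked as a starting point (mean over all forks, and the best single fork), the aggregation reduces to the sketch below. This reading of Avg and Max is an assumption not stated in this README, and `summarize` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct Aggregate {
  double avg;  // mean score over all forked source backends
  double max;  // score of the best single fork
};

// Summarizes per-fork scores (assumed: one BLEU4 or ED score per candidate).
Aggregate summarize(const std::vector<double>& scores) {
  assert(!scores.empty());
  double sum = std::accumulate(scores.begin(), scores.end(), 0.0);
  double best = *std::max_element(scores.begin(), scores.end());
  return {sum / static_cast<double>(scores.size()), best};
}
```

Under this reading, Max is an optimistic bound that assumes the developer already knows which existing backend is the best one to fork.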
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## 6. Reproducing Results in Table 4
|
|
|
|
|
|
|
### Dataset split scheme |
|
|
|
|
|
The data of ARC and NVPTX in both GCC and LLVM is used as the test set. The data of the CPU targets, excluding RISC-V and RI5CY, is split into train/valid sets in an 85%:15% ratio.
|
|
|
- Dataset Info
|
|
|
|
|
| Task | Train | Valid | Test |
| ---- | ---- | ---- | ---- |
| Statement-Level Comp. | 87,018 (7.78M tokens) | 15,357 (1.37M tokens) | 2,764 (0.26M tokens) |
| Next-Statement Sugg. | 113,684 (10.65M tokens) | 20,063 (1.87M tokens) | 4,029 (0.38M tokens) |
| Code Generation | 21,184 (3.14M tokens) | 3,739 (0.55M tokens) | 1,372 (0.18M tokens) |
|
|
|
|
|
### Reproducing results in Table 4 by running:
|
```shell |
|
# Task Options: new-target-completion, new-target-generation |
|
bash ./Script/Model/CodeT5+/{Task}/run_test_new_type.sh |
|
``` |
|
|
|
|
|
|
|
### Results |
|
|
|
- GCC |
|
|
|
| | Stmt. Comp. | Stmt. Comp. | Stmt. Comp. | Stmt. Comp. | Next Sugg. | Next Sugg. | Next Sugg. | Next Sugg. | Code Gen. | Code Gen. | Code Gen. | Code Gen. |
|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| | ARC (MPU) | ARC (MPU) | NVPTX (GPU) | NVPTX (GPU) | ARC (MPU) | ARC (MPU) | NVPTX (GPU) | NVPTX (GPU) | ARC (MPU) | ARC (MPU) | NVPTX (GPU) | NVPTX (GPU) |
| Dataset | EM | ED | EM | ED | EM | ED | EM | ED | BLEU4 | ED | BLEU4 | ED |
| w/ GPU and MPU | 52.45 | 74.57 | 50.56 | 75.52 | 38.26 | 59.21 | 38.33 | 56.31 | 19.94 | 50.27 | 25.47 | 52.60 |
| w/o GPU and MPU | 50.53 | 74.09 | 46.37 | 72.45 | 37.22 | 58.21 | 38.33 | 56.83 | 19.29 | 49.12 | 22.46 | 50.33 |
| **Decrease** | **1.92** | **0.48** | **4.19** | **3.07** | **1.04** | **1.00** | **0.00** | **-0.52** | **0.65** | **1.15** | **3.01** | **2.27** |
|
|
|
- LLVM |
|
| | Stmt. Comp. | Stmt. Comp. | Stmt. Comp. | Stmt. Comp. | Next Sugg. | Next Sugg. | Next Sugg. | Next Sugg. | Code Gen. | Code Gen. | Code Gen. | Code Gen. |
|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| | ARC (MPU) | ARC (MPU) | NVPTX (GPU) | NVPTX (GPU) | ARC (MPU) | ARC (MPU) | NVPTX (GPU) | NVPTX (GPU) | ARC (MPU) | ARC (MPU) | NVPTX (GPU) | NVPTX (GPU) |
| Dataset | EM | ED | EM | ED | EM | ED | EM | ED | BLEU4 | ED | BLEU4 | ED |
| w/ GPU and MPU | 71.34 | 85.98 | 64.45 | 81.53 | 58.68 | 74.57 | 47.81 | 65.50 | 55.38 | 74.41 | 44.33 | 66.36 |
| w/o GPU and MPU | 69.82 | 85.59 | 60.04 | 79.85 | 58.26 | 73.75 | 46.28 | 63.92 | 49.62 | 70.26 | 42.94 | 65.43 |
| **Decrease** | **1.52** | **0.39** | **4.41** | **1.68** | **0.42** | **0.82** | **1.53** | **1.58** | **5.76** | **4.15** | **1.39** | **0.93** |
|
|
|
|
|
|
|
## 7. Reproducing Results in Table 5
|
|
|
|
|
|
|
### Dataset split scheme |
|
|
|
|
|
The data of RI5CY in LLVM is used as the test set. The data of the CPU targets is split into train/valid sets in an 85%:15% ratio in two configurations: one excluding RISC-V and one including RISC-V.
|
|
|
- Dataset Info
|
- Excluding RISC-V |
|
|
|
| Task | Train | Valid | Test |
| ---- | ---- | ---- | ---- |
| Statement-Level Comp. | 87,018 (7.78M tokens) | 15,357 (1.37M tokens) | 721 (0.04M tokens) |
| Next-Statement Sugg. | 113,684 (10.65M tokens) | 20,063 (1.87M tokens) | 1,035 (0.06M tokens) |
| Code Generation | 21,184 (3.14M tokens) | 3,739 (0.55M tokens) | 219 (0.02M tokens) |
|
|
|
- Including RISC-V |
|
|
|
| Task | Train | Valid | Test |
| ---- | ---- | ---- | ---- |
| Statement-Level Comp. | 90,316 (8.06M tokens) | 15,940 (1.42M tokens) | 721 (0.04M tokens) |
| Next-Statement Sugg. | 118,175 (11.04M tokens) | 20,856 (1.94M tokens) | 1,035 (0.06M tokens) |
| Code Generation | 22,413 (3.30M tokens) | 3,957 (0.58M tokens) | 219 (0.02M tokens) |
|
|
|
|
|
|
|
### Reproducing results in Table 5 by running:
|
```shell |
|
# Task Options: new-target-completion, new-target-generation |
|
bash ./Script/Model/CodeT5+/{Task}/run_test_itr_exp.sh |
|
``` |
|
|
|
|
|
|
|
### Results |
|
|
|
| | Stmt. Comp. | Stmt. Comp. | Next Sugg. | Next Sugg. | Code Gen. | Code Gen. |
|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:----------:|:----------:|
| Dataset | EM | ED | EM | ED | BLEU4 | ED |
| w/o RISC-V | 66.16 | 83.79 | 57.29 | 74.73 | 54.41 | 75.41 |
| w/ RISC-V | 74.06 | 87.91 | 67.25 | 81.28 | 79.46 | 89.92 |
| **Diff** | **7.90** | **4.12** | **9.96** | **6.55** | **25.05** | **14.51** |
|
|
|
|
|
|
|
|
|
|
|
## Citation |
|
``` |
|
@inproceedings{zhong2024comback, |
|
title={ComBack: A Versatile Dataset for Enhancing Compiler Backend Development Efficiency}, |
|
author={Ming Zhong and Fang Lyu and Lulin Wang and Hongna Geng and Lei Qiu and Huimin Cui and Xiaobing Feng},
|
booktitle={Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track}, |
|
year={2024} |
|
} |
|
``` |