---
license: llama2
language:
- en
base_model:
- meta-llama/CodeLlama-13b-hf
pipeline_tag: text-generation
tags:
- code
---

# Model Card for TestGen-Dart v0.2

This model card provides information about **TestGen-Dart v0.2**, a fine-tuned version of Meta's Code Llama 13B model, optimized for generating unit tests in Dart for mobile applications. This model was developed as part of a research project on enhancing transformer-based large language models (LLMs) for specific downstream tasks while ensuring cost efficiency and accessibility on standard consumer hardware.

---

## Model Details

### Model Description

**TestGen-Dart v0.2** is a fine-tuned version of Code Llama 13B, specifically adapted for generating unit test cases for Dart code. This model demonstrates enhanced capabilities in producing syntactically and functionally correct test cases compared to its base model.

- **Developed by:** Jacob Hoffmann, Demian Frister (Karlsruhe Institute of Technology - KIT, AIFB-BIS)
- **Funded by:** Helmholtz Association's Initiative and Networking Fund; compute was provided on the HAICORE@FZJ partition
- **Shared by:** Jacob Hoffmann, Demian Frister
- **Model type:** Fine-tuned Code Llama 13B for test generation in Dart
- **Language(s):** English
- **License:** LLaMA 2 Community License
- **Finetuned from model:** Meta's Code Llama 13B

### Model Sources

- **Repository:** [GitHub Repository](https://github.com/example/repo) (placeholder)
- **Paper:** ["Generating Software Tests for Mobile Applications Using Fine-Tuned Large Language Models"](https://doi.org/10.1145/3644032.3644454) (published in AST '24)
- **Demo:** Coming soon

---

## Uses

### Direct Use

The model can be used in a zero-shot setting to generate unit tests in Dart. Provide the class code as input, and the model outputs structured unit tests using Dart's `test` package.

### Downstream Use

This model is suitable for integration into developer tools, IDE extensions, or continuous integration pipelines to automate test generation for Dart-based applications.
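
As a rough illustration, the sketch below wraps the model in a batch step that a CI job could run: it walks a Dart package's `lib/` directory and writes one generated test file per source file. The repository id, prompt format, and directory layout are assumptions carried over from the example in "How to Get Started" below, not a shipped integration.

```python
import pathlib

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "username/testgen-dart-v0.2"  # placeholder id, as in the example below

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

def generate_tests(dart_source: str) -> str:
    """Generate Dart unit tests for one source file in a zero-shot setting."""
    prompt = f"Generate unit tests in Dart for the following class:\n{dart_source}"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, dropping the echoed prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Assumed pub-style layout: sources in lib/, generated tests in test/.
for src in pathlib.Path("lib").rglob("*.dart"):
    test_file = pathlib.Path("test") / f"{src.stem}_test.dart"
    test_file.parent.mkdir(parents=True, exist_ok=True)
    test_file.write_text(generate_tests(src.read_text()))
```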

### Out-of-Scope Use

- Do not use this model for tasks unrelated to Dart test generation.  
- Avoid using this model to improve or train other LLMs not based on LLaMA or its derivatives, per the LLaMA 2 Community License.  
- Misuse for malicious purposes, such as generating incorrect or harmful test cases, is prohibited.

---

## Bias, Risks, and Limitations

### Technical Limitations

- The model's performance is optimized for Dart; it may not work as effectively for other programming languages.
- Functional correctness of the generated tests is not guaranteed; validation by developers is recommended.

### Risks

- Potential for generating syntactically correct but semantically invalid tests. Users should review outputs carefully.
- May exhibit biases present in the training data.

---

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("username/testgen-dart-v0.2")
model = AutoModelForCausalLM.from_pretrained("username/testgen-dart-v0.2")

# Prepare input
input_code = """
class Calculator {
  int add(int a, int b) {
    return a + b;
  }
}
"""

prompt = f"Generate unit tests in Dart for the following class:\n{input_code}"

# Generate tests
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)  # cap generated tokens, independent of prompt length
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Details

### Training Data

The fine-tuning dataset consists of **16,252 Dart code-test pairs** extracted from open-source GitHub repositories using Google BigQuery. The data was subjected to quality filtering and deduplication to ensure high relevance and consistency.
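
The exact extraction query is not reproduced here. As an assumption-laden sketch, such pairs could be collected from the public `bigquery-public-data.github_repos` dataset by matching each Dart source file to a `*_test.dart` file with the same base name in the same repository, following Dart's test-naming convention:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Assumed query shape, not the authors' exact extraction: pair every Dart
# source file with a *_test.dart file of the same base name in the same repo.
QUERY = r"""
SELECT
  src.repo_name,
  src.path AS source_path,
  tst.path AS test_path
FROM `bigquery-public-data.github_repos.files` AS src
JOIN `bigquery-public-data.github_repos.files` AS tst
  ON src.repo_name = tst.repo_name
WHERE src.path LIKE '%.dart'
  AND src.path NOT LIKE '%_test.dart'
  AND tst.path LIKE '%_test.dart'
  AND REGEXP_EXTRACT(tst.path, r'([^/]+)_test\.dart$') =
      REGEXP_EXTRACT(src.path, r'([^/]+)\.dart$')
"""

for row in client.query(QUERY).result():
    print(row.repo_name, row.source_path, row.test_path)
```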

### Training Procedure

- **Fine-tuning Approach:** Supervised Fine-Tuning (SFT) with QLoRA for memory efficiency (see the configuration sketch below).  
- **Hardware:** Training was conducted on a single NVIDIA A100 GPU.  
- **Optimization:** Flash Attention 2 was used to speed up attention computation.  
- **Duration:** The training process ran for up to 32 hours.

### Training Hyperparameters

- **Mixed Precision:** FP16  
- **Optimizer:** AdamW  
- **Learning Rate:** 5e-5  
- **Epochs:** 3  
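
A minimal configuration sketch consistent with the procedure and hyperparameters above, using `transformers`, `bitsandbytes`, and `peft`. The LoRA rank, alpha, and dropout are assumptions; only the precision, optimizer, learning rate, and epoch count come from this card.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig, TrainingArguments

# QLoRA: load the Code Llama 13B base in 4-bit, train low-rank adapters on top.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # FP16 mixed precision
)

lora_config = LoraConfig(
    r=64,                  # assumed rank
    lora_alpha=16,         # assumed scaling
    lora_dropout=0.1,      # assumed dropout
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="testgen-dart-v0.2",
    num_train_epochs=3,      # Epochs: 3
    learning_rate=5e-5,      # Learning Rate: 5e-5
    fp16=True,               # Mixed Precision: FP16
    optim="adamw_torch",     # Optimizer: AdamW
)
```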

### Environmental Impact

- **Hardware Type:** NVIDIA A100 GPU  
- **Hours Used:** 32 hours  
- **Carbon Emitted:** 13.099 kgCO2eq  

---

## Evaluation

### Testing Data, Factors & Metrics

- **Testing Data:** A subset of **42 Dart files** from the training dataset, evaluated in a zero-shot setting.  
- **Factors:** Syntax correctness, functional correctness.  
- **Metrics:** pass@1, syntax error rate, functional correctness rate.
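
The paper defines these metrics precisely. As a rough, assumption-based sketch, syntax and functional correctness can be checked with the standard Dart toolchain (`dart analyze` and `dart test`), assuming the generated files sit in a `generated_tests/` directory inside a Dart package with the `test` dependency configured; this harness is an illustration, not the authors' evaluation script.

```python
import pathlib
import subprocess

def syntactically_valid(test_file: pathlib.Path) -> bool:
    """A file counts as syntactically correct if the Dart analyzer reports no errors."""
    return subprocess.run(["dart", "analyze", str(test_file)],
                          capture_output=True).returncode == 0

def functionally_correct(test_file: pathlib.Path) -> bool:
    """`dart test` exits non-zero if any test in the file fails."""
    return subprocess.run(["dart", "test", str(test_file)],
                          capture_output=True).returncode == 0

generated = sorted(pathlib.Path("generated_tests").glob("*_test.dart"))  # assumed layout
syntax_ok = [f for f in generated if syntactically_valid(f)]
functional_ok = [f for f in syntax_ok if functionally_correct(f)]

print(f"syntax error rate:           {1 - len(syntax_ok) / len(generated):.1%}")
print(f"functional correctness rate: {len(functional_ok) / len(generated):.1%}")
```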

### Results

- **Syntax Correctness:** +76% improvement compared to the base model.  
- **Functional Correctness:** +16.67% improvement compared to the base model.  

---

## Citation

If you use this model in your research, please cite:

**BibTeX:**
```bibtex
@inproceedings{hoffmann2024testgen,
  title={Generating Software Tests for Mobile Applications Using Fine-Tuned Large Language Models},
  author={Hoffmann, Jacob and Frister, Demian},
  booktitle={Proceedings of the 5th ACM/IEEE International Conference on Automation of Software Test (AST 2024)},
  year={2024},
  doi={10.1145/3644032.3644454}
}
```

## Model Card Contact

- **Jacob Hoffmann**: [[email protected]](mailto:[email protected])  
- **Demian Frister**: [[email protected]](mailto:[email protected])