---
license: llama2
language:
- en
base_model:
- meta-llama/CodeLlama-13b-hf
pipeline_tag: text-generation
tags:
- code
- gguf
- llama.cpp
- llmstudio
---
# Model Card for TestGen-Dart v0.2 (GGUF Version)
This model card provides information about **TestGen-Dart v0.2 (GGUF Version)**, a fine-tuned version of Meta's Code Llama 13B model, optimized for generating unit tests in Dart for mobile applications. This GGUF-quantized model is designed to run efficiently with frameworks like **LLMStudio** and **llama.cpp**, enabling deployment on resource-constrained hardware while maintaining robust performance.
---
## Model Details
### Model Description
**TestGen-Dart v0.2** is a fine-tuned version of Code Llama 13B, specifically adapted for generating unit test cases for Dart code. The GGUF quantization enables its use on lightweight, consumer-grade systems without significant performance loss.
- **Developed by:** Jacob Hoffmann, Demian Frister (Karlsruhe Institute of Technology - KIT, AIFB-BIS)
- **Funded by:** Helmholtz Association's Initiative and Networking Fund on the HAICORE@FZJ partition
- **Shared by:** Jacob Hoffmann, Demian Frister
- **Model type:** Fine-tuned Code Llama 13B for test generation in Dart
- **Language(s):** English
- **License:** LLaMA 2 Community License
- **Finetuned from model:** Meta's Code Llama 13B
### Model Sources
- **Repository:** [GitHub Repository](https://github.com/example/repo) (placeholder)
- **Paper:** ["Generating Software Tests for Mobile Applications Using Fine-Tuned Large Language Models"](https://doi.org/10.1145/3644032.3644454) (published in AST '24)
- **Demo:** Coming soon
---
## Uses
### Direct Use
The model can be used zero-shot with **llama.cpp** or **LLMStudio** to generate unit tests in Dart. Provide the source code of a Dart class as input, and the model outputs unit tests written with Dart's `test` package.
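
For scripted use, the class source can be read from a file and wrapped in the same instruction that this card's llama.cpp example uses. The sketch below assumes that this wording matches the prompt format seen during fine-tuning, which the card does not document. `build_prompt` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wrap the contents of a Dart source file in the
# instruction used by this card's llama.cpp example. Assumption: the
# fine-tuning prompt template is not documented, so this wording may differ.
build_prompt() {
  local source_file="$1"
  # Passing the source as a %s argument keeps any '%' or '\' in it literal.
  printf 'Generate unit tests in Dart for the following class:\n%s\n' \
    "$(cat "$source_file")"
}
```

For example, `build_prompt lib/calculator.dart` (a hypothetical path) prints the prompt, which can be handed to llama.cpp as `--prompt "$(build_prompt lib/calculator.dart)"`. Because the line break is a real newline rather than a `\n` escape sequence, no escape processing is needed for it.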
### Downstream Use
This model is suitable for integration into developer tools, IDE extensions, or continuous integration pipelines to automate test generation for Dart-based applications.
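
In a CI pipeline, a natural first step is deciding which source files still lack tests. The sketch below assumes the common Dart layout in which tests for `lib/<path>.dart` live at `test/<path>_test.dart` (the `test` package's runner picks up files ending in `_test.dart`). `test_path_for` and `missing_tests` are hypothetical helpers, and the generation step itself is left to the llama.cpp command shown later in this card.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map lib/src/foo.dart to test/src/foo_test.dart,
# mirroring lib/ under test/ as is conventional in Dart packages.
test_path_for() {
  local rel="${1#lib/}"   # drop the leading "lib/"
  rel="${rel%.dart}"      # drop the ".dart" extension
  printf 'test/%s_test.dart\n' "$rel"
}

# Hypothetical helper: list sources under lib/ that have no test file yet,
# i.e. candidates for generation. Run it from the package root.
missing_tests() {
  local src dest
  find lib -type f -name '*.dart' | sort | while IFS= read -r src; do
    dest="$(test_path_for "$src")"
    if [ ! -e "$dest" ]; then
      printf '%s -> %s\n' "$src" "$dest"
    fi
  done
}
```

Each printed pair gives the source file to put in the prompt and the destination path for the generated test. Generated tests should still be reviewed and run with `dart test` before they are committed.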
### Out-of-Scope Use
- Do not use this model for tasks unrelated to Dart test generation.
- Do not use the model or its outputs to improve any other large language model, excluding Llama 2 and its derivative works, as required by the LLaMA 2 Community License.
- Do not use the model for malicious purposes, such as deliberately generating misleading or harmful test cases.
---
## Running the GGUF Model with llama.cpp
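To use this GGUF-quantized model with llama.cpp:
1. Clone the llama.cpp repository and build the binaries (recent versions of llama.cpp build with CMake rather than `make`):
```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```
2. Place the GGUF file in the models directory:
```bash
mkdir -p models/testgen-dart-v0.2
mv /path/to/CodeLlama-13B-TestGen-Dart_v0.2.gguf models/testgen-dart-v0.2/
```
3. Run the model with `llama-cli` (older llama.cpp builds named this binary `main`):
```bash
# -e turns the \n in the prompt into a real line break; -n caps the number of generated tokens
./build/bin/llama-cli -m models/testgen-dart-v0.2/CodeLlama-13B-TestGen-Dart_v0.2.gguf -e -n 512 \
  -p "Generate unit tests in Dart for the following class:\nclass Calculator { int add(int a, int b) { return a + b; } }"
```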
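
If llama.cpp fails to load the model, a quick first check is whether the file really is GGUF: every GGUF file starts with the four ASCII bytes `GGUF`, whereas a saved HTML error page or a Git LFS pointer file (a small text file beginning with `version https://git-lfs...`) does not. The function below is a minimal sketch; `is_gguf` is a hypothetical name, and it checks only the magic bytes, not the rest of the header.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the file exists and its first four
# bytes are the GGUF magic "GGUF". Header fields after the magic are not checked.
is_gguf() {
  [ -f "$1" ] && [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Example with this card's file name (adjust the path to your setup).
model="models/testgen-dart-v0.2/CodeLlama-13B-TestGen-Dart_v0.2.gguf"
if is_gguf "$model"; then
  echo "OK: $model looks like a GGUF file"
else
  echo "Warning: $model is missing or not a GGUF file" >&2
fi
```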
## Training Details
### Training Data
The fine-tuning dataset consists of **16,252 Dart code-test pairs** extracted from open-source GitHub repositories using Google BigQuery. The pairs were quality-filtered and deduplicated to improve relevance and consistency.
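
The card does not specify how deduplication was performed. As an illustration only, the simplest variant, dropping files whose contents are byte-for-byte identical, can be sketched with standard tools. `dedup_exact` is a hypothetical name, the sketch assumes GNU coreutils and findutils, and detecting near-duplicates would require more than this.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one path per distinct file content among the
# .dart files under a directory. Only byte-identical copies are removed.
# Assumes GNU coreutils/findutils (sha256sum, sort -z, xargs -r).
dedup_exact() {
  # Sort paths so the kept copy is deterministic, hash each file, then let awk
  # keep the first path seen for each hash and strip the "<hash>  " prefix.
  find "$1" -type f -name '*.dart' -print0 \
    | sort -z \
    | xargs -0 -r sha256sum \
    | awk '!seen[$1]++ { sub(/^[0-9a-f]+ [ *]/, ""); print }'
}
```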
### Training Procedure
- **Fine-tuning Approach:** Supervised Fine-Tuning (SFT) with QLoRA for memory efficiency.
- **Hardware:** Training was conducted on a single NVIDIA A100 GPU.
- **Optimization:** Flash Attention 2 was used to speed up attention computation and reduce memory use during training.
- **Duration:** The training process ran for up to 32 hours.
### Training Hyperparameters
- **Mixed Precision:** FP16
- **Optimizer:** AdamW
- **Learning Rate:** 5e-5
- **Epochs:** 3
### Environmental Impact
- **Hardware Type:** NVIDIA A100 GPU
- **Hours Used:** 32 hours
- **Carbon Emitted:** 13.099 kgCO2eq
---
## Evaluation
### Testing Data, Factors & Metrics
- **Testing Data:** A subset of **42 Dart files** from the training dataset, evaluated in a zero-shot setting.
- **Factors:** Syntax correctness, functional correctness.
- **Metrics:** pass@1, syntax error rate, functional correctness rate.
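
For reference, pass@k is commonly computed with the unbiased estimator below, where n candidate tests are sampled per evaluated file and c of them pass; for k = 1 it reduces to the expected fraction of passing samples. This card does not state how many samples were drawn per file, so the formula is illustrative rather than a description of the paper's exact protocol.

$$
\text{pass@}k = \mathbb{E}_{\text{files}}\left[1 - \frac{\binom{n-c}{k}}{\binom{n}{k}}\right],
\qquad
\text{pass@}1 = \mathbb{E}_{\text{files}}\left[1 - \frac{n-c}{n}\right] = \mathbb{E}_{\text{files}}\left[\frac{c}{n}\right]
$$

With one sample per file (n = 1), pass@1 is simply the share of the 42 evaluation files whose generated tests pass.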
### Results
- **Syntax Correctness:** +76% improvement compared to the base model.
- **Functional Correctness:** +16.67% improvement compared to the base model.
---
## Citation
If you use this model in your research, please cite:
**BibTeX:**
```bibtex
@inproceedings{hoffmann2024testgen,
title={Generating Software Tests for Mobile Applications Using Fine-Tuned Large Language Models},
author={Hoffmann, Jacob and Frister, Demian},
booktitle={Proceedings of the 5th ACM/IEEE International Conference on Automation of Software Test (AST 2024)},
year={2024},
doi={10.1145/3644032.3644454}
}
```
## Model Card Contact
- **Jacob Hoffmann**
- **Demian Frister**