|
---
license: llama2
language:
- en
base_model:
- meta-llama/CodeLlama-13b-hf
pipeline_tag: text-generation
tags:
- code
- gguf
- llama.cpp
- llmstudio
---
|
# Model Card for TestGen-Dart v0.2 (GGUF Version) |
|
|
|
This model card provides information about **TestGen-Dart v0.2 (GGUF Version)**, a fine-tuned version of Meta's Code Llama 13B model, optimized for generating unit tests in Dart for mobile applications. This GGUF-quantized model is designed to run efficiently with frameworks like **LLMStudio** and **llama.cpp**, enabling deployment on resource-constrained hardware while maintaining robust performance. |
|
|
|
--- |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
**TestGen-Dart v0.2** is a fine-tuned version of Code Llama 13B, specifically adapted for generating unit test cases for Dart code. The GGUF quantization enables its use on lightweight, consumer-grade systems without significant performance loss. |
|
|
|
- **Developed by:** Jacob Hoffmann, Demian Frister (Karlsruhe Institute of Technology - KIT, AIFB-BIS) |
|
- **Funded by:** The Helmholtz Association's Initiative and Networking Fund; compute was provided on the HAICORE@FZJ partition
|
- **Shared by:** Jacob Hoffmann, Demian Frister |
|
- **Model type:** Fine-tuned Code Llama 13B for test generation in Dart |
|
- **Language(s):** English |
|
- **License:** LLaMA 2 Community License |
|
- **Finetuned from model:** Meta's Code Llama 13B |
|
|
|
### Model Sources |
|
|
|
- **Repository:** [GitHub Repository](https://github.com/example/repo) (placeholder) |
|
- **Paper:** ["Generating Software Tests for Mobile Applications Using Fine-Tuned Large Language Models"](https://doi.org/10.1145/3644032.3644454) (published in AST '24) |
|
- **Demo:** Coming soon |
|
|
|
--- |
|
|
|
## Uses |
|
|
|
### Direct Use |
|
|
|
The model can be used in a zero-shot setting with **llama.cpp** or **LLMStudio** to generate unit tests in Dart. Provide the class code as input, and the model outputs structured unit tests using Dart's `test` package. |
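
For example, a minimal zero-shot sketch using the `llama-cpp-python` bindings (the model path, context size, and sampling settings here are illustrative assumptions, not a prescribed configuration):

```python
# Minimal zero-shot sketch with the llama-cpp-python bindings.
# The model path and sampling settings are illustrative, not prescribed.
from llama_cpp import Llama

llm = Llama(
    model_path="models/testgen-dart-v0.2/CodeLlama-13B-TestGen-Dart_v0.2.gguf",
    n_ctx=4096,  # room for the input class plus the generated tests
)

dart_class = "class Calculator { int add(int a, int b) { return a + b; } }"
prompt = f"Generate unit tests in Dart for the following class:\n{dart_class}"

result = llm(prompt, max_tokens=512, temperature=0.2)
print(result["choices"][0]["text"])  # the generated Dart unit tests
```

Since the output is expected to use Dart's `test` package, it can typically be saved under a project's `test/` directory and run with `dart test`.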
|
|
|
### Downstream Use |
|
|
|
This model is suitable for integration into developer tools, IDE extensions, or continuous integration pipelines to automate test generation for Dart-based applications. |
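
As one illustration, a hypothetical CI helper could generate a test file for a Dart source file and immediately run it with the Dart toolchain (the function and file names are invented for this example; `llm` is the loaded model from the sketch above):

```python
# Hypothetical CI helper (names invented for illustration): generate a test
# file for a Dart source file, write it out, and run it with the Dart CLI.
import pathlib
import subprocess

def generate_and_run_tests(source: pathlib.Path, llm) -> bool:
    prompt = f"Generate unit tests in Dart for the following class:\n{source.read_text()}"
    test_code = llm(prompt, max_tokens=1024)["choices"][0]["text"]
    test_file = pathlib.Path("test") / f"{source.stem}_test.dart"
    test_file.write_text(test_code)
    # `dart test` must run from the package root; non-zero exit means failure.
    return subprocess.run(["dart", "test", str(test_file)]).returncode == 0
```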
|
|
|
### Out-of-Scope Use |
|
|
|
- Do not use this model for tasks unrelated to Dart test generation. |
|
- Do not use this model or its outputs to improve or train any other LLM that is not based on LLaMA 2 or its derivatives, as prohibited by the LLaMA 2 Community License.

- Misuse for malicious purposes, such as deliberately generating misleading or harmful test code, is prohibited.
|
|
|
--- |
|
|
|
## Running the GGUF Model with llama.cpp |
|
|
|
To use this GGUF quantized model with llama.cpp: |
|
|
|
1. Clone the llama.cpp repository and build the binaries: |
|
   ```bash
   git clone https://github.com/ggerganov/llama.cpp
   cd llama.cpp
   make
   ```
|
2. Place the GGUF file in the models directory: |
|
   ```bash
   mkdir -p models/testgen-dart-v0.2
   mv /path/to/CodeLlama-13B-TestGen-Dart_v0.2.gguf models/testgen-dart-v0.2/
   ```
|
3. Run the model (`-e` tells llama.cpp to interpret the `\n` escape in the prompt):

   ```bash
   ./main -m models/testgen-dart-v0.2/CodeLlama-13B-TestGen-Dart_v0.2.gguf -e \
     --prompt "Generate unit tests in Dart for the following class:\nclass Calculator { int add(int a, int b) { return a + b; } }"
   ```

   Note: recent llama.cpp releases build with CMake rather than `make` and ship the CLI as `llama-cli` instead of `./main`; adjust the commands above accordingly.
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
The fine-tuning dataset consists of **16,252 Dart code-test pairs** extracted from open-source GitHub repositories using Google BigQuery. The pairs were quality-filtered and deduplicated to ensure relevance and consistency.
|
|
|
### Training Procedure |
|
|
|
- **Fine-tuning Approach:** Supervised Fine-Tuning (SFT) with QLoRA for memory efficiency. |
|
- **Hardware:** Training was conducted on a single NVIDIA A100 GPU. |
|
- **Optimization:** Flash Attention 2 was used to speed up training and reduce memory usage.
|
- **Duration:** The training process ran for up to 32 hours. |
|
|
|
### Training Hyperparameters |
|
|
|
- **Mixed Precision:** FP16 |
|
- **Optimizer:** AdamW |
|
- **Learning Rate:** 5e-5 |
|
- **Epochs:** 3 |
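
A minimal sketch of such an SFT-with-QLoRA setup, assuming the Hugging Face `transformers`/`peft`/`trl`/`bitsandbytes` stack (the dataset file, LoRA rank, and batch size are illustrative assumptions, not the authors' exact recipe, and `SFTTrainer`'s keyword arguments vary across `trl` versions):

```python
# Illustrative SFT-with-QLoRA setup; not the authors' exact training script.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/CodeLlama-13b-hf",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,                    # QLoRA: 4-bit quantized base weights
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    ),
    attn_implementation="flash_attention_2",  # Flash Attention 2, as noted above
)

trainer = SFTTrainer(
    model=model,
    train_dataset=load_dataset("json", data_files="dart_code_test_pairs.jsonl")["train"],
    peft_config=LoraConfig(r=64, lora_alpha=16, task_type="CAUSAL_LM"),  # illustrative
    args=TrainingArguments(
        output_dir="testgen-dart-v0.2",
        learning_rate=5e-5,             # as reported above
        num_train_epochs=3,
        fp16=True,                      # FP16 mixed precision
        optim="adamw_torch",            # AdamW optimizer
        per_device_train_batch_size=4,  # illustrative
    ),
)
trainer.train()
```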
|
|
|
### Environmental Impact |
|
|
|
- **Hardware Type:** NVIDIA A100 GPU |
|
- **Hours Used:** 32
|
- **Carbon Emitted:** 13.099 kgCO2eq |
|
|
|
--- |
|
|
|
## Evaluation |
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
- **Testing Data:** A subset of **42 Dart files** from the training dataset, evaluated in a zero-shot setting. |
|
- **Factors:** Syntax correctness, functional correctness. |
|
- **Metrics:** pass@1, syntax error rate, functional correctness rate. |
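
With a single generation per file, pass@1 reduces to the fraction of files whose generated tests pass. For reference, here is a sketch of the standard unbiased pass@k estimator (Chen et al., 2021); the paper's own evaluation harness is authoritative:

```python
# Unbiased pass@k estimator (Chen et al., 2021); for k = 1 it reduces to c/n.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """n = generations per problem, c = generations passing all tests."""
    if n - c < k:
        return 1.0
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 10 sampled test suites for one file, 3 of which pass.
print(pass_at_k(n=10, c=3, k=1))  # 0.3
```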
|
|
|
### Results |
|
|
|
- **Syntax Correctness:** +76% improvement compared to the base model. |
|
- **Functional Correctness:** +16.67% improvement compared to the base model. |
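
One plausible way to operationalize the syntax-correctness criterion is the Dart analyzer's exit code (an assumption about tooling; the paper defines the exact measurement):

```python
# Hedged sketch: treat a generated test file as syntactically valid if
# `dart analyze` exits cleanly. Not necessarily the paper's exact harness.
import subprocess

def is_syntactically_valid(test_file: str) -> bool:
    result = subprocess.run(["dart", "analyze", test_file], capture_output=True)
    return result.returncode == 0
```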
|
|
|
--- |
|
|
|
## Citation |
|
|
|
If you use this model in your research, please cite: |
|
|
|
**BibTeX:** |
|
```bibtex
@inproceedings{hoffmann2024testgen,
  title     = {Generating Software Tests for Mobile Applications Using Fine-Tuned Large Language Models},
  author    = {Hoffmann, Jacob and Frister, Demian},
  booktitle = {Proceedings of the 5th ACM/IEEE International Conference on Automation of Software Test (AST 2024)},
  year      = {2024},
  doi       = {10.1145/3644032.3644454}
}
```
|
|
|
## Model Card Contact |
|
|
|
- **Jacob Hoffmann**: [[email protected]](mailto:[email protected]) |
|
- **Demian Frister**: [[email protected]](mailto:[email protected]) |