jacobhoffmann committed on
Commit 3c251b0 · verified · 1 Parent(s): c7125a8

Update README.md

Files changed (1): README.md (+86 -1)

pipeline_tag: text-generation
tags:
- code
---

# Model Card for TestGen-Dart v0.2

This model card provides information about **TestGen-Dart v0.2**, a fine-tuned version of Meta's Code Llama 13B optimized for generating Dart unit tests for mobile applications. The model was developed as part of a research project on adapting transformer-based large language models (LLMs) to specific downstream tasks while keeping them cost-efficient and usable on standard consumer hardware.

---

## Model Details

### Model Description

**TestGen-Dart v0.2** is a fine-tuned version of Code Llama 13B, specifically adapted to generate unit test cases for Dart code. It produces syntactically and functionally correct test cases more reliably than its base model.

- **Developed by:** Jacob Hoffmann, Demian Frister (Karlsruhe Institute of Technology - KIT, AIFB-BIS)
- **Funded by:** Helmholtz Association's Initiative and Networking Fund, with compute on the HAICORE@FZJ partition
- **Shared by:** Jacob Hoffmann, Demian Frister
- **Model type:** Fine-tuned Code Llama 13B for test generation in Dart
- **Language(s):** English
- **License:** LLaMA 2 Community License
- **Finetuned from model:** Meta's Code Llama 13B

### Model Sources

- **Repository:** [GitHub Repository](https://github.com/example/repo) (placeholder)
- **Paper:** ["Test Case Generation with Fine-Tuned LLaMA Models"](https://doi.org/10.1145/3644032.3644454) (published at AST '24)
- **Demo:** Coming soon

---

## Uses

### Direct Use

The model can be used in a zero-shot setting to generate unit tests in Dart: provide the code of the class under test as input, and the model outputs structured unit tests that use Dart's `test` package, as sketched below.
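
For illustration, here is a hypothetical sample of the kind of test suite the model produces for a simple `Calculator` class; the exact tests vary between runs, and the `calculator.dart` import path is an assumption:

```dart
// Hypothetical sample of model output for a simple Calculator class.
// Actual generated tests vary between runs.
import 'package:test/test.dart';

import 'calculator.dart'; // assumed path of the class under test

void main() {
  group('Calculator', () {
    test('add returns the sum of two integers', () {
      expect(Calculator().add(2, 3), equals(5));
    });

    test('add handles negative operands', () {
      expect(Calculator().add(-2, -3), equals(-5));
    });
  });
}
```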

### Downstream Use

This model is suitable for integration into developer tools, IDE extensions, or continuous integration pipelines to automate test generation for Dart-based applications.

### Out-of-Scope Use

- Do not use this model for tasks unrelated to Dart test generation.
- Do not use this model to improve or train other LLMs not based on LLaMA or its derivatives, as prohibited by the LLaMA 2 Community License.
- Misuse for malicious purposes, such as deliberately generating incorrect or harmful test cases, is prohibited.

---

## Bias, Risks, and Limitations

### Technical Limitations

- The model's performance is optimized for Dart; it may be less effective for other programming languages.
- Functional correctness of the generated tests is not guaranteed; developers should validate them, for example by running them with `dart test`.

### Risks

- Potential for generating syntactically correct but semantically invalid tests; users should review outputs carefully.
- May exhibit biases present in the training data.

---

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer ("username" is a placeholder for the actual repo owner)
tokenizer = AutoTokenizer.from_pretrained("username/testgen-dart-v0.2")
model = AutoModelForCausalLM.from_pretrained("username/testgen-dart-v0.2")

# Dart class to generate tests for
input_code = """
class Calculator {
  int add(int a, int b) {
    return a + b;
  }
}
"""

prompt = f"Generate unit tests in Dart for the following class:\n{input_code}"

# Tokenize the prompt and generate up to 512 new tokens of test code
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
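
To run the generated tests, save them to a file ending in `_test.dart` under the package's `test/` directory, add `test` as a dev dependency in `pubspec.yaml`, and execute `dart test`. As noted under Risks, review the generated tests before relying on them.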