---
library_name: transformers
tags: []
---

# Model Card for LargeCodeModelGPTBigCode

## Model Overview

`LargeCodeModelGPTBigCode` is a model for generating and analyzing code tests. It is based on [GPTBigCode](https://huggingface.co/docs/transformers/main/en/model_doc/gpt_bigcode) and is specifically tailored to generating tests for code. The model was trained on a small, manually labeled dataset of code and can be used for various code-analysis and testing tasks.

### Features
* Code test generation.
* Python code analysis and generation.
* Built on a pre-trained GPT2 model integrated with Hugging Face.

## How it Works

The model is loaded from an external repository, such as Hugging Face, and initialized via the `LargeCodeModelGPTBigCode` class. Several parameters can be specified at initialization to configure the model:

* `gpt2_name`: The model's identifier (link) on Hugging Face.
* `prompt_string`: An additional prompt wrapper that helps the model understand the task.
* `params_inference`: Inference parameters, passed through to `self.gpt2.generate(**inputs, **inference_params)`.
* `max_length`: The maximum number of tokens in the sequence.
* `device`: The device to run the model on.
* `saved_model_path`: Path to the fine-tuned model.
* `num_lines`: The number of lines to keep (guards against "non-terminating" generation).
* `flag_hugging_face`: Enables usage with Hugging Face (default: `False`).
* `flag_pretrained`: Initializes the model with pre-trained weights.
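Since `params_inference` is forwarded directly to `generate`, it can hold any standard Hugging Face generation keyword arguments. A minimal sketch follows; the specific values are illustrative assumptions, not the model's actual defaults:

```python
# Hypothetical inference parameters for params_inference.
# Each key is a standard transformers generate() argument;
# the values here are assumptions chosen for illustration.
params_inference = {
    "do_sample": True,      # sample instead of greedy decoding
    "temperature": 0.7,     # soften the next-token distribution
    "top_p": 0.95,          # nucleus-sampling cutoff
    "max_new_tokens": 256,  # cap on the number of generated tokens
}
```

Such a dict would be passed as the `params_inference` argument when constructing `LargeCodeModelGPTBigCode`.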

Download [inference_gptbigcode.py](inference_gptbigcode.py) (or run `git clone https://huggingface.co/4ervonec19/SimpleTestGenerator`) to use the model. The same file can also be edited to tune the inference parameters.

### Model Initialization

```python
from inference_gptbigcode import LargeCodeModelGPTBigCode

gpt2bigcode = "4ervonec19/SimpleTestGenerator"

CodeModel = LargeCodeModelGPTBigCode(gpt2_name=gpt2bigcode,
                                     flag_pretrained=True,
                                     flag_hugging_face=True)
```

### Inference Example

Here’s an example of inference where the model is used to generate tests based on a given code snippet:

```python
code_example = '''def equals_zero(a):
    if a == 0:
      return True
    return False'''

tests_generated = CodeModel.input_inference(code_text=code_example)

# Result
print(tests_generated['generated_output'])
```

### Output:
The result is a dict containing the input function and the generated tests, for example:

```python
{'input_function': ('def equals_zero(a):\n    if a == 0:\n      return True\n    return False',),
 'generated_output': 'def test_equals_zero():\n    assert equals_zero(0) is True\n    assert equals_zero(1) is False\n    assert equals_zero(0) is True\n    assert equals_zero(1.5) is False'}
```
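Because the generated tests are plain Python source, one way to sanity-check them is to execute the function under test together with the generated test function. This is only a sketch (not part of the model's API); the strings below are copied from the example output above:

```python
# Sketch: run the generated tests against the input function.
input_function = (
    "def equals_zero(a):\n"
    "    if a == 0:\n"
    "      return True\n"
    "    return False"
)
generated_tests = (
    "def test_equals_zero():\n"
    "    assert equals_zero(0) is True\n"
    "    assert equals_zero(1) is False\n"
)

namespace = {}
exec(input_function, namespace)   # defines equals_zero
exec(generated_tests, namespace)  # defines test_equals_zero
namespace["test_equals_zero"]()   # raises AssertionError if a test fails
```

In practice the generated tests may reference names that do not exist or assert incorrect behavior, so executing them in an isolated namespace (or a sandbox) before trusting them is advisable.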

## Model Details

* Architecture: GPTBigCode (GPT2-style)
* Pretraining: Yes; the model starts from a pre-trained GPT2 checkpoint fine-tuned for test generation and code generation.
* Framework: PyTorch / Hugging Face Transformers
* License: MIT (or another, depending on the model's license)

## Limitations

* The model may not always generate correct or optimal tests, especially for complex or non-standard code fragments.
* Some understanding of code structure may be required for optimal results.
* The quality of generated tests depends on the quality of the input code and its context.