File size: 4,886 Bytes
416c864 07a5a56 95a64c9 07a5a56 416c864 07a5a56 9a23788 07a5a56 9a23788 07a5a56 95a64c9 07a5a56 9a23788 07a5a56 9a23788 07a5a56 9a23788 07a5a56 9a23788 07a5a56 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 |
---
language:
- zh
- en
tags:
- code
- autocomplete
- pytorch
- en
license: "apache-2.0"
---
# GPT2 for Code AutoComplete Model
code-autocomplete, a code completion plugin for Python.
**code-autocomplete** can Automatic completion of code line granularity and block granularity.
## Usage
Open source repo:[code-autocomplete](https://github.com/shibing624/code-autocomplete),support GPT2 model, usage:
```python
from autocomplete.gpt2 import Infer
m = Infer(model_name="gpt2", model_dir="shibing624/code-autocomplete-gpt2-base", use_cuda=False)
i = m.predict('import torch.nn as')
print(i)
```
Also, use huggingface/transformers:
*Please use 'GPT2' related functions to load this model!*
```python
import os
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = GPT2Tokenizer.from_pretrained("shibing624/code-autocomplete-gpt2-base")
model = GPT2LMHeadModel.from_pretrained("shibing624/code-autocomplete-gpt2-base")
model.to(device)
prompts = [
"""from torch import nn
class LSTM(Module):
def __init__(self, *,
n_tokens: int,
embedding_size: int,
hidden_size: int,
n_layers: int):""",
"""import numpy as np
import torch
import torch.nn as""",
"import java.util.ArrayList",
"def factorial(n):",
]
for prompt in prompts:
input_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors='pt').to(device)
outputs = model.generate(input_ids=input_ids,
max_length=64 + len(prompt),
temperature=1.0,
top_k=50,
top_p=0.95,
repetition_penalty=1.0,
do_sample=True,
num_return_sequences=1,
length_penalty=2.0,
early_stopping=True)
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded)
print("=" * 20)
```
output:
```shell
from torch import nn
class LSTM(Module):
def __init__(self, *,
n_tokens: int,
embedding_size: int,
hidden_size: int,
n_layers: int):
self.embedding_size = embedding_size
====================
import numpy as np
import torch
import torch.nn as np
from onmt import nnumpy as np
class PredicterDNN(nn.Module):
@classmethod
@parameterized.expand([0.5, 2.5] + (10, 10))
@classmethod
@static
def add(self, sample_rate, max_iters=self.max_iters, mask_fre
====================
import java.util.ArrayList[Tuple[Int]],
====================
def factorial(n): number of elements per dimension,
assert len(n) > 1
n.append(self.n_iters)
n = n_iter(self.n_norm)
def _score(
====================
Process finished with exit code 0
```
Model files:
```
code-autocomplete-gpt2-base
├── config.json
├── merges.txt
├── pytorch_model.bin
├── special_tokens_map.json
├── tokenizer_config.json
└── vocab.json
```
### Train data
#### pytorch_awesome projects source code
download [code-autocomplete](https://github.com/shibing624/code-autocomplete),
```shell
cd autocomplete
python create_dataset.py
```
If you want train code-autocomplete GPT2 model,refer [https://github.com/shibing624/code-autocomplete/blob/main/autocomplete/gpt2.py](https://github.com/shibing624/code-autocomplete/blob/main/autocomplete/gpt2.py)
### About GPT2
Test the whole generation capabilities here: https://transformer.huggingface.co/doc/gpt2-large
Pretrained model on English language using a causal language modeling (CLM) objective. It was introduced in
[this paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
and first released at [this page](https://openai.com/blog/better-language-models/).
Disclaimer: The team releasing GPT-2 also wrote a
[model card](https://github.com/openai/gpt-2/blob/master/model_card.md) for their model. Content from this model card
has been written by the Hugging Face team to complete the information they provided and give specific examples of bias.
## Citation
```latex
@misc{code-autocomplete,
author = {Xu Ming},
title = {code-autocomplete: Code AutoComplete with GPT model},
year = {2022},
publisher = {GitHub},
journal = {GitHub repository},
url = {https://github.com/shibing624/code-autocomplete},
}
```
|