---
library_name: transformers
tags:
- code
license: mit
datasets:
- ArtifactAI/arxiv_python_research_code
language:
- en
pipeline_tag: text-generation
---

# Model Card for deepseek-coder-1.3b-python-peft

<!-- Provide a quick summary of what the model is/does. -->

A parameter-efficient finetune (using LoRA) of DeepSeek Coder 1.3B, trained on Python code.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

A finetune of DeepSeek Coder 1.3B trained on 1000 examples of Python code from the ArtifactAI/arxiv_python_research_code dataset.

- **Model type:** Text Generation
- **Language(s) (NLP):** English, Python
- **Finetuned from model:** deepseek-ai/deepseek-coder-1.3b-base

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/kevin-v96/python-codecomplete-lm

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

The model is intended to generate and autocomplete Python code.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import pipeline

model_name = "MadMarx37/deepseek-coder-1.3b-python-peft"
max_length = 128  # maximum length of the generated sequence; adjust as needed


def generate_output(prompt):
    # Run a text-generation pipeline with the finetuned model
    pipe = pipeline(task="text-generation", model=model_name, tokenizer=model_name, max_length=max_length)
    result = pipe(prompt)
    print(result[0]['generated_text'])


generate_output("def fibonacci(n):")  # example prompt
```
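
If the repository hosts the LoRA adapter rather than a merged model, it can also be loaded with the `peft` library. The following is a minimal sketch, assuming the adapter weights are in this repo and reusing the base model's tokenizer:

```python
# Hedged sketch: load the LoRA adapter on top of the base model with peft.
# Assumes the repo contains PEFT adapter weights; adjust if it hosts a merged model.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained(
    "MadMarx37/deepseek-coder-1.3b-python-peft",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-1.3b-base")

inputs = tokenizer("def quicksort(arr):", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```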

## Training Details

#### Training Hyperparameters

- Training regime: fp16 mixed precision, with the base model loaded in 4-bit with bitsandbytes (see the configuration sketch below)
- learning_rate = 2e-3
- lr_scheduler_type = 'cosine_with_restarts'
- max_grad_norm = 0.001
- weight_decay = 0.001
- num_train_epochs = 15
- eval_strategy = "steps"
- eval_steps = 25
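
For reference, a setup along the following lines reproduces the hyperparameters listed above with the TRL `SFTTrainer`. This is a hedged sketch rather than the original training script: the LoRA rank/alpha/dropout, the dataset text column, the train/eval split, and the output directory are assumptions, and argument names vary across `transformers`/`trl` versions (older releases used `TrainingArguments` plus a `dataset_text_field` argument on `SFTTrainer` instead of `SFTConfig`).

```python
# Hedged sketch of a finetuning setup matching the hyperparameters above.
# LoRA r/alpha/dropout, the "code" text column, the eval split, and the
# output directory are assumptions, not taken from the original script.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

base_model = "deepseek-ai/deepseek-coder-1.3b-base"

# 1000 Python examples from the dataset named in the card; eval split assumed
dataset = load_dataset("ArtifactAI/arxiv_python_research_code", split="train[:1000]")
dataset = dataset.train_test_split(test_size=0.1)

# Load the base model in 4-bit with bitsandbytes, as described above
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    ),
    device_map="auto",
)

# LoRA adapter configuration (r/alpha/dropout are assumed values)
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

args = SFTConfig(
    output_dir="deepseek-coder-1.3b-python-peft",
    fp16=True,
    learning_rate=2e-3,
    lr_scheduler_type="cosine_with_restarts",
    max_grad_norm=0.001,
    weight_decay=0.001,
    num_train_epochs=15,
    eval_strategy="steps",
    eval_steps=25,
    dataset_text_field="code",  # assumed column name
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    peft_config=peft_config,
)
trainer.train()
```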

#### Speeds, Sizes, Times

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

1.3B parameters. Training took around 2 hours on an RTX 3080.

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

https://huggingface.co/datasets/ArtifactAI/arxiv_python_research_code
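
The dataset can be pulled directly from the Hub for inspection; which rows were held out for evaluation is not specified in this card.

```python
# Load the dataset used for finetuning/evaluation; the exact eval split is not specified here.
from datasets import load_dataset

dataset = load_dataset("ArtifactAI/arxiv_python_research_code", split="train")
print(dataset)     # number of rows and column names
print(dataset[0])  # look at one example
```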

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

Standard training and evaluation loss reported by the Hugging Face TRL `SFTTrainer`.

### Results

- Training loss: 0.074100
- Validation loss: 0.022271

#### Summary

The training showed some instability in the gradient norms, but the overall trend in both training and validation loss was downward, and the validation loss has almost plateaued, which is where we ideally want our model. Code generation on the same prompts that we tested the original model on also looks better with the finetuned model. If we wanted to improve the model further by increasing the amount of finetuning data, we would also want to increase the number of epochs.

The training run metrics can be seen here:
https://wandb.ai/kevinv3796/python-autocomplete-deepseek/reports/Supervised-Finetuning-run-for-DeepSeek-Coder-1-3B-on-Python-Code--Vmlldzo3NzQ4NjY0?accessToken=bo0rlzp0yj9vxf1xe3fybfv6rbgl97w5kkab478t8f5unbwltdczy63ba9o9kwjp
|