---
license: apache-2.0
datasets:
- Leon-Leee/wizardlm_evol_instruct_v2_196K_backuped
- m-a-p/Code-Feedback
- openbmb/UltraInteract_sft
- ise-uiuc/Magicoder-Evol-Instruct-110K
language:
- en
metrics:
- code_eval
library_name: transformers
tags:
- code
---
# AIGCodeGeek-DS-6.7B
## Introduction

AIGCodeGeek-DS-6.7B is the first release in our Code-LLM family, with competitive performance on benchmarks such as HumanEval(+) and MBPP(+).
It draws many insights from the open-source community, and we deeply appreciate all of these great works.
We are preparing a tech report, so stay tuned for more details :)
## Model Details

### Model Description

- Developed by: Leon Li
- License: DeepSeek
- Fine-tuned from deepseek-ai/deepseek-coder-6.7b-base with full-parameter tuning
## Training data

A mixture of:
- samples from several high-quality open-source datasets (see Acknowledgements), and
- our private datasets (already decontaminated against the benchmarks).
## Evaluation

To check out our evaluation results, see the EvalPlus leaderboard.
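If you want a quick local estimate of pass@k on your own generations, one option is the `code_eval` metric from Hugging Face's `evaluate` library. The snippet below is a minimal sketch with a toy problem; it is not the exact harness behind the reported EvalPlus numbers, and since it executes model-generated code, it should only be run in a sandboxed environment.

```python
import os

# code_eval executes untrusted generated code; you must opt in explicitly.
os.environ["HF_ALLOW_CODE_EVAL"] = "1"

import evaluate

code_eval = evaluate.load("code_eval")

# Toy example: one problem, two candidate completions (one correct, one wrong).
test_cases = ["assert add(2, 3) == 5"]
candidates = [["def add(a, b): return a + b", "def add(a, b): return a - b"]]

pass_at_k, results = code_eval.compute(
    references=test_cases,
    predictions=candidates,
    k=[1, 2],
)
print(pass_at_k)  # e.g. {'pass@1': 0.5, 'pass@2': 1.0}
```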
## Requirements

It should work with the same requirements as DeepSeek-Coder-6.7B, or with the following packages:

```
tokenizers>=0.14.0
transformers>=4.35.0
accelerate
sympy>=1.12
pebble
timeout-decorator
attrdict
```
## QuickStart

TBD
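Until then, the model is expected to load with the standard `transformers` causal-LM API, like its DeepSeek-Coder base. Below is a minimal sketch; the repo id is a placeholder assumption, not a confirmed hub path.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id: replace with the actual hub path of AIGCodeGeek-DS-6.7B.
model_id = "Leon-Leee/AIGCodeGeek-DS-6.7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 6.7B in bf16 fits on a single modern GPU
    device_map="auto",           # requires the `accelerate` package listed above
)

prompt = "Write a Python function that checks whether a number is prime."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```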
## Limits
## Acknowledgements

- WizardCoder: the WizardLM-Evol-Instruct V2 dataset
  - We use a backup (Leon-Leee/wizardlm_evol_instruct_v2_196K_backuped) since the original dataset has been deleted.
- Magicoder: Magicoder-Evol-Instruct-110K, built from [theblackcat102/evol-codealpaca-v1](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1)
- Eurus: the reasoning-enhancement dataset openbmb/UltraInteract_sft
- OpenCodeInterpreter: m-a-p/Code-Feedback