---
pipeline_tag: text-generation
inference: true
widget:
  - text: 'def print_hello_world():'
    example_title: Hello world
    group: Python
datasets:
  - bigcode/commitpackft
  - bigcode/oasst-octopack
metrics:
  - code_eval
library_name: transformers
language:
  - zh
  - en
tags:
  - codegeex
  - glm
  - chatglm
model-index:
  - name: OctoGeeX
    results:
      - task:
          type: text-generation
        dataset:
          type: bigcode/humanevalpack
          name: HumanEvalSynthesize Python
        metrics:
          - name: pass@1
            type: pass@1
            value: 44.7
            verified: false
      - task:
          type: text-generation
        dataset:
          type: bigcode/humanevalpack
          name: HumanEvalSynthesize JavaScript
        metrics:
          - name: pass@1
            type: pass@1
            value: 33.8
            verified: false
      - task:
          type: text-generation
        dataset:
          type: bigcode/humanevalpack
          name: HumanEvalSynthesize Java
        metrics:
          - name: pass@1
            type: pass@1
            value: 36.9
            verified: false
      - task:
          type: text-generation
        dataset:
          type: bigcode/humanevalpack
          name: HumanEvalSynthesize Go
        metrics:
          - name: pass@1
            type: pass@1
            value: 21.9
            verified: false
      - task:
          type: text-generation
        dataset:
          type: bigcode/humanevalpack
          name: HumanEvalSynthesize C++
        metrics:
          - name: pass@1
            type: pass@1
            value: 32.3
            verified: false
      - task:
          type: text-generation
        dataset:
          type: bigcode/humanevalpack
          name: HumanEvalSynthesize Rust
        metrics:
          - name: pass@1
            type: pass@1
            value: 25.7
            verified: false
      - task:
          type: text-generation
        dataset:
          type: bigcode/humanevalpack
          name: HumanEvalSynthesize Average
        metrics:
          - name: pass@1
            type: pass@1
            value: 30.9
            verified: false
      - task:
          type: text-generation
        dataset:
          type: bigcode/humanevalpack
          name: HumanEvalFix Python
        metrics:
          - name: pass@1
            type: pass@1
            value: 28.1
            verified: false
      - task:
          type: text-generation
        dataset:
          type: bigcode/humanevalpack
          name: HumanEvalFix JavaScript
        metrics:
          - name: pass@1
            type: pass@1
            value: 27.7
            verified: false
      - task:
          type: text-generation
        dataset:
          type: bigcode/humanevalpack
          name: HumanEvalFix Java
        metrics:
          - name: pass@1
            type: pass@1
            value: 30.4
            verified: false
      - task:
          type: text-generation
        dataset:
          type: bigcode/humanevalpack
          name: HumanEvalFix Go
        metrics:
          - name: pass@1
            type: pass@1
            value: 27.6
            verified: false
      - task:
          type: text-generation
        dataset:
          type: bigcode/humanevalpack
          name: HumanEvalFix C++
        metrics:
          - name: pass@1
            type: pass@1
            value: 22.9
            verified: false
      - task:
          type: text-generation
        dataset:
          type: bigcode/humanevalpack
          name: HumanEvalFix Rust
        metrics:
          - name: pass@1
            type: pass@1
            value: 9.6
            verified: false
      - task:
          type: text-generation
        dataset:
          type: bigcode/humanevalpack
          name: HumanEvalFix Average
        metrics:
          - name: pass@1
            type: pass@1
            value: 24.4
            verified: false
      - task:
          type: text-generation
        dataset:
          type: bigcode/humanevalpack
          name: HumanEvalExplain Python
        metrics:
          - name: pass@1
            type: pass@1
            value: 30.4
            verified: false
      - task:
          type: text-generation
        dataset:
          type: bigcode/humanevalpack
          name: HumanEvalExplain JavaScript
        metrics:
          - name: pass@1
            type: pass@1
            value: 24
            verified: false
      - task:
          type: text-generation
        dataset:
          type: bigcode/humanevalpack
          name: HumanEvalExplain Java
        metrics:
          - name: pass@1
            type: pass@1
            value: 24.7
            verified: false
      - task:
          type: text-generation
        dataset:
          type: bigcode/humanevalpack
          name: HumanEvalExplain Go
        metrics:
          - name: pass@1
            type: pass@1
            value: 21.7
            verified: false
      - task:
          type: text-generation
        dataset:
          type: bigcode/humanevalpack
          name: HumanEvalExplain C++
        metrics:
          - name: pass@1
            type: pass@1
            value: 21
            verified: false
      - task:
          type: text-generation
        dataset:
          type: bigcode/humanevalpack
          name: HumanEvalExplain Rust
        metrics:
          - name: pass@1
            type: pass@1
            value: 15.9
            verified: false
      - task:
          type: text-generation
        dataset:
          type: bigcode/humanevalpack
          name: HumanEvalExplain Average
        metrics:
          - name: pass@1
            type: pass@1
            value: 22.9
            verified: false
---

# OctoGeeX

## Table of Contents

  1. Model Summary
  2. Use
  3. Training
  4. Citation

## Model Summary

OctoGeeX is an instruction-tuned model with 6B parameters, created by fine-tuning CodeGeeX2 on CommitPackFT & OASST as described in the OctoPack paper.

- **Repository:** bigcode/octopack
- **Paper:** TODO
- **Languages:** 100+ programming languages
- **OctoPack🐙🎒:**

|  | Name | Description |
|---|---|---|
| Data | CommitPack | 4TB of GitHub commits across 350 programming languages |
|  | CommitPackFT | Filtered version of CommitPack for high-quality commit messages that resemble instructions |
| Model | OctoCoder | StarCoder (16B parameters) instruction tuned on CommitPackFT + OASST |
|  | OctoGeeX | CodeGeeX2 (6B parameters) instruction tuned on CommitPackFT + OASST |
| Evaluation | HumanEvalPack | Extension of OpenAI's HumanEval to cover 3 scenarios across 6 languages |
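The HumanEvalPack scores reported for this model are pass@1 values. For reference, pass@k is commonly computed with the unbiased estimator from the original HumanEval setup; the sketch below is illustrative and not code from this repository:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations of which c are correct,
    passes the unit tests."""
    if n - c < k:
        return 1.0  # every possible draw of k samples contains a correct one
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k=1 this reduces to the fraction of correct samples, c/n.
print(pass_at_k(n=20, c=5, k=1))  # 0.25
```

With k=1 the estimator is simply the average success rate over the n generated samples, which is what the metadata above reports.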

## Use

### Intended use

The model follows instructions provided in the input. We recommend prefacing your input with "Question: " and finishing with "Answer:", for example: "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"

Feel free to share your generations in the Community tab!
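The recommended prompt format can be wrapped in a small helper; `build_prompt` is an illustrative name, not an API of this repository:

```python
def build_prompt(instruction: str) -> str:
    # Wrap an instruction in the "Question: ... Answer:" format the model expects.
    return f"Question: {instruction.strip()}\n\nAnswer:"

print(build_prompt("Please write a function in Python that performs bubble sort."))
```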

### Generation

```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/octogeex"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("Question: Please write a function in Python that performs bubble sort.\n\nAnswer:", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
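The decoded output echoes the prompt; a small helper can strip it and keep only the completion. The `extract_answer` name and the splitting heuristic are illustrative, not part of this repository:

```python
def extract_answer(generated: str, marker: str = "Answer:") -> str:
    # Keep only the text after the first "Answer:" marker;
    # fall back to the full string if the marker is absent.
    idx = generated.find(marker)
    if idx == -1:
        return generated.strip()
    return generated[idx + len(marker):].strip()

sample = "Question: Add two numbers.\n\nAnswer:\ndef add(a, b):\n    return a + b"
print(extract_answer(sample))
```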

## Training

### Model

- **Architecture:** GLM-based transformer (inherited from the CodeGeeX2 / ChatGLM2 base model)
- **Steps:** 250k pretraining & 30 instruction tuning
- **Pretraining tokens:** 1 trillion pretraining & 2M instruction tuning
- **Precision:** bfloat16

### Hardware

- Pretraining:
  - GPUs: 512 Tesla A100
  - Training time: 24 days
- Instruction tuning:
  - GPUs: 8 Tesla A100
  - Training time: 4 hours

### Software

## License

The code in this repository is open source under the Apache-2.0 license. Use of the model weights must comply with the Model License.

## Citation

TODO