Update README.md
README.md
CHANGED
|
@@ -10,7 +10,7 @@ programming_language:
|
|
| 10 |
- JavaScript
|
| 11 |
- Python
|
| 12 |
- Rust
|
| 13 |
-
-
|
| 14 |
- C++
|
| 15 |
- C
|
| 16 |
- C#
|
|
@@ -58,10 +58,10 @@ datasets:
|
|
| 58 |
- bigcode/starcoderdata
|
| 59 |
---
|
| 60 |
|
| 61 |
-
# Model Card for DeciCoder
|
| 62 |
|
| 63 |
-
DeciCoder
|
| 64 |
-
trained on the Python, Java, Javascript,
|
| 65 |
The model uses variable Grouped Query Attention and has a context window of 4096
|
| 66 |
tokens. It was trained using a Fill-in-the-Middle training objective. The model's
|
| 67 |
architecture was generated by Deci's proprietary Neural Architecture
|
|
@@ -70,10 +70,17 @@ Search-based technology, AutoNAC.
|
|
| 70 |
## Model Details
|
| 71 |
|
| 72 |
- **Developed by:** Deci
|
| 73 |
-
- **Model type:** DeciCoder is an auto-regressive language model based on the transformer decoder architecture, using variable Grouped Query Attention.
|
| 74 |
-
- **Language(s):** Python, Java, JavaScript,
|
| 75 |
- **License:** Model checkpoints are licensed under the [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
|
| 76 |
|
| 77 |
## Model Architecture
|
| 78 |
|
| 79 |
| Parameters | Layers | Heads | Sequence Length | GQA num_key_value_heads | Hidden Size |
|
|
@@ -81,12 +88,12 @@ Search-based technology, AutoNAC.
|
|
| 81 |
| 6B | 32 | 32 | 4096 | Variable | 4096 |
|
| 82 |
|
| 83 |
|
| 84 |
-
- **Decoder layer:** Variable Grouped Query Attention
|
| 85 |
- **Position Embeddings:** Rotary Position Embeddings [Su et al., 2021](https://arxiv.org/abs/2104.09864)
|
| 86 |
|
| 87 |
## Uses
|
| 88 |
|
| 89 |
-
The model is intended to
|
| 90 |
context window of up to 4096 tokens. It is *not* an instruction model
|
| 91 |
and commands like "Write a function that computes the absolute value of
|
| 92 |
an integer," won't yield the desired results. A more effective approach
|
|
@@ -114,8 +121,8 @@ print(tokenizer.decode(outputs[0]))
|
|
| 114 |
|
| 115 |
### Attribution
|
| 116 |
|
| 117 |
-
DeciCoder was trained on StarCoder Training Dataset, filtered for
|
| 118 |
-
Python, Java, JavaScript,
|
| 119 |
refer to [https://huggingface.co/datasets/bigcode/starcoderdata](https://huggingface.co/datasets/bigcode/starcoderdata).
|
| 120 |
|
| 121 |
```
|
|
@@ -123,34 +130,28 @@ refer to [https://huggingface.co/datasets/bigcode/starcoderdata](https://hugging
|
|
| 123 |
### Limitations
|
| 124 |
|
| 125 |
The model has undergone training with source code from Python, Java,
|
| 126 |
-
JavaScript,
|
| 127 |
contain other languages. Therefore, the model can produce code snippets
|
| 128 |
-
given some context. However, there
|
| 129 |
code will function as expected. It might be suboptimal, contain bugs, or
|
| 130 |
even exploits.
|
| 131 |
|
| 132 |
## Evaluation
|
| 133 |
|
| 134 |
-
Below are DeciCoder's pass@1 on MultiPL HumanEval scores
|
| 135 |
|
| 136 |
-
| Python | JavaScript | Java | C++ | C# | Rust | Go |
|
| 137 |
-
|
| 138 |
-
| 33.
|
| 139 |
|
| 140 |
|
| 141 |
### Runtime Benchmarks
|
| 142 |
|
| 143 |
-
|Inference Tool
|
| 144 |
-
|
| 145 |
-
|
|
| 146 |
|
| 147 |
-
-
|
| 148 |
-
|
| 149 |
-
## Documentation
|
| 150 |
-
|
| 151 |
-
- [Notebook](https://colab.research.google.com/drive/1JCxvBsWCZKHfIcHSMVf7GZCs3ClMQPjs) CHANGE
|
| 152 |
-
- Blog post: [Introducing DeciCoder: The New Gold Standard in Efficient and Accurate Code Generation](https://deci.ai/blog/decicoder-efficient-and-accurate-code-generation-llm/)CHANGE
|
| 153 |
-
- Questions:Feel free to contact us via our [Discord Community!](https://discord.com/invite/p9ecgRhDR8/)CHANGE
|
| 154 |
|
| 155 |
## How to Cite
|
| 156 |
|
|
@@ -158,9 +159,9 @@ Please cite this model using this format.
|
|
| 158 |
|
| 159 |
```bibtex
|
| 160 |
@misc{DeciFoundationModels,
|
| 161 |
-
title = {DeciCoder},
|
| 162 |
author = {DeciAI Research Team},
|
| 163 |
year = {2023},
|
| 164 |
-
url={[https://huggingface.co/deci/decicoder-
|
| 165 |
}
|
| 166 |
-
```
|
|
|
|
| 10 |
- JavaScript
|
| 11 |
- Python
|
| 12 |
- Rust
|
| 13 |
+
- Ruby
|
| 14 |
- C++
|
| 15 |
- C
|
| 16 |
- C#
|
|
|
|
| 58 |
- bigcode/starcoderdata
|
| 59 |
---
|
| 60 |
|
| 61 |
+
# Model Card for DeciCoder-6B
|
| 62 |
|
| 63 |
+
DeciCoder-6B is a 6 billion parameter decoder-only code completion model
|
| 64 |
+
trained on the Python, Java, JavaScript, Rust, C++, C, and C# subset of the [StarCoder Training Dataset](https://huggingface.co/datasets/bigcode/starcoderdata).
|
| 65 |
The model uses variable Grouped Query Attention and has a context window of 4096
|
| 66 |
tokens. It was trained using a Fill-in-the-Middle training objective. The model's
|
| 67 |
architecture was generated by Deci's proprietary Neural Architecture
|
|
|
|
| 70 |
## Model Details
|
| 71 |
|
| 72 |
- **Developed by:** Deci
|
| 73 |
+
- **Model type:** DeciCoder-6B is an auto-regressive language model based on the transformer decoder architecture, using variable Grouped Query Attention.
|
| 74 |
+
- **Language(s):** Python, Java, JavaScript, Ruby, Rust, C++, C, C#
|
| 75 |
- **License:** Model checkpoints are licensed under the [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
|
| 76 |
|
| 77 |
+
## Documentation
|
| 78 |
+
|
| 79 |
+
- Google Colab [Notebook](https://colab.research.google.com/drive/1ZxG9qMlom9vn4lSGlD8PrjwHBvag94ei?usp=sharing)
|
| 80 |
+
- Blog Post: [Introducing DeciCoder-6B: The Best Multi-Language Code Generation LLM in Its Class](https://deci.ai/blog/decicoder-6b-the-best-multi-language-code-generation-llm-in-its-class/)
|
| 81 |
+
- Tutorial: [How to Run DeciCoder-6B on Qualcomm AI 100](https://github.com/quic/cloud-ai-sdk/tree/1.12/models/language_processing/decoder)
|
| 82 |
+
- Questions: Feel free to contact us via our [Discord Community!](https://discord.com/invite/p9ecgRhDR8/)
|
| 83 |
+
|
| 84 |
## Model Architecture
|
| 85 |
|
| 86 |
| Parameters | Layers | Heads | Sequence Length | GQA num_key_value_heads | Hidden Size |
|
|
|
|
| 88 |
| 6B | 32 | 32 | 4096 | Variable | 4096 |
|
| 89 |
|
| 90 |
|
| 91 |
+
- **Decoder layer:** Variable Grouped Query Attention
|
| 92 |
- **Position Embeddings:** Rotary Position Embeddings [Su et al., 2021](https://arxiv.org/abs/2104.09864)
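
As a rough illustration of what a variable `num_key_value_heads` means, the sketch below maps each of the 32 query heads to the key/value head it would share under grouped query attention. The per-layer KV-head counts used here are hypothetical; the values selected by AutoNAC are not listed in this card, and the function name is illustrative only.

```python
def kv_head_for_query_head(query_head: int, num_heads: int, num_kv_heads: int) -> int:
    """Return the index of the key/value head shared by a given query head."""
    if num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be divisible by num_kv_heads")
    # Consecutive query heads are grouped; each group shares one KV head.
    group_size = num_heads // num_kv_heads
    return query_head // group_size


NUM_HEADS = 32  # from the architecture table above

# Hypothetical per-layer KV-head counts (assumption, not the real configuration).
example_kv_heads_per_layer = [4, 2, 1]

for layer, num_kv_heads in enumerate(example_kv_heads_per_layer):
    mapping = [kv_head_for_query_head(q, NUM_HEADS, num_kv_heads) for q in range(NUM_HEADS)]
    print(f"layer {layer}: {num_kv_heads} KV heads, query-to-KV mapping {mapping}")
```

Fewer KV heads in a layer shrink that layer's key/value cache, which is the usual motivation for letting the count vary by layer.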
|
| 93 |
|
| 94 |
## Uses
|
| 95 |
|
| 96 |
+
The model is intended to perform single/multiline code completion from a
|
| 97 |
context window of up to 4096 tokens. It is *not* an instruction model
|
| 98 |
and commands like "Write a function that computes the absolute value of
|
| 99 |
an integer," won't yield the desired results. A more effective approach
|
|
|
|
| 121 |
|
| 122 |
### Attribution
|
| 123 |
|
| 124 |
+
DeciCoder-6B was trained on StarCoder Training Dataset, filtered for
|
| 125 |
+
Python, Java, JavaScript, Ruby, Rust, C++, C, and C#. For additional information, please
|
| 126 |
refer to [https://huggingface.co/datasets/bigcode/starcoderdata](https://huggingface.co/datasets/bigcode/starcoderdata).
|
| 127 |
|
| 128 |
```
|
|
|
|
| 130 |
### Limitations
|
| 131 |
|
| 132 |
The model has undergone training with source code from Python, Java,
|
| 133 |
+
JavaScript, Ruby, Rust, C++, C, and C#. While the primary language in the source is English, it does
|
| 134 |
contain other languages. Therefore, the model can produce code snippets
|
| 135 |
+
given some context. However, there is no assurance that the resulting
|
| 136 |
code will function as expected. It might be suboptimal, contain bugs, or
|
| 137 |
even exploits.
|
| 138 |
|
| 139 |
## Evaluation
|
| 140 |
|
| 141 |
+
Below are DeciCoder-6B's pass@1 scores on MultiPL HumanEval:
|
| 142 |
|
| 143 |
+
| Python | JavaScript | Java | C++ | C# | Rust | Go |
|
| 144 |
+
|:----------|:----------|:----------|:----------|:----------|:----------|:----------|
|
| 145 |
+
| 33.3% | 29.3% | 30.3% | 29.93% | 20.31% | 20.5% | 77.47% |
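
For readers unfamiliar with the metric, pass@k is commonly computed with the unbiased estimator introduced alongside HumanEval: given `n` generated samples per problem, of which `c` pass the unit tests, it estimates the probability that at least one of `k` samples is correct. The sketch below assumes that estimator; this card does not state how many samples per problem were drawn for the scores above.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: 1 - C(n - c, k) / C(n, k)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every draw of k contains a passing one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative numbers only: 10 samples for a problem, 3 of which pass.
print(pass_at_k(n=10, c=3, k=1))
```

A benchmark score is the mean of this value over all problems; for `k = 1` it reduces to the fraction of samples that pass.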
|
| 146 |
|
| 147 |
|
| 148 |
### Runtime Benchmarks
|
| 149 |
|
| 150 |
+
| Inference Tool | Hardware | Prompt Length | Generation Length | Throughput (tokens/sec) |
|
| 151 |
+
|:----------|:----------|:----------|:----------|:----------|
|
| 152 |
+
| Qualcomm SDK | Qualcomm AI 100 | 1024 | 1024 | 531.3 |
|
| 153 |
|
| 154 |
+
- Measured for maximal batch size on the device
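
A throughput figure of this kind is typically derived from the batch size, the number of tokens generated per sequence, and the wall-clock time of the run. The sketch below assumes only generated tokens are counted (not prompt tokens); the card does not spell out the exact methodology, and the batch size and timing in the example are made up.

```python
def throughput_tokens_per_sec(batch_size: int, generation_length: int, elapsed_seconds: float) -> float:
    """Generated tokens per second across the whole batch."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return batch_size * generation_length / elapsed_seconds


# Hypothetical run: 8 sequences of 1024 generated tokens finishing in 16 seconds.
print(throughput_tokens_per_sec(batch_size=8, generation_length=1024, elapsed_seconds=16.0))
```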
|
| 155 |
|
| 156 |
## How to Cite
|
| 157 |
|
|
|
|
| 159 |
|
| 160 |
```bibtex
|
| 161 |
@misc{DeciFoundationModels,
|
| 162 |
+
title = {DeciCoder-6B},
|
| 163 |
author = {DeciAI Research Team},
|
| 164 |
year = {2023},
|
| 165 |
+
url = {https://huggingface.co/deci/decicoder-6B},
|
| 166 |
}
|
| 167 |
+
```
|