Update README.md
README.md
CHANGED
@@ -207,8 +207,6 @@ The model is a decoder-only transformer similar to the LLaMA ([Touvron et al., 2
 The dataset is comprised of a filtered mixture of open-source large-scale datasets available on the [HuggingFace Hub](https://huggingface.co/datasets): Falcon RefinedWeb extract ([Penedo et al., 2023](https://huggingface.co/datasets/tiiuae/falcon-refinedweb)), along with [CommitPackFT](https://huggingface.co/datasets/bigcode/commitpackft) and [Github Issues](https://huggingface.co/datasets/bigcode/the-stack-github-issues) (BigCode., 2023), and StarCoder ([Li et al., 2023](https://arxiv.org/abs/2305.06161)). We further supplement our training with data from mathematical domains ([Azerbayev, Zhangir, et al., 2023](https://arxiv.org/abs/2310.10631) and, [Yu, Longhui, et al., 2023](https://arxiv.org/abs/2309.12284)).
 
 Top 18 programming languages trained on:
-<details>
-<summary> Click to expand </summary>
 - C
 - CPP
 - Java
@@ -227,7 +225,6 @@ Top 18 programming languages trained on:
 - Python
 - Jupyter-Clean
 - RestructuredText
-</details>
 
 ### Training Procedure
 