mrm8488 committed
Commit 99b9470 · 1 Parent(s): 5bed5c0

Update README.md

Files changed (1)
  1. README.md +10 -3
README.md CHANGED
@@ -5,12 +5,17 @@ tags:
  model-index:
  - name: santacoder-finetuned-the-stack-swift
    results: []
+ datasets:
+ - bigcode/the-stack-dedup
+ language:
+ - code
+ pipeline_tag: text-generation
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- # santacoder-finetuned-the-stack-swift
+ # SantaCoder 🎅 fine-tuned on Swift 🍏

  This model is a fine-tuned version of [bigcode/santacoder](https://huggingface.co/bigcode/santacoder) on an unknown dataset.
  It achieves the following results on the evaluation set:
@@ -18,7 +23,9 @@ It achieves the following results on the evaluation set:

  ## Model description

- More information needed
+ The [SantaCoder](https://huggingface.co/bigcode/santacoder) models are a series of 1.1B parameter models trained on the Python, Java, and JavaScript subset of [The Stack (v1.1)](https://huggingface.co/datasets/bigcode/the-stack) (which excluded opt-out requests).
+ The main model uses [Multi Query Attention](https://arxiv.org/abs/1911.02150), was trained using near-deduplication and comment-to-code ratio as filtering criteria, and uses the [Fill-in-the-Middle objective](https://arxiv.org/abs/2207.14255).
+ In addition, there are several models that were trained on datasets with different filter parameters and with architecture and objective variations.

  ## Intended uses & limitations

@@ -26,7 +33,7 @@ More information needed

  ## Training and evaluation data

- More information needed
+ The Stack contains over 6TB of permissively licensed source code files covering 358 programming languages. The dataset was created as part of the [BigCode Project](https://www.bigcode-project.org/), an open scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs). The Stack serves as a pre-training dataset for Code LLMs, i.e., code-generating AI systems which enable the synthesis of programs from natural language descriptions as well as from other code snippets. **This is the near-deduplicated version with 3TB of data.**

  ## Training procedure
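
The updated card tags the model with `pipeline_tag: text-generation`. A minimal usage sketch follows, assuming the checkpoint lives at `mrm8488/santacoder-finetuned-the-stack-swift` (inferred from the model-index name and the commit author) and keeps SantaCoder's custom architecture, which needs `trust_remote_code=True`:

```python
# Sketch: generate Swift code with the fine-tuned checkpoint.
# The repo id below is inferred from the card's model-index name; adjust if it differs.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "mrm8488/santacoder-finetuned-the-stack-swift"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True)

prompt = "func fibonacci(_ n: Int) -> Int {"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```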
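
The new model description mentions the [Fill-in-the-Middle objective](https://arxiv.org/abs/2207.14255). A sketch of FIM-style infilling, assuming the fine-tuned checkpoint keeps the base SantaCoder special tokens (`<fim-prefix>`, `<fim-suffix>`, `<fim-middle>`):

```python
# Sketch: Fill-in-the-Middle prompting, assuming SantaCoder's FIM tokens are kept.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "mrm8488/santacoder-finetuned-the-stack-swift"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True)

# The text around <fim-prefix> and <fim-suffix> is the visible context;
# the model generates the missing middle after <fim-middle>.
prompt = (
    "<fim-prefix>func greet(name: String) -> String {\n"
    "    <fim-suffix>\n}<fim-middle>"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0]))
```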
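
The training data section points at [The Stack (dedup)](https://huggingface.co/datasets/bigcode/the-stack-dedup). A sketch of streaming only the Swift subset with 🤗 `datasets`, assuming the dataset's per-language `data_dir="data/swift"` layout and that you have accepted the dataset's terms and are logged in:

```python
# Sketch: stream the Swift subset of The Stack (dedup).
# "data/swift" follows the dataset's per-language folder layout (assumed);
# access requires accepting the dataset terms and an authenticated session.
from datasets import load_dataset

ds = load_dataset(
    "bigcode/the-stack-dedup",
    data_dir="data/swift",   # assumed per-language folder
    split="train",
    streaming=True,          # avoid downloading the full subset
)

for example in ds.take(3):
    print(example["content"][:200])
```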