Safetensors
PyTorch
English
gpt_neox
causal-lm
pythia
davidhornshaw committed (verified)
Commit 147bb29 · 1 Parent(s): 6ae2998

Update README.md

Files changed (1)
  1. README.md +148 -185
README.md CHANGED
@@ -1,199 +1,162 @@
1
  ---
2
- library_name: transformers
3
- tags: []
4
  ---
5
 
6
  # Model Card for Model ID
7
 
8
- <!-- Provide a quick summary of what the model is/does. -->
9
-
10
-
11
 
12
  ## Model Details
13
 
14
- ### Model Description
15
-
16
- <!-- Provide a longer summary of what this model is. -->
17
-
18
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
19
-
20
- - **Developed by:** [More Information Needed]
21
- - **Funded by [optional]:** [More Information Needed]
22
- - **Shared by [optional]:** [More Information Needed]
23
- - **Model type:** [More Information Needed]
24
- - **Language(s) (NLP):** [More Information Needed]
25
- - **License:** [More Information Needed]
26
- - **Finetuned from model [optional]:** [More Information Needed]
27
-
28
- ### Model Sources [optional]
29
-
30
- <!-- Provide the basic links for the model. -->
31
-
32
- - **Repository:** [More Information Needed]
33
- - **Paper [optional]:** [More Information Needed]
34
- - **Demo [optional]:** [More Information Needed]
35
-
36
- ## Uses
37
-
38
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
-
40
- ### Direct Use
41
-
42
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
-
44
- [More Information Needed]
45
-
46
- ### Downstream Use [optional]
47
-
48
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
-
50
- [More Information Needed]
51
-
52
- ### Out-of-Scope Use
53
-
54
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
-
56
- [More Information Needed]
57
-
58
- ## Bias, Risks, and Limitations
59
-
60
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
 
62
- [More Information Needed]
63
-
64
- ### Recommendations
65
-
66
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
-
68
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
-
70
- ## How to Get Started with the Model
71
-
72
- Use the code below to get started with the model.
73
-
74
- [More Information Needed]
75
-
76
- ## Training Details
77
-
78
- ### Training Data
79
-
80
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
-
82
- [More Information Needed]
83
-
84
- ### Training Procedure
85
-
86
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
-
88
- #### Preprocessing [optional]
89
-
90
- [More Information Needed]
91
-
92
-
93
- #### Training Hyperparameters
94
-
95
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
-
97
- #### Speeds, Sizes, Times [optional]
98
-
99
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
 
101
- [More Information Needed]
102
 
103
  ## Evaluation
104
 
105
- <!-- This section describes the evaluation protocols and provides the results. -->
106
-
107
- ### Testing Data, Factors & Metrics
108
-
109
- #### Testing Data
110
-
111
- <!-- This should link to a Dataset Card if possible. -->
112
-
113
- [More Information Needed]
114
-
115
- #### Factors
116
-
117
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
-
119
- [More Information Needed]
120
-
121
- #### Metrics
122
-
123
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
-
125
- [More Information Needed]
126
-
127
- ### Results
128
-
129
- [More Information Needed]
130
-
131
- #### Summary
132
-
133
-
134
-
135
- ## Model Examination [optional]
136
-
137
- <!-- Relevant interpretability work for the model goes here -->
138
-
139
- [More Information Needed]
140
-
141
- ## Environmental Impact
142
-
143
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
-
145
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
-
147
- - **Hardware Type:** [More Information Needed]
148
- - **Hours used:** [More Information Needed]
149
- - **Cloud Provider:** [More Information Needed]
150
- - **Compute Region:** [More Information Needed]
151
- - **Carbon Emitted:** [More Information Needed]
152
-
153
- ## Technical Specifications [optional]
154
-
155
- ### Model Architecture and Objective
156
-
157
- [More Information Needed]
158
-
159
- ### Compute Infrastructure
160
-
161
- [More Information Needed]
162
-
163
- #### Hardware
164
-
165
- [More Information Needed]
166
-
167
- #### Software
168
-
169
- [More Information Needed]
170
-
171
- ## Citation [optional]
172
-
173
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
-
175
- **BibTeX:**
176
-
177
- [More Information Needed]
178
-
179
- **APA:**
180
-
181
- [More Information Needed]
182
-
183
- ## Glossary [optional]
184
-
185
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
-
187
- [More Information Needed]
188
-
189
- ## More Information [optional]
190
-
191
- [More Information Needed]
192
-
193
- ## Model Card Authors [optional]
194
-
195
- [More Information Needed]
196
-
197
- ## Model Card Contact
198
 
199
- [More Information Needed]
 
1
  ---
2
+ language:
3
+ - en
4
+ tags:
5
+ - pytorch
6
+ - causal-lm
7
+ - pythia
8
+ license: apache-2.0
9
+ datasets:
10
+ - EleutherAI/pile
11
  ---
12
 
13
  # Model Card for Pythia-160M
14
 
15
+ Pythia-160M is part of a suite of models, all trained on [the Pile](https://pile.eleuther.ai/),
16
+ developed to facilitate interpretability research [(see repository)](https://github.com/EleutherAI/pythia). We have evaluated it on HellaSwag using the EleutherAI [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness).
 
17
 
18
  ## Model Details
19
 
20
+ - Developed by: [EleutherAI](http://eleuther.ai)
21
+ - Model type: Transformer-based Language Model
22
+ - Language: English
23
+ - Learn more: [Pythia's GitHub repository](https://github.com/EleutherAI/pythia)
24
+ for training procedure, config files, and details on how to use.
25
+ [See paper](https://arxiv.org/pdf/2304.01373.pdf) for more evals and implementation
26
+ details.
27
+ - Library: [GPT-NeoX](https://github.com/EleutherAI/gpt-neox)
28
+ - License: Apache 2.0
29
+ - Contact: to ask questions about this model, join the [EleutherAI
30
+ Discord](https://discord.gg/zBGx3azzUn), and post them in `#release-discussion`.
31
+ Please read the existing *Pythia* documentation before asking about it in the
32
+ EleutherAI Discord. For general correspondence:
33
+ [contact@eleuther.ai](mailto:contact@eleuther.ai).
34
+
35
+ <figure>
36
+
37
+ | Pythia model | Non-Embedding Params | Layers | Model Dim | Heads | Batch Size | Learning Rate | Equivalent Models |
38
+ | -----------: | -------------------: | :----: | :-------: | :---: | :--------: | :-------------------: | :--------------------: |
39
+ | 160M | 85,056,000 | 12 | 768 | 12 | 2M | 6.0 x 10<sup>-4</sup> | GPT-Neo 125M, OPT-125M |
40
+ <figcaption>Engineering details for the <i>Pythia Suite</i>. Deduped and
41
+ non-deduped models of a given size have the same hyperparameters. “Equivalent”
42
+ models have <b>exactly</b> the same architecture, and the same number of
43
+ non-embedding parameters.</figcaption>
44
+ </figure>
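+
+ The architecture numbers above can be cross-checked against the published configuration without downloading the weights. A minimal sketch using 🤗 Transformers (the attribute names belong to the library's `gpt_neox` config class):
+
+ ```python
+ from transformers import AutoConfig
+
+ # Fetch only the configuration of the upstream checkpoint.
+ config = AutoConfig.from_pretrained("EleutherAI/pythia-160m")
+
+ print(config.num_hidden_layers)    # 12  -> "Layers"
+ print(config.hidden_size)          # 768 -> "Model Dim"
+ print(config.num_attention_heads)  # 12  -> "Heads"
+ ```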
45
 
46
+ ### Model Description
47
 
48
+ This is the model card for Pythia-160M, including evaluation results obtained with the EleutherAI LM Evaluation Harness.
49
+
50
+ - **Developed by:** [EleutherAI](http://eleuther.ai)
51
+ - **Model type:** Transformer-based causal language model (GPT-NeoX architecture)
52
+ - **Language(s) (NLP):** EN
53
+ - **License:** Apache 2.0
54
+
55
+ ### Model Sources
56
+
57
+ - **Repository:** https://huggingface.co/EleutherAI/pythia-160m
58
+
59
+ ## Uses and Limitations
60
+
61
+ ### Intended Use
62
+
63
+ The primary intended use of Pythia is research on the behavior, functionality,
64
+ and limitations of large language models. This suite is intended to provide
65
+ a controlled setting for performing scientific experiments. We also provide
66
+ 154 checkpoints per model: initial `step0`, 10 log-spaced checkpoints
67
+ `step{1,2,4...512}`, and 143 evenly-spaced checkpoints from `step1000` to
68
+ `step143000`. These checkpoints are hosted on Hugging Face as branches. Note
69
+ that branch `143000` corresponds exactly to the model checkpoint on the `main`
70
+ branch of each model.
71
+
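+ Any of these checkpoints can be loaded by passing its branch name as the `revision` argument. A minimal sketch using 🤗 Transformers (the `step3000` branch and the prompt are illustrative choices only):
+
+ ```python
+ from transformers import GPTNeoXForCausalLM, AutoTokenizer
+
+ # Omitting `revision` loads `main`, which is identical to `step143000`.
+ model = GPTNeoXForCausalLM.from_pretrained(
+     "EleutherAI/pythia-160m",
+     revision="step3000",
+ )
+ tokenizer = AutoTokenizer.from_pretrained(
+     "EleutherAI/pythia-160m",
+     revision="step3000",
+ )
+
+ inputs = tokenizer("Hello, I am", return_tensors="pt")
+ tokens = model.generate(**inputs, max_new_tokens=20)
+ print(tokenizer.decode(tokens[0]))
+ ```
+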
72
+ You may also further fine-tune and adapt Pythia-160M for deployment,
73
+ as long as your use is in accordance with the Apache 2.0 license. Pythia
74
+ models work with the Hugging Face [Transformers
75
+ Library](https://huggingface.co/docs/transformers/index). If you decide to use
76
+ pre-trained Pythia-160M as a basis for your fine-tuned model, please
77
+ conduct your own risk and bias assessment.
78
+
79
+ ### Out-of-scope use
80
+
81
+ The Pythia Suite is **not** intended for deployment. It is not in itself
82
+ a product and cannot be used for human-facing interactions. For example,
83
+ the model may generate harmful or offensive text. Please evaluate the risks
84
+ associated with your particular use case.
85
+
86
+ Pythia models are English-language only, and are not suitable for translation
87
+ or generating text in other languages.
88
+
89
+ Pythia-160M has not been fine-tuned for downstream contexts in which
90
+ language models are commonly deployed, such as writing genre prose,
91
+ or commercial chatbots. This means Pythia-160M will **not**
92
+ respond to a given prompt the way a product like ChatGPT does. This is because,
93
+ unlike this model, ChatGPT was fine-tuned using methods such as Reinforcement
94
+ Learning from Human Feedback (RLHF) to better “follow” human instructions.
95
+
96
+ ### Limitations and biases
97
+
98
+ The core functionality of a large language model is to take a string of text
99
+ and predict the next token. The token deemed statistically most likely by the model need not produce the
100
+ most “accurate” text. Never rely on Pythia-160M to produce factually accurate
101
+ output.
102
+
103
+ This model was trained on [the Pile](https://pile.eleuther.ai/), a dataset
104
+ known to contain profanity and texts that are lewd or otherwise offensive.
105
+ See [Section 6 of the Pile paper](https://arxiv.org/abs/2101.00027) for a
106
+ discussion of documented biases with regards to gender, religion, and race.
107
+ Pythia-160M may produce socially unacceptable or undesirable text, *even if*
108
+ the prompt itself does not include anything explicitly offensive.
109
+
110
+ If you plan on using text generated through, for example, the Hosted Inference
111
+ API, we recommend having a human curate the outputs of this language model
112
+ before presenting it to other people. Please inform your audience that the
113
+ text was generated by Pythia-160M.
114
+
115
+ ## Training
116
+
117
+ ### Training data
118
+
119
+ [The Pile](https://pile.eleuther.ai/) is an 825GiB general-purpose dataset in
120
+ English. It was created by EleutherAI specifically for training large language
121
+ models. It contains texts from 22 diverse sources, roughly broken down into
122
+ five categories: academic writing (e.g. arXiv), internet (e.g. CommonCrawl),
123
+ prose (e.g. Project Gutenberg), dialogue (e.g. YouTube subtitles), and
124
+ miscellaneous (e.g. GitHub, Enron Emails). See [the Pile
125
+ paper](https://arxiv.org/abs/2101.00027) for a breakdown of all data sources,
126
+ methodology, and a discussion of ethical implications. Consult [the
127
+ datasheet](https://arxiv.org/abs/2201.07311) for more detailed documentation
128
+ about the Pile and its component datasets. The Pile can be downloaded from
129
+ the [official website](https://pile.eleuther.ai/), or from a [community
130
+ mirror](https://the-eye.eu/public/AI/pile/).<br>
131
+ The Pile was **not** deduplicated before being used to train Pythia-160M.
132
+
133
+ ### Training procedure
134
+
135
+ All models were trained on the exact same data, in the exact same order. Each
136
+ model saw 299,892,736,000 tokens during training, and 143 checkpoints for each
137
+ model are saved every 2,097,152,000 tokens, spaced evenly throughout training,
138
+ from `step1000` to `step143000` (which is the same as `main`). In addition, we
139
+ also provide frequent early checkpoints: `step0` and `step{1,2,4...512}`.
140
+ This corresponds to training for just under 1 epoch on the Pile for
141
+ non-deduplicated models, and about 1.5 epochs on the deduplicated Pile.
142
+
143
+ All *Pythia* models were trained for 143,000 steps at a batch size
144
+ of 2M (2,097,152 tokens).<br>
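+ This schedule accounts for the totals above: 143,000 steps × 2,097,152 tokens per step = 299,892,736,000 tokens, and a checkpoint every 1,000 steps corresponds to 1,000 × 2,097,152 = 2,097,152,000 tokens.<br>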
145
+ See [GitHub](https://github.com/EleutherAI/pythia) for more details on training
146
+ procedure, including [how to reproduce
147
+ it](https://github.com/EleutherAI/pythia/blob/main/README.md#reproducing-training).<br>
148
+ Pythia uses the same tokenizer as
149
+ [GPT-NeoX-20B](https://huggingface.co/EleutherAI/gpt-neox-20b).
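+
+ This can be spot-checked in a few lines; a minimal sketch (both tokenizers are small downloads, and the sample string is arbitrary):
+
+ ```python
+ from transformers import AutoTokenizer
+
+ pythia = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
+ neox = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
+
+ sample = "Pythia uses the same tokenizer as GPT-NeoX-20B."
+ # If the tokenizers are indeed identical, both calls return the same ids.
+ print(pythia(sample)["input_ids"])
+ print(neox(sample)["input_ids"])
+ print(pythia.vocab_size, neox.vocab_size)
+ ```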
150
 
151
  ## Evaluation
152
 
153
+ This model has been evaluated on the zero-shot HellaSwag benchmark using the EleutherAI [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness).
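+
+ The result below should be reproducible (up to minor version differences) with the harness's Python API; a minimal sketch, assuming a recent v0.4-style install of [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness):
+
+ ```python
+ import lm_eval
+
+ # Zero-shot HellaSwag with the Hugging Face model backend.
+ results = lm_eval.simple_evaluate(
+     model="hf",
+     model_args="pretrained=EleutherAI/pythia-160m",
+     tasks=["hellaswag"],
+     num_fewshot=0,
+ )
+ print(results["results"]["hellaswag"])
+ ```
+
+ With recent harness versions, the equivalent command-line call is `lm_eval --model hf --model_args pretrained=EleutherAI/pythia-160m --tasks hellaswag --num_fewshot 0`.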
154
 
155
+ <figure>
156
+
157
+ | Task      | Version | Filter | n-shot | Metric   |  Value | Stderr |
158
+ |-----------|--------:|--------|-------:|----------|-------:|-------:|
159
+ | hellaswag |       1 | none   |      0 | acc      | 0.2872 | 0.0045 |
160
+ | hellaswag |       1 | none   |      0 | acc_norm | 0.3082 | 0.0046 |
161
+ <figcaption>Zero-shot HellaSwag results for Pythia-160M; higher is better for both metrics, and Stderr is the standard error of each estimate.</figcaption>
162
+ </figure>