Update README.md
README.md
CHANGED
---
language:
- en
tags:
- pytorch
- causal-lm
- pythia
license: apache-2.0
datasets:
- EleutherAI/pile
---

# Model Card for Pythia-160M

Pythia-160M is part of a collection of models developed to facilitate
interpretability research ([see repository](https://huggingface.co/EleutherAI/pythia-160m)) and trained on [the Pile](https://pile.eleuther.ai/).
We have evaluated it on HellaSwag using the EleutherAI evaluation harness.

## Model Details

- Developed by: [EleutherAI](http://eleuther.ai)
- Model type: Transformer-based Language Model
- Language: English
- Learn more: [Pythia's GitHub repository](https://github.com/EleutherAI/pythia) for training procedure, config files, and details on how to use. [See the paper](https://arxiv.org/pdf/2304.01373.pdf) for more evals and implementation details.
- Library: [GPT-NeoX](https://github.com/EleutherAI/gpt-neox)
- License: Apache 2.0
- Contact: to ask questions about this model, join the [EleutherAI Discord](https://discord.gg/zBGx3azzUn) and post them in `#release-discussion`. Please read the existing *Pythia* documentation before asking about it in the EleutherAI Discord. For general correspondence: [[email protected]](mailto:[email protected]).

<figure>

| Pythia model | Non-Embedding Params | Layers | Model Dim | Heads | Batch Size | Learning Rate | Equivalent Models |
| -----------: | -------------------: | :----: | :-------: | :---: | :--------: | :-------------------: | :--------------------: |
| 160M | 85,056,000 | 12 | 768 | 12 | 2M | 6.0 x 10<sup>-4</sup> | GPT-Neo 125M, OPT-125M |

<figcaption>Engineering details for the <i>Pythia Suite</i>. Deduped and
non-deduped models of a given size have the same hyperparameters. “Equivalent”
models have <b>exactly</b> the same architecture and the same number of
non-embedding parameters.</figcaption>
</figure>
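
The non-embedding parameter count in the table can be reproduced from the layer count and hidden dimension. The sketch below assumes a standard GPT-NeoX-style block (fused QKV projection, biased linear layers, two LayerNorms per block plus a final LayerNorm); it is a back-of-the-envelope check, not an official formula from the Pythia authors.

```python
# Rough parameter accounting for Pythia-160M (assumed parameterization).
layers, d = 12, 768

per_layer = (
    3 * d * d + 3 * d    # fused query/key/value projection (weight + bias)
    + d * d + d          # attention output projection
    + 4 * d * d + 4 * d  # MLP up-projection (d -> 4d)
    + 4 * d * d + d      # MLP down-projection (4d -> d)
    + 4 * d              # two LayerNorms (weight + bias each)
)
total_non_embedding = layers * per_layer + 2 * d  # plus the final LayerNorm

print(f"{total_non_embedding:,}")  # 85,056,000 -- matches the table above
```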

### Model Description

This is the model card of Pythia-160M, evaluated with the EleutherAI evaluation harness.

- **Developed by:** [EleutherAI](http://eleuther.ai)
- **Model type:** Pythia 160M
- **Language(s) (NLP):** EN
- **License:** Apache 2.0

### Model Sources

- **Repository:** https://huggingface.co/EleutherAI/pythia-160m

## Uses and Limitations

### Intended Use

The primary intended use of Pythia is research on the behavior, functionality,
and limitations of large language models. This suite is intended to provide
a controlled setting for performing scientific experiments. We also provide
154 checkpoints per model: initial `step0`, 10 log-spaced checkpoints
`step{1,2,4...512}`, and 143 evenly-spaced checkpoints from `step1000` to
`step143000`. These checkpoints are hosted on Hugging Face as branches. Note
that branch `143000` corresponds exactly to the model checkpoint on the `main`
branch of each model.
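
For example, a specific training checkpoint can be loaded by passing its branch name as `revision` to the Transformers API. A minimal sketch, using the `step…` branch scheme described above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# `step143000` is the final checkpoint and is identical to `main`;
# any other branch, e.g. `step1000`, selects an earlier checkpoint.
checkpoint = "step143000"

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m", revision=checkpoint)
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m", revision=checkpoint)

print(sum(p.numel() for p in model.parameters()))  # total parameters, embeddings included
```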

You may also further fine-tune and adapt Pythia-160M for deployment,
as long as your use is in accordance with the Apache 2.0 license. Pythia
models work with the Hugging Face [Transformers Library](https://huggingface.co/docs/transformers/index).
If you decide to use pre-trained Pythia-160M as a basis for your fine-tuned
model, please conduct your own risk and bias assessment.
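
A minimal generation sketch with the Transformers library follows; the prompt and sampling settings are illustrative only, and this small base model should not be expected to follow instructions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")

inputs = tokenizer("The Pythia suite was designed to", return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.95)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```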

### Out-of-scope use

The Pythia Suite is **not** intended for deployment. It is not in itself
a product and cannot be used for human-facing interactions. For example,
the model may generate harmful or offensive text. Please evaluate the risks
associated with your particular use case.

Pythia models are English-language only, and are not suitable for translation
or generating text in other languages.

Pythia-160M has not been fine-tuned for downstream contexts in which
language models are commonly deployed, such as writing genre prose
or commercial chatbots. This means Pythia-160M will **not**
respond to a given prompt the way a product like ChatGPT does. This is because,
unlike this model, ChatGPT was fine-tuned using methods such as Reinforcement
Learning from Human Feedback (RLHF) to better “follow” human instructions.

### Limitations and biases

The core functionality of a large language model is to take a string of text
and predict the next token. The token deemed statistically most likely by the
model need not produce the most “accurate” text. Never rely on Pythia-160M to
produce factually accurate output.

This model was trained on [the Pile](https://pile.eleuther.ai/), a dataset
known to contain profanity and texts that are lewd or otherwise offensive.
See [Section 6 of the Pile paper](https://arxiv.org/abs/2101.00027) for a
discussion of documented biases with regards to gender, religion, and race.
Pythia-160M may produce socially unacceptable or undesirable text, *even if*
the prompt itself does not include anything explicitly offensive.

If you plan on using text generated through, for example, the Hosted Inference
API, we recommend having a human curate the outputs of this language model
before presenting them to other people. Please inform your audience that the
text was generated by Pythia-160M.

## Training

### Training data

[The Pile](https://pile.eleuther.ai/) is an 825 GiB general-purpose dataset in
English. It was created by EleutherAI specifically for training large language
models. It contains texts from 22 diverse sources, roughly broken down into
five categories: academic writing (e.g. arXiv), internet (e.g. CommonCrawl),
prose (e.g. Project Gutenberg), dialogue (e.g. YouTube subtitles), and
miscellaneous (e.g. GitHub, Enron Emails). See [the Pile
paper](https://arxiv.org/abs/2101.00027) for a breakdown of all data sources,
methodology, and a discussion of ethical implications. Consult [the
datasheet](https://arxiv.org/abs/2201.07311) for more detailed documentation
about the Pile and its component datasets. The Pile can be downloaded from
the [official website](https://pile.eleuther.ai/), or from a [community
mirror](https://the-eye.eu/public/AI/pile/).<br>
The Pile was **not** deduplicated before being used to train Pythia-160M.

### Training procedure

All models were trained on the exact same data, in the exact same order. Each
model saw 299,892,736,000 tokens during training, and 143 checkpoints for each
model are saved every 2,097,152,000 tokens, spaced evenly throughout training,
from `step1000` to `step143000` (which is the same as `main`). In addition, we
also provide frequent early checkpoints: `step0` and `step{1,2,4...512}`.
This corresponds to training for just under 1 epoch on the Pile for
non-deduplicated models, and about 1.5 epochs on the deduplicated Pile.

All *Pythia* models trained for 143,000 steps at a batch size
of 2M (2,097,152 tokens).<br>
See [GitHub](https://github.com/EleutherAI/pythia) for more details on training
procedure, including [how to reproduce
it](https://github.com/EleutherAI/pythia/blob/main/README.md#reproducing-training).<br>
Pythia uses the same tokenizer as [GPT-NeoX-20B](https://huggingface.co/EleutherAI/gpt-neox-20b).
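
As a quick consistency check on the totals above (assuming exactly 143,000 optimizer steps at 2,097,152 tokens per step, as stated):

```python
tokens_per_step = 2_097_152   # batch size of "2M" tokens
total_steps = 143_000

print(f"{tokens_per_step * total_steps:,}")  # 299,892,736,000 tokens seen during training
print(f"{tokens_per_step * 1_000:,}")        # 2,097,152,000 tokens between saved checkpoints
```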

## Evaluation

This model has been evaluated zero-shot on HellaSwag using the EleutherAI
evaluation harness ([lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)).
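
A sketch of how such a run can be reproduced with the harness's Python API is shown below; the exact harness version and settings behind the numbers in the table are not documented here, so treat these flags as assumptions rather than the configuration actually used.

```python
# Roughly equivalent CLI (lm-evaluation-harness >= 0.4):
#   lm_eval --model hf --model_args pretrained=EleutherAI/pythia-160m --tasks hellaswag --num_fewshot 0
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["hellaswag"],
    num_fewshot=0,
)

print(results["results"]["hellaswag"])  # acc and acc_norm with stderr
```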

<figure>

| Tasks     | Version | Filter | n-shot | Metric   |   | Value  |   | Stderr |
|-----------|--------:|--------|-------:|----------|---|-------:|---|-------:|
| hellaswag |       1 | none   |      0 | acc      | ↑ | 0.2872 | ± | 0.0045 |
|           |         | none   |      0 | acc_norm | ↑ | 0.3082 | ± | 0.0046 |

<figcaption>Evaluation results.</figcaption>
</figure>