Update README.md
README.md CHANGED
@@ -6,6 +6,8 @@ language:
 
 # GPT-NeoXT-Chat-Base-20B
 
+***<p style="font-size: 24px">Feel free to try out our [OpenChatKit feedback app](https://huggingface.co/spaces/togethercomputer/OpenChatKit)!</p>***
+
 > TLDR: As part of OpenChatKit (codebase available [here](https://github.com/togethercomputer/OpenChaT)),
 > GPT-NeoXT-Chat-Base-20B is a 20B parameter language model, fine-tuned from EleutherAI’s GPT-NeoX with over 40 million instructions on 100% carbon negative compute.
 
@@ -23,6 +25,20 @@ You can read more about this process and the availability of this dataset in LAI
 - **Model Description**: A 20B parameter open source chat model, fine-tuned from EleutherAI’s NeoX with over 40M instructions on 100% carbon negative compute
 - **Resources for more information**: [GitHub Repository](https://github.com/togethercomputer/OpenChaT).
 
+# Quick Start
+
+```python
+from transformers import pipeline
+pipe = pipeline(model='togethercomputer/GPT-NeoXT-Chat-Base-20B')
+pipe('''<human>: Hello!\n<bot>:''')
+```
+or
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+tokenizer = AutoTokenizer.from_pretrained("togethercomputer/GPT-NeoXT-Chat-Base-20B")
+model = AutoModelForCausalLM.from_pretrained("togethercomputer/GPT-NeoXT-Chat-Base-20B")
+```
+
 ## Strengths of the model
 
 There are several tasks that OpenChatKit excels at out of the box. This includes:
@@ -160,7 +176,8 @@ We therefore welcome contributions from individuals and organizations, and encou
 ## Training
 
 **Training Data**
-
+
+Please refer to [togethercomputer/OpenDataHub](https://github.com/togethercomputer/OpenDataHub)
 
 **Training Procedure**
 
@@ -170,15 +187,4 @@ We therefore welcome contributions from individuals and organizations, and encou
 - **Batch:** 2 x 2 x 64 x 2048 = 524288 tokens
 - **Learning rate:** warmup to 1e-6 for 100 steps and then kept constant
 
-## Environmental Impact
-\[TODO\]
-**Stable Diffusion v1** **Estimated Emissions**
-Based on that information, we estimate the following CO2 emissions using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). The hardware, runtime, cloud provider, and compute region were utilized to estimate the carbon impact.
-
-- **Hardware Type:** A100 PCIe 40GB
-- **Hours used:** 200000
-- **Cloud Provider:** AWS
-- **Compute Region:** US-east
-- **Carbon Emitted (Power consumption x Time x Carbon produced based on location of power grid):** 15000 kg CO2 eq.
-
 
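A note on the Quick Start section added in the second hunk: the `AutoTokenizer`/`AutoModelForCausalLM` snippet loads the checkpoint but stops before generating a reply. The sketch below is one minimal way to continue it, assuming the `<human>`/`<bot>` turn format used in the pipeline example; the sampling settings (`max_new_tokens`, `temperature`, `top_p`) are illustrative choices rather than values from the model card.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("togethercomputer/GPT-NeoXT-Chat-Base-20B")
model = AutoModelForCausalLM.from_pretrained("togethercomputer/GPT-NeoXT-Chat-Base-20B")

# Prompts follow the alternating <human>/<bot> turn format shown in the pipeline example.
prompt = "<human>: Hello!\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt")

# Sampling settings here are illustrative, not values taken from the model card.
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the tokens generated after the prompt.
reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(reply)
```

Since the checkpoint has 20B parameters, loading it in reduced precision or across devices (for example via the `torch_dtype` and `device_map` arguments to `from_pretrained`) is common in practice, but is omitted here to stay close to the snippet in the diff.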
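Similarly, for the training-procedure hunk: a short sketch restating the quoted hyperparameters in code. The meaning of the four batch factors and the linear shape of the warmup are assumptions; the diff gives only "2 x 2 x 64 x 2048 = 524288 tokens" and "warmup to 1e-6 for 100 steps and then kept constant".

```python
# Batch factors as quoted in the training procedure; the labels are assumed, not stated in the diff.
FACTOR_1 = 2            # assumed meaning (e.g. micro-batch or accumulation dimension)
FACTOR_2 = 2            # assumed meaning
SEQUENCES = 64          # assumed to be sequences per step
SEQUENCE_LENGTH = 2048  # tokens per sequence

TOKENS_PER_STEP = FACTOR_1 * FACTOR_2 * SEQUENCES * SEQUENCE_LENGTH
assert TOKENS_PER_STEP == 524288  # matches the figure quoted in the diff


def learning_rate(step: int, peak_lr: float = 1e-6, warmup_steps: int = 100) -> float:
    """Warm up to peak_lr over warmup_steps, then hold constant.

    The diff says only "warmup to 1e-6 for 100 steps and then kept constant";
    the linear shape of the warmup is an assumption.
    """
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr


print(TOKENS_PER_STEP, learning_rate(0), learning_rate(500))
```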