Update README.md
README.md
CHANGED
@@ -56,7 +56,7 @@ Important branches:
 - `main`: Checkpoint annealed from `step1200000-tokens5033B` for an additional 100B tokens (23,842 steps). We use this checkpoint for our adaptation (https://huggingface.co/allenai/OLMoE-1B-7B-0924-SFT & https://huggingface.co/allenai/OLMoE-1B-7B-0924-Instruct).
 - `fp32`: FP32 version of `main`. The model weights were stored in FP32 during training but we did not observe any performance drop from casting them to BF16 after training so we upload all weights in BF16. If you want the original FP32 checkpoint for `main` you can use this one. You will find that it yields slightly different results but should perform around the same on benchmarks.

-
+# Evaluation Snapshot

 | Model | Active Params | Open Data | MMLU | HellaSwag | ARC-Chall. | ARC-Easy | PIQA | WinoGrande |
 |-----------------------------|---------------|-----------|------|-----------|------------|----------|------|------------|
@@ -82,10 +82,7 @@ Important branches:
 | Llama2-7B | 6.7B | ❌ | 46.2 | 78.9 | 54.2 | 84.0 | 77.5 | 71.7 |

 # Bias, Risks, and Limitations
-Like any base language model or fine-tuned model without safety filtering, it is relatively easy for a user to prompt these models to generate harmful and generally sensitive content. Such content can also be produced unintentionally, especially in the case of bias, so we recommend users consider the risks of applications of this technology.
-
-Otherwise, many facts from OLMo or any LLM will often not be true, so they should be checked.
-
+Like any base language model or fine-tuned model without safety filtering, it is relatively easy for a user to prompt these models to generate harmful and generally sensitive content. Such content can also be produced unintentionally, especially in the case of bias, so we recommend users consider the risks of applications of this technology. Otherwise, many facts from OLMoE or any LLM will often not be true, so they should be checked.

 # Citation
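For reference, a minimal sketch of loading one of the branches described in the diff above with Hugging Face `transformers`. The base repo id `allenai/OLMoE-1B-7B-0924` is inferred from the SFT/Instruct links and is an assumption; the `revision` argument selects the branch (`main` for the default BF16 weights, `fp32` for the original FP32 checkpoint).

```python
# Sketch: load a specific branch (revision) of the checkpoint with transformers.
# The repo id below is assumed from the SFT/Instruct links; adjust if needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "allenai/OLMoE-1B-7B-0924"  # assumed base-model repo id

model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    revision="fp32",             # FP32 branch; use "main" for the BF16 weights
    torch_dtype=torch.bfloat16,  # cast to BF16 at load time, which the README says costs no performance
)
tokenizer = AutoTokenizer.from_pretrained(repo_id)
```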