Text Generation · Transformers · Safetensors · English · olmoe · Mixture of Experts · olmo
Commit cd7d397 (verified) · Muennighoff committed · 1 parent: e4c66e0

Update README.md

Files changed (1): README.md (+2 -5)
README.md CHANGED
@@ -56,7 +56,7 @@ Important branches:
 - `main`: Checkpoint annealed from `step1200000-tokens5033B` for an additional 100B tokens (23,842 steps). We use this checkpoint for our adaptation (https://huggingface.co/allenai/OLMoE-1B-7B-0924-SFT & https://huggingface.co/allenai/OLMoE-1B-7B-0924-Instruct).
 - `fp32`: FP32 version of `main`. The model weights were stored in FP32 during training but we did not observe any performance drop from casting them to BF16 after training so we upload all weights in BF16. If you want the original FP32 checkpoint for `main` you can use this one. You will find that it yields slightly different results but should perform around the same on benchmarks.
 
-### Evaluation Snapshot
+# Evaluation Snapshot
 
 | Model | Active Params | Open Data | MMLU | HellaSwag | ARC-Chall. | ARC-Easy | PIQA | WinoGrande |
 |-----------------------------|---------------|-----------|------|-----------|------------|----------|------|------------|
@@ -82,10 +82,7 @@ Important branches:
 | Llama2-7B | 6.7B | ❌ | 46.2 | 78.9 | 54.2 | 84.0 | 77.5 | 71.7 |
 
 # Bias, Risks, and Limitations
-Like any base language model or fine-tuned model without safety filtering, it is relatively easy for a user to prompt these models to generate harmful and generally sensitive content. Such content can also be produced unintentionally, especially in the case of bias, so we recommend users consider the risks of applications of this technology.
-
-Otherwise, many facts from OLMo or any LLM will often not be true, so they should be checked.
-
+Like any base language model or fine-tuned model without safety filtering, it is relatively easy for a user to prompt these models to generate harmful and generally sensitive content. Such content can also be produced unintentionally, especially in the case of bias, so we recommend users consider the risks of applications of this technology. Otherwise, many facts from OLMoE or any LLM will often not be true, so they should be checked.
 
 # Citation
 
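
In practice, the `main` and `fp32` branch descriptions in the diff above come down to picking a `revision` when loading the checkpoint and optionally casting FP32 weights to BF16. The following is a minimal sketch of how one might do that with `transformers`; the base repo id `allenai/OLMoE-1B-7B-0924` is inferred from the SFT/Instruct URLs in the diff and is an assumption, as is the availability of OLMoE support in the installed `transformers` version.

```python
# Sketch, not part of the commit: load OLMoE from a specific branch with transformers.
# Assumptions: base repo id "allenai/OLMoE-1B-7B-0924" (inferred from the SFT/Instruct
# URLs in the diff) and a transformers release that includes the OLMoE architecture.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO = "allenai/OLMoE-1B-7B-0924"  # assumed base model repo id

tokenizer = AutoTokenizer.from_pretrained(REPO)

# Default branch (`main`): weights are already uploaded in BF16.
model = AutoModelForCausalLM.from_pretrained(REPO, torch_dtype=torch.bfloat16)

# `fp32` branch: the original FP32 checkpoint. Per the README, casting it to BF16
# afterwards did not cause a measurable drop on benchmarks.
model_fp32 = AutoModelForCausalLM.from_pretrained(
    REPO, revision="fp32", torch_dtype=torch.float32
)
model_bf16 = model_fp32.to(torch.bfloat16)

# Quick generation check with the BF16 model.
inputs = tokenizer("Bitcoin is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Casting the `fp32` checkpoint to BF16 should closely reproduce the weights on `main`, up to the small numerical differences the README notes.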