aya-se committed on
Commit 2b75263 · verified · 1 Parent(s): b6aa8bf

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -113,7 +113,7 @@ The following datasets were used for continual pre-training.
  - [English Wikipedia](https://dumps.wikimedia.org/other/cirrussearch)
  - [Japanese Wikipedia](https://dumps.wikimedia.org/other/cirrussearch)
  - [Laboro ParaCorpus](https://github.com/laboroai/Laboro-ParaCorpus)
- - [Swallow Corpus Version 2](https://arxiv.org/abs/2404.17733)
+ - [Swallow Corpus Version 2](https://arxiv.org/abs/2404.17733) (filtered using [Swallow Education Classifier(Wiki-based)](https://huggingface.co/tokyotech-llm/edu-classifier))
  - [The-stack-v2](https://huggingface.co/datasets/bigcode/the-stack-v2-train-smol-ids)
 
  ## Risks and Limitations