Commit ca83a2c (verified) by Simingh · 1 parent: 282d166

Update README.md

Files changed (1)
  1. README.md +11 -2
README.md CHANGED
@@ -48,10 +48,19 @@ datasets:
 
  ## 3. Datasets
 
+ ### Pre-training
+ | Dataset | Size | Download |
+ |:---------------------:|:---------------:|:-----------------------------------------------------------------------:|
+ | fineweb-code-corpus | 148 GB | 🤗 [HuggingFace](https://huggingface.co/datasets/OpenCoder-LLM/fineweb-code-corpus) |
+ | fineweb-math-corpus | 10 GB | 🤗 [HuggingFace](https://huggingface.co/datasets/OpenCoder-LLM/fineweb-math-corpus) |
+
+
+ ### Post-training
+
  | Dataset | Num | Download |
  |:---------------------:|:---------------:|:-----------------------------------------------------------------------:|
- | OpenCoder-SFT-Stage1 | 4.21 M | 🤗 [HuggingFace](https://huggingface.co/datasets/OpenCoder-LLM/opencoder-sft-stage1) |
- | OpenCoder-SFT-Stage2 | 375 K | 🤗 [HuggingFace](https://huggingface.co/datasets/OpenCoder-LLM/opencoder-sft-stage2) |
+ | opencoder-sft-stage1 | 4.21 M | 🤗 [HuggingFace](https://huggingface.co/datasets/OpenCoder-LLM/opencoder-sft-stage1) |
+ | opencoder-sft-stage2 | 375 K | 🤗 [HuggingFace](https://huggingface.co/datasets/OpenCoder-LLM/opencoder-sft-stage2) |
 
 
 