Update README.md
README.md
CHANGED
@@ -1,6 +1,6 @@
-
+---
 library_name: transformers
-model_name:
+model_name: SmolLM2Prover
 tags:
 - text-generation
 - proof
@@ -22,7 +22,7 @@ datasets:
 base_model:
 - prithivMLmods/SmolLM2-CoT-360M
 pipeline_tag: text-generation
-
+---
 
 # Model Card for SmolLM2Prover
 
@@ -92,21 +92,23 @@ tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_gene
 
 outputs = model.generate(tokenized_chat, max_new_tokens=512)
 decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
-
 # Print only the generated part
 print(decoded_output.split("assistant\n")[-1])
+```
+### Training
 
-Training Procedure
 The model underwent several rounds of Supervised Fine-Tuning (SFT) using TRL's SFTTrainer.
 * Training Data: The primary dataset used was AI-MO/NuminaMath-1.5, augmented with approximately 1 million additional tokens. This data was formatted with a specific prompt structure designed to elicit step-by-step, chain-of-thought reasoning from the model.
 * Process: The iterative SFT approach allowed for progressive refinement of the model's reasoning capabilities.
-
+
+## Framework Versions
 * Transformers: 4.56.0
 * Pytorch: 2.8.0+cu126
 * TRL: 0.22.2
 * Datasets: 4.0.0
 * Tokenizers: 0.22.0
-
+
+### Intended Use
 This model is a versatile tool suitable for a range of applications, from everyday conversation to complex problem-solving.
 * Primary Use Cases (Specialized Skills):
 * Educational tools for higher-level mathematics and logic.
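Note: the hunk above captures only the tail of the quickstart snippet; the model and tokenizer loading falls outside the diff context. For anyone trying it, here is a minimal self-contained sketch of the same pattern. The repo id is a placeholder, since this excerpt does not show the checkpoint's final path:

```python
# Minimal end-to-end version of the usage snippet in the hunk above.
# NOTE: "your-org/SmolLM2Prover" is a placeholder repo id, not confirmed by the card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/SmolLM2Prover"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "user", "content": "Prove that the sum of two even integers is even."}
]
tokenized_chat = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(tokenized_chat, max_new_tokens=512)
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Print only the generated part (everything after the last assistant marker)
print(decoded_output.split("assistant\n")[-1])
```

With `skip_special_tokens=True` the chat-template control tokens are stripped but the literal role name remains in the decoded text, which is why splitting on "assistant\n" recovers just the reply.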
@@ -121,9 +123,9 @@ Limitations and Bias
 * Mathematical Accuracy: While highly capable, the model can still make errors or "hallucinate" incorrect steps or solutions in complex mathematical proofs. All outputs, especially for critical applications, should be verified by a human expert.
 * Domain Performance: The model's performance is most reliable on problems similar to its training data. While it is designed to handle higher levels of math and deep thinking, its accuracy in novel or esoteric domains should be carefully evaluated.
 * Inherited Bias: This model inherits any biases present in the base model (SmolLM2-CoT-360M) and the training datasets.
-Acknowledgements
+### Acknowledgements
 You're doing great!
-Citations
+## Citations
 If you use TRL in your work, please cite the library:
 @misc{vonwerra2022trl,
 title = {{TRL: Transformer Reinforcement Learning}},
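The Training section in the third hunk names TRL's SFTTrainer and AI-MO/NuminaMath-1.5 but publishes no hyperparameters or prompt template. Purely as an illustration of that setup, here is a minimal sketch under assumed settings; the prompt template, column names, and every numeric value below are assumptions, not the card's actual configuration:

```python
# Illustrative sketch only: the model card does not publish the actual SFT
# configuration, so hyperparameters and the prompt template are assumptions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

def to_text(example):
    # Assumed CoT-style template and assumed "problem"/"solution" columns.
    return {"text": f"Problem: {example['problem']}\nSolution: {example['solution']}"}

dataset = load_dataset("AI-MO/NuminaMath-1.5", split="train").map(to_text)

config = SFTConfig(
    output_dir="smollm2-prover-sft",  # hypothetical path
    per_device_train_batch_size=4,    # assumed
    num_train_epochs=1,               # assumed; the card describes several SFT rounds
    learning_rate=2e-5,               # assumed
)

trainer = SFTTrainer(
    model="prithivMLmods/SmolLM2-CoT-360M",  # the stated base model
    args=config,
    train_dataset=dataset,
)
trainer.train()
```

The card describes "several rounds" of SFT, which would correspond to repeating a run like this, reloading each round from the previous checkpoint.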