reaperdoesntknow committed on
Commit 03dfd0c · verified · 1 parent: 51c60f4

Update README.md

Files changed (1)
  1. README.md +103 -43
README.md CHANGED
@@ -1,77 +1,137 @@
- ---
  library_name: transformers
  model_name: SmolLM2_Thinks
  tags:
- - generated_from_trainer
- - sft
- - trl
  - proof
  - cot
  - reasoning
- - symbioticai
  - calculus
  - logic
- - SFT
- - TRL
- - transformers
- - datasets
  - finetune
- licence: license
- datasets:
- - AI-MO/NuminaMath-1.5
  language:
  - en
  base_model:
  - prithivMLmods/SmolLM2-CoT-360M
  pipeline_tag: text-generation
  ---

- # Model Card for SmolLM2_Thinks

- This model is a fine-tuned version of [prithivMLmods/SmolLM2-CoT-360M](https://huggingface.co/prithivMLmods/SmolLM2-CoT-360M).
- It has been trained using on multiple rounds of [TRL](https://github.com/huggingface/trl).

- ## Quick start

  ```python
  from transformers import pipeline

- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="reaperdoesntknow/SMOLM2Prover", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])

- from transformers import AutoTokenizer, AutoModelForCausalLM

- tokenizer = AutoTokenizer.from_pretrained("reaperdoesntknow/SMOLM2Prover")
- model = AutoModelForCausalLM.from_pretrained("reaperdoesntknow/SMOLM2Prover")
- ```


- ### Framework versions

- - TRL: 0.22.2
- - Transformers: 4.56.0
- - Pytorch: 2.8.0+cu126
- - Datasets: 4.0.0
- - Tokenizers: 0.22.0

- ## Acknowledgements
- - I acknowledge you!

- ## Citations



- Cite TRL as:
-
- ```bibtex
  @misc{vonwerra2022trl,
- title = {{TRL: Transformer Reinforcement Learning}},
- author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
- year = 2020,
- journal = {GitHub repository},
- publisher = {GitHub},
- howpublished = {\url{https://github.com/huggingface/trl}}
  }
- ```
+ ---
  library_name: transformers
  model_name: SmolLM2_Thinks
  tags:
+ - text-generation
  - proof
  - cot
  - reasoning
+ - math
  - calculus
  - logic
+ - sft
+ - trl
+ - generated_from_trainer
  - finetune
+ - symbioticai
  language:
  - en
+ license: apache-2.0
+ datasets:
+ - AI-MO/NuminaMath-1.5
  base_model:
  - prithivMLmods/SmolLM2-CoT-360M
  pipeline_tag: text-generation
  ---

+ # Model Card for SmolLM2Prover

+ **SmolLM2Prover** is a fine-tuned version of [prithivMLmods/SmolLM2-CoT-360M](https://huggingface.co/prithivMLmods/SmolLM2-CoT-360M). It aims to keep the conversational abilities of its base model while strengthening logical reasoning and higher-level mathematics, with a focus on step-by-step proofs and explanations (chain-of-thought, CoT).

+ The model was fine-tuned over multiple rounds of supervised fine-tuning (SFT) with the [TRL](https://github.com/huggingface/trl) library on a curated dataset, to improve its ability to follow complex instructions and reason through problems.
+
+ ## Model Details
+
+ * **Base Model:** [prithivMLmods/SmolLM2-CoT-360M](https://huggingface.co/prithivMLmods/SmolLM2-CoT-360M)
+ * **Fine-tuning Library:** TRL (Transformer Reinforcement Learning)
+ * **Specialization:** Mathematical reasoning, proof generation, chain-of-thought (CoT)
+ * **Training Data:** `AI-MO/NuminaMath-1.5` plus roughly 1 million tokens of custom-formatted reasoning data
+
+ ## How to Use
+
+ The model is intended for text-generation tasks that require logical reasoning or advanced conversation.
+
+ ### Using the Pipeline
+
+ The simplest way to run the model is with the `transformers` `pipeline`.

  ```python
  from transformers import pipeline
+ import torch

+ model_id = "reaperdoesntknow/SMOLM2Prover"
+ prompt = "Prove that the derivative of f(x) = x^2 is f'(x) = 2x using the limit definition of a derivative."
+
+ # device_map="auto" places the model on the available GPU(s) or CPU (requires `accelerate`)
+ generator = pipeline(
+     "text-generation",
+     model=model_id,
+     torch_dtype=torch.bfloat16,  # or torch.float16 if bfloat16 is not supported
+     device_map="auto",
+ )
+
+ # Pass the request as a chat message so the model's chat template is applied
+ messages = [
+     {"role": "user", "content": f"You are a helpful math assistant. Please solve the following problem step-by-step.\n\n{prompt}"}
+ ]

+ # return_full_text=False returns only the newly generated answer
+ output = generator(messages, max_new_tokens=512, return_full_text=False)
+ print(output[0]["generated_text"])
+ ```

+ ### Manual Usage
+
+ For more control, use `AutoModelForCausalLM` and `AutoTokenizer` directly.
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch

+ model_id = "reaperdoesntknow/SMOLM2Prover"
+ prompt = "Prove that the derivative of f(x) = x^2 is f'(x) = 2x using the limit definition of a derivative."

+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,  # or torch.float16
+     device_map="auto",
+ )

+ # Apply the model's chat template to format the conversation
+ messages = [
+     {"role": "user", "content": f"You are a helpful math assistant. Please solve the following problem step-by-step.\n\n{prompt}"}
+ ]
+ input_ids = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)

+ outputs = model.generate(input_ids, max_new_tokens=512)

+ # Decode only the newly generated tokens (everything after the prompt)
+ response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
+ print(response)
+ ```

+ ## Training Procedure
+
+ The model went through several rounds of supervised fine-tuning (SFT) with TRL's `SFTTrainer`.
+
+ * **Training Data:** The primary dataset was `AI-MO/NuminaMath-1.5`, augmented with approximately 1 million additional tokens. The data was formatted with a specific prompt structure designed to elicit step-by-step, chain-of-thought reasoning.
+ * **Process:** Fine-tuning was iterated over several SFT rounds to progressively refine the model's reasoning.
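The training script itself is not published with this card. The sketch below is a rough, hypothetical illustration of what one SFT round with TRL's `SFTTrainer` could look like. It converts NuminaMath-1.5 rows into TRL's conversational `messages` format and trains on them.

Several parts are assumptions rather than the author's settings:
* The dataset is assumed to expose `problem` and `solution` fields.
* The helper names (`to_messages`, `run_sft_round`), the output directories and all hyperparameters are placeholders.
* The instruction wording is borrowed from the usage examples.
* The additional ~1 million tokens of custom data are not represented.

```python
# Hypothetical sketch of ONE supervised fine-tuning (SFT) round with TRL.
# Assumptions (not from the model card): NuminaMath-1.5 rows have "problem"
# and "solution" fields; prompt wording and hyperparameters are placeholders.

PROMPT_PREFIX = (
    "You are a helpful math assistant. "
    "Please solve the following problem step-by-step."
)


def to_messages(example: dict) -> dict:
    """Convert one problem/solution pair into TRL's conversational format."""
    return {
        "messages": [
            {"role": "user", "content": f"{PROMPT_PREFIX}\n\n{example['problem']}"},
            {"role": "assistant", "content": example["solution"]},
        ]
    }


def run_sft_round(model_id: str, output_dir: str) -> None:
    """Run a single SFT round starting from `model_id` and save to `output_dir`."""
    # Imported here so `to_messages` can be used without TRL/datasets installed.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    dataset = load_dataset("AI-MO/NuminaMath-1.5", split="train")
    dataset = dataset.map(to_messages, remove_columns=dataset.column_names)

    trainer = SFTTrainer(
        model=model_id,  # Hub id or path to the previous round's checkpoint
        args=SFTConfig(
            output_dir=output_dir,
            num_train_epochs=1,             # placeholder value
            per_device_train_batch_size=4,  # placeholder value
            learning_rate=2e-5,             # placeholder value
        ),
        train_dataset=dataset,
    )
    trainer.train()
    trainer.save_model(output_dir)


# Example (downloads the model and dataset, so it is not run here):
# run_sft_round("prithivMLmods/SmolLM2-CoT-360M", "sft-round-1")
# run_sft_round("sft-round-1", "sft-round-2")
```

In the commented example, each round starts from the previous round's output directory. That is one plausible way to realize "several rounds"; the card does not say how the rounds were actually structured.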
+
+ ### Framework Versions
+
+ * Transformers: 4.56.0
+ * PyTorch: 2.8.0+cu126
+ * TRL: 0.22.2
+ * Datasets: 4.0.0
+ * Tokenizers: 0.22.0
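A small helper along these lines can compare a local environment against the versions listed above. It is a sketch, not part of the model repository. It assumes the PyPI distribution names `transformers`, `torch`, `trl`, `datasets` and `tokenizers`. It also ignores local build tags such as `+cu126`, so that a differently built PyTorch wheel of the same release is not flagged.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the "Framework Versions" list, keyed by (assumed) PyPI name.
EXPECTED = {
    "transformers": "4.56.0",
    "torch": "2.8.0+cu126",
    "trl": "0.22.2",
    "datasets": "4.0.0",
    "tokenizers": "0.22.0",
}


def public_version(v: str) -> str:
    """Drop a local build tag such as '+cu126' from a version string."""
    return v.split("+", 1)[0]


def find_mismatches(expected: dict, get_version=version) -> dict:
    """Return {package: (expected, installed or None)} for each package that differs."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed is None or public_version(installed) != public_version(wanted):
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(EXPECTED).items():
        print(f"{name}: card lists {wanted}, found {installed or 'not installed'}")
```

A reported mismatch is not necessarily a problem. It is simply the first thing to check if loading or generation behaves differently than described here.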
+
+ ## Intended Use
+
+ The model is intended for applications ranging from everyday conversation to complex problem-solving.
+
+ * **Primary use cases (specialized skills):**
+     * Educational tools for higher-level mathematics and logic.
+     * Automated proof generation and verification.
+     * Step-by-step problem-solving assistants for complex topics.
+     * A "thinking" component for applications that require deep reasoning.
+ * **General use cases:**
+     * General-purpose conversation and advanced chatbot applications.
+     * Complex instruction-following tasks.
+     * Content generation that requires logical consistency.
+
+ ## Limitations and Bias
+
+ * **Mathematical accuracy:** The model can make errors or "hallucinate" incorrect steps or solutions, especially in complex proofs. All outputs, particularly in critical applications, should be verified by a human expert.
+ * **Domain performance:** The model is most reliable on problems similar to its training data. Although it targets higher-level mathematics and deep reasoning, its accuracy in novel or esoteric domains should be evaluated carefully.
+ * **Inherited bias:** The model inherits any biases present in its base model (SmolLM2-CoT-360M) and in the training datasets.
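A claimed derivative can be spot-checked numerically before the accompanying proof is examined. This is a lightweight complement to human review, not a substitute for it. The sketch below uses a hypothetical helper that does not ship with the model. It compares a claimed derivative with a central-difference estimate at a few sample points, using the usage example: f(x) = x^2 with claimed f'(x) = 2x. Agreement at sample points only catches wrong final answers; it says nothing about whether the individual proof steps are valid.

```python
def derivative_claim_agrees(f, claimed_df, points, h=1e-5, rel_tol=1e-6):
    """Spot-check a claimed derivative against central differences at `points`.

    Returns False as soon as the estimate (f(x + h) - f(x - h)) / (2h)
    disagrees with claimed_df(x) beyond the relative tolerance.
    """
    for x in points:
        estimate = (f(x + h) - f(x - h)) / (2 * h)
        if abs(estimate - claimed_df(x)) > rel_tol * max(1.0, abs(estimate)):
            return False
    return True


# Example: the claim from the usage prompt, f(x) = x^2 and f'(x) = 2x.
sample_points = [-2.0, 0.5, 3.0]
print(derivative_claim_agrees(lambda x: x**2, lambda x: 2 * x, sample_points))  # True
print(derivative_claim_agrees(lambda x: x**2, lambda x: x, sample_points))      # False
```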
+
+ ## Acknowledgements
+
+ You're doing great!
+
+ ## Citations
+
+ If you use TRL in your work, please cite the library:
+
+ ```bibtex
  @misc{vonwerra2022trl,
+   title = {{TRL: Transformer Reinforcement Learning}},
+   author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
+   year = 2020,
+   journal = {GitHub repository},
+   publisher = {GitHub},
+   howpublished = {\url{https://github.com/huggingface/trl}}
  }
+ ```