---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-7B
library_name: transformers
datasets:
- ericflo/Qwen2.5-7B-Base-Think-SFT
- ericflo/Qwen2.5-7B-Base-Think-KTO
---

# Qwen2.5-Think-KTO v0.2: A Reasoning-Enhanced Language Model

**NOTE**: This model shows improved reliability in emitting `<think>...</think>` tags compared to v0.1, though it may still require occasional prompting.

## What's New in v0.2
This release introduces a two-stage training process, combining Supervised Fine-Tuning (SFT) with Kahneman-Tversky Optimization (KTO). The training dataset has been expanded to nearly 500 datapoints, more than double the previous version's. This approach aims to provide more robust and consistent reasoning.

## How It Works
The model generates responses in a simple thought-then-answer format:

```
<think>
Let me approach this step by step...
First, we need to consider X...
Then, looking at Y...
Finally, Z leads us to...
</think>

[final answer based on thought process]
```
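
Because the reasoning is wrapped in literal tags, downstream code can separate it from the final answer with plain string handling. Below is a minimal sketch; the tag names come from this card, but the helper itself is illustrative:

```python
import re

def split_think_response(text: str) -> tuple[str, str]:
    """Split a response into (reasoning, answer).

    Assumes a single <think>...</think> block, as described above;
    returns an empty reasoning string if the tags are missing.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()
```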

## Technical Details

### Base Architecture
- **Base Model**: Qwen2.5-7B
- **Training Approach**: Two-stage process (SFT followed by KTO)
- **Dataset** (an example record is sketched below):
  - Stage 1: Direct supervision with expert demonstrations
  - Stage 2: Binary feedback signals (desirable/undesirable outputs)
- **Quality Control**: Programmatic validation and human review

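For the KTO stage, each datapoint reduces to a prompt, a sampled completion, and a binary desirability label. The field names below follow the usual TRL convention for KTO data and are an assumption about this dataset's exact schema:

```python
# Illustrative shape of one binary-feedback record (field names assumed,
# following the common prompt/completion/label convention for KTO data).
kto_example = {
    "prompt": "What are the implications of Moore's Law slowing down?",
    "completion": "<think>\nLet me approach this step by step...\n</think>\n\n...",
    "label": True,  # True = desirable output, False = undesirable
}
```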

### Training Parameters
The main settings, mirrored in the configuration sketch below:
- **Optimization**:
  - Learning Rate: 5e-6
  - Scheduler: Cosine with 0.1 warmup ratio
  - Optimizer: AdamW 8-bit
  - Batch Size: 5 per device
  - Gradient Accumulation Steps: 1
  - Number of Epochs: 3 (both stages)
- **Model Config**:
  - Max Length: 3746
  - Max Prompt Length: 364
  - Attention Implementation: Flash Attention 2
  - Gradient Checkpointing: Enabled
- **Infrastructure**:
  - Accelerate for distributed training
  - Weights & Biases (wandb) logging
  - Liger kernel optimization enabled

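These parameters map naturally onto TRL's `KTOTrainer`. The following is a minimal sketch of what the KTO stage could look like, assuming TRL as the training library and its standard prompt/completion/label dataset columns; it is not the author's actual script, and argument names (`use_liger_kernel`, `processing_class`) track recent transformers/TRL releases:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

# In practice the SFT-stage checkpoint would be loaded here.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B", attn_implementation="flash_attention_2"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")

# Binary-feedback dataset; the "train" split name is an assumption.
dataset = load_dataset("ericflo/Qwen2.5-7B-Base-Think-KTO", split="train")

config = KTOConfig(
    output_dir="qwen2.5-think-kto",
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_bnb_8bit",         # 8-bit AdamW
    per_device_train_batch_size=5,
    gradient_accumulation_steps=1,
    num_train_epochs=3,
    max_length=3746,
    max_prompt_length=364,
    gradient_checkpointing=True,
    use_liger_kernel=True,          # assumption: Liger enabled via this flag
    report_to="wandb",
)

trainer = KTOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,     # `tokenizer=` in older TRL versions
)
trainer.train()
```

Run under `accelerate launch`, this would also cover the distributed-training point above.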

## What's It Good For?
✅ Tasks requiring natural thought processes
✅ Scenarios where binary feedback is available
✅ Problems benefiting from human-like reasoning
✅ Applications needing clear thought-to-answer progression
✅ More reliable reasoning tag generation

## Limitations
- Bounded by base Qwen2.5-7B capabilities
- May not generalize beyond training distribution
- Still room for improvement in consistency
- Performance on non-reasoning tasks unchanged
- Limited by quality of training data

## Example Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "ericflo/Qwen2.5-Think-KTO-v0.2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ericflo/Qwen2.5-Think-KTO-v0.2")

prompt = "What are the implications of Moore's Law slowing down?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
# Cap new tokens instead of total length so long prompts are not cut short.
output = model.generate(input_ids, max_new_tokens=512)
response = tokenizer.decode(output[0], skip_special_tokens=True)
```
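
Since the tags can occasionally be skipped, a common workaround (not something this card prescribes) is to seed generation with an opening tag so decoding starts inside the reasoning block, continuing the snippet above:

```python
# Seed the prompt with an opening <think> tag to nudge the model into
# its reasoning format; reuses `tokenizer` and `model` from above.
seeded = tokenizer(prompt + "\n<think>\n", return_tensors="pt").input_ids
output = model.generate(seeded.to(model.device), max_new_tokens=512)
```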

## Citation
```bibtex
@misc{qwen25-think-kto,
  title={Qwen2.5-Think-KTO: Enhanced Reasoning Through Two-Stage Learning},
  author={Eric Florenzano},
  year={2024},
  howpublished={\url{https://huggingface.co/ericflo/Qwen2.5-Think-KTO-v0.2}}
}
```

## Acknowledgments
This model builds on the Qwen2.5-7B base model and implements the KTO approach developed by Ethayarajh et al. Special thanks to the authors of the KTO paper and the broader AI research community for their contributions to model alignment techniques.