---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-7B
library_name: transformers
datasets:
- ericflo/Qwen2.5-7B-Base-Think-SFT
- ericflo/Qwen2.5-7B-Base-Think-KTO
---

# Qwen2.5-Think-KTO v0.2: A Reasoning-Enhanced Language Model

**NOTE**: This model shows improved reliability in emitting `<think>...</think>` tags compared to v0.1, though it may still require occasional prompting.

## What's New in v0.2
This release introduces a two-stage training process, combining Supervised Fine-Tuning (SFT) with Kahneman-Tversky Optimization (KTO). The training dataset has been expanded to nearly 500 datapoints, more than double the previous version's. This approach aims to provide more robust and consistent reasoning.

## How It Works
The model generates responses in a simple thought-then-answer format:

```
<think>
Let me approach this step by step...
First, we need to consider X...
Then, looking at Y...
Finally, Z leads us to...
</think>

[final answer based on thought process]
```
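
Because the reasoning is wrapped in literal tags, downstream code can separate it from the final answer with plain string handling. Below is a minimal sketch; the tag names come from this card, but the helper itself is illustrative:

```python
import re

def split_think_response(text: str) -> tuple[str, str]:
    """Split a response into (reasoning, answer).

    Assumes a single <think>...</think> block, as described above;
    returns an empty reasoning string if the tags are missing.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()
```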

## Technical Details

### Base Architecture
- **Base Model**: Qwen2.5-7B
- **Training Approach**: Two-stage process (SFT followed by KTO)
- **Dataset** (an example record is sketched below):
  - Stage 1: Direct supervision with expert demonstrations
  - Stage 2: Binary feedback signals (desirable/undesirable outputs)
- **Quality Control**: Programmatic validation and human review

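For the KTO stage, each datapoint reduces to a prompt, a sampled completion, and a binary desirability label. The field names below follow the usual TRL convention for KTO data and are an assumption about this dataset's exact schema:

```python
# Illustrative shape of one binary-feedback record (field names assumed,
# following the common prompt/completion/label convention for KTO data).
kto_example = {
    "prompt": "What are the implications of Moore's Law slowing down?",
    "completion": "<think>\nLet me approach this step by step...\n</think>\n\n...",
    "label": True,  # True = desirable output, False = undesirable
}
```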

### Training Parameters
The main settings, mirrored in the configuration sketch below:
- **Optimization**:
  - Learning Rate: 5e-6
  - Scheduler: Cosine with 0.1 warmup ratio
  - Optimizer: AdamW 8-bit
  - Batch Size: 5 per device
  - Gradient Accumulation Steps: 1
  - Number of Epochs: 3 (both stages)
- **Model Config**:
  - Max Length: 3746
  - Max Prompt Length: 364
  - Attention Implementation: Flash Attention 2
  - Gradient Checkpointing: Enabled
- **Infrastructure**:
  - Accelerate for distributed training
  - Weights & Biases (wandb) logging
  - Liger kernel optimization enabled

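These parameters map naturally onto TRL's `KTOTrainer`. The following is a minimal sketch of what the KTO stage could look like, assuming TRL as the training library and its standard prompt/completion/label dataset columns; it is not the author's actual script, and argument names (`use_liger_kernel`, `processing_class`) track recent transformers/TRL releases:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

# In practice the SFT-stage checkpoint would be loaded here.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B", attn_implementation="flash_attention_2"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")

# Binary-feedback dataset; the "train" split name is an assumption.
dataset = load_dataset("ericflo/Qwen2.5-7B-Base-Think-KTO", split="train")

config = KTOConfig(
    output_dir="qwen2.5-think-kto",
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_bnb_8bit",         # 8-bit AdamW
    per_device_train_batch_size=5,
    gradient_accumulation_steps=1,
    num_train_epochs=3,
    max_length=3746,
    max_prompt_length=364,
    gradient_checkpointing=True,
    use_liger_kernel=True,          # assumption: Liger enabled via this flag
    report_to="wandb",
)

trainer = KTOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,     # `tokenizer=` in older TRL versions
)
trainer.train()
```

Run under `accelerate launch`, this would also cover the distributed-training point above.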

## What's It Good For?
✅ Tasks requiring natural thought processes
✅ Scenarios where binary feedback is available
✅ Problems benefiting from human-like reasoning
✅ Applications needing clear thought-to-answer progression
✅ More reliable reasoning tag generation

## Limitations
- Bounded by base Qwen2.5-7B capabilities
- May not generalize beyond training distribution
- Still room for improvement in consistency
- Performance on non-reasoning tasks unchanged
- Limited by quality of training data

## Example Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "ericflo/Qwen2.5-Think-KTO-v0.2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ericflo/Qwen2.5-Think-KTO-v0.2")

prompt = "What are the implications of Moore's Law slowing down?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
# Cap new tokens instead of total length so long prompts are not cut short.
output = model.generate(input_ids, max_new_tokens=512)
response = tokenizer.decode(output[0], skip_special_tokens=True)
```
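
Since the tags can occasionally be skipped, a common workaround (not something this card prescribes) is to seed generation with an opening tag so decoding starts inside the reasoning block, continuing the snippet above:

```python
# Seed the prompt with an opening <think> tag to nudge the model into
# its reasoning format; reuses `tokenizer` and `model` from above.
seeded = tokenizer(prompt + "\n<think>\n", return_tensors="pt").input_ids
output = model.generate(seeded.to(model.device), max_new_tokens=512)
```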

## Citation
```bibtex
@misc{qwen25-think-kto,
  title={Qwen2.5-Think-KTO: Enhanced Reasoning Through Two-Stage Learning},
  author={Eric Florenzano},
  year={2024},
  howpublished={\url{https://huggingface.co/ericflo/Qwen2.5-Think-KTO-v0.2}}
}
```

## Acknowledgments
This model builds on the Qwen2.5-7B base model and implements the KTO approach developed by Ethayarajh et al. Special thanks to the authors of the KTO paper and the broader AI research community for their contributions to model alignment techniques.