---
license: mit
datasets:
- isaiahbjork/chain-of-thought
base_model:
- Qwen/Qwen2.5-3B-Instruct
library_name: mlx
language:
- en
pipeline_tag: text-generation
---
## Model Overview
This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct, adapted with Low-Rank Adaptation (LoRA) using the MLX framework. Fine-tuning used the isaiahbjork/chain-of-thought dataset (7,143 examples) for 600 iterations, with the goal of improving performance on tasks that require multi-step reasoning and problem solving.
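For reference, a minimal inference sketch with the mlx-lm package is shown below. The repository id is a placeholder, and the prompt and generation settings are illustrative rather than prescribed by this card.

```python
# Minimal inference sketch using mlx-lm (assumes `pip install mlx-lm`).
# The repository id below is a placeholder; substitute the actual path to these weights.
from mlx_lm import load, generate

model, tokenizer = load("path/to/this-model")  # placeholder repo id or local directory

messages = [
    {
        "role": "user",
        "content": "A train travels 60 km in 45 minutes. "
                   "What is its average speed in km/h? Think step by step.",
    }
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

# max_tokens is an illustrative value, not a recommendation from this card.
print(generate(model, tokenizer, prompt=prompt, max_tokens=512))
```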
## Model Architecture
- Base Model: Qwen/Qwen2.5-3B-Instruct
- Model Type: Causal Language Model
- Architecture: Transformer with Rotary Position Embedding (RoPE), SwiGLU activation, RMSNorm normalization, attention QKV bias, and tied word embeddings
- Parameters: 3.09 billion
- Layers: 36
- Attention Heads: 16 for query, 2 for key and value (grouped-query attention; see the sketch below)
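The grouped-query attention layout above (16 query heads sharing 2 key/value heads) can be visualized with a short, self-contained sketch. The tensor shapes are illustrative and this is not the model's own attention code.

```python
# Toy illustration of grouped-query attention (GQA), not the model's actual implementation:
# 16 query heads share 2 key/value heads, i.e. 8 query heads per KV head.
import mlx.core as mx

n_q_heads, n_kv_heads, head_dim, seq_len = 16, 2, 128, 8  # head_dim and seq_len are illustrative
group_size = n_q_heads // n_kv_heads  # 8 query heads per shared KV head

q = mx.random.normal((n_q_heads, seq_len, head_dim))
k = mx.random.normal((n_kv_heads, seq_len, head_dim))
v = mx.random.normal((n_kv_heads, seq_len, head_dim))

# Broadcast each KV head to its group of query heads.
k = mx.repeat(k, group_size, axis=0)
v = mx.repeat(v, group_size, axis=0)

scores = (q @ k.transpose(0, 2, 1)) / (head_dim ** 0.5)
out = mx.softmax(scores, axis=-1) @ v  # shape: (16, seq_len, head_dim)
print(out.shape)
```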
## Fine-Tuning Details
- Technique: Low-Rank Adaptation (LoRA)
- Framework: MLX
- Dataset: isaiahbjork/chain-of-thought
- Dataset Size: 7,143 examples
- Iterations: 600
LoRA fine-tunes the model efficiently by training a small set of low-rank adapter weights while keeping the base parameters frozen, which reduces compute and memory requirements while preserving quality. Training was run with the MLX framework, which is optimized for Apple silicon.
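The exact training command is not recorded in this card. The snippet below is a hedged sketch of how a comparable LoRA run could be launched with mlx-lm's LoRA tooling, assuming the dataset has been exported locally in the JSONL format that tooling expects; the data path is a placeholder.

```python
# Hedged sketch of a comparable LoRA run via mlx-lm's command-line trainer.
# The data path is a placeholder; mlx_lm.lora expects train/valid JSONL files in that directory.
import subprocess

subprocess.run(
    [
        "python", "-m", "mlx_lm.lora",
        "--model", "Qwen/Qwen2.5-3B-Instruct",
        "--train",
        "--data", "data/chain-of-thought",  # placeholder path to the exported dataset
        "--iters", "600",
    ],
    check=True,
)
```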