---
license: mit
datasets:
- isaiahbjork/chain-of-thought
base_model:
- Qwen/Qwen2.5-3B-Instruct
library_name: mlx
language:
- en
pipeline_tag: text-generation
---

## Model Overview

This model is a fine-tuned version of Qwen2.5-3B-Instruct, adapted with Low-Rank Adaptation (LoRA) via the MLX framework. Fine-tuning used the isaiahbjork/chain-of-thought dataset (7,143 examples) for 600 iterations, with the aim of improving the model's performance on tasks that require multi-step reasoning and problem-solving.
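
The model can be run with the mlx-lm package. The snippet below is a minimal inference sketch; the model path is a placeholder, so substitute the actual Hub repo id or local directory for this fine-tuned model.

```python
# Minimal inference sketch using mlx-lm (pip install mlx-lm).
# "path/to/this-finetuned-model" is a placeholder, not the published repo id.
from mlx_lm import load, generate

model, tokenizer = load("path/to/this-finetuned-model")

# Wrap the question in the chat template expected by Qwen2.5 instruct models.
messages = [{"role": "user", "content": "A train travels 60 km in 45 minutes. What is its average speed in km/h?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)
```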

## Model Architecture

- Base Model: Qwen/Qwen2.5-3B-Instruct
- Model Type: Causal Language Model
- Architecture: Transformer with Rotary Position Embedding (RoPE), SwiGLU activation, RMSNorm normalization, attention QKV bias, and tied word embeddings
- Parameters: 3.09 billion
- Layers: 36
- Attention Heads: 16 for query, 2 for key/value (GQA)
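
These figures can be read directly from the base model's configuration. The sketch below assumes the transformers package is installed and pulls the upstream Qwen/Qwen2.5-3B-Instruct config from the Hub.

```python
# Sketch: print the config fields that correspond to the bullets above.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

print(config.num_hidden_layers)    # 36 transformer layers
print(config.num_attention_heads)  # 16 query heads
print(config.num_key_value_heads)  # 2 key/value heads (GQA)
print(config.tie_word_embeddings)  # True (tied word embeddings)
print(config.hidden_act)           # "silu", used in the SwiGLU MLP
```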

## Fine-Tuning Details

- Technique: Low-Rank Adaptation (LoRA)
- Framework: MLX
- Dataset: isaiahbjork/chain-of-thought
- Dataset Size: 7,143 examples
- Iterations: 600

LoRA fine-tunes the model efficiently by training a small set of low-rank adapter weights while keeping the base parameters frozen, which reduces memory and compute requirements while maintaining performance. The MLX framework handled training, taking advantage of Apple silicon hardware.
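
As a rough sketch of how such a run can be reproduced with the mlx-lm LoRA tooling: the dataset is exported to the JSONL layout that `mlx_lm.lora` reads, then training runs against the base model for 600 iterations. The `prompt`/`response` column names are an assumption about the dataset schema, and CLI flags can differ between mlx-lm versions, so adjust both to match the actual dataset fields and your installation.

```python
# Sketch only: export isaiahbjork/chain-of-thought into the prompt/completion
# JSONL layout accepted by mlx_lm.lora. The "prompt"/"response" column names
# are an assumption about this dataset's schema; check the dataset card.
import json
import os

from datasets import load_dataset

ds = load_dataset("isaiahbjork/chain-of-thought", split="train")
split = ds.train_test_split(test_size=0.05, seed=0)  # hold out a small validation set
os.makedirs("data", exist_ok=True)

def dump(rows, path):
    with open(path, "w") as f:
        for row in rows:
            f.write(json.dumps({"prompt": row["prompt"],
                                "completion": row["response"]}) + "\n")

dump(split["train"], "data/train.jsonl")
dump(split["test"], "data/valid.jsonl")

# Train LoRA adapters on Apple silicon with the mlx-lm CLI, e.g.:
#   mlx_lm.lora --model Qwen/Qwen2.5-3B-Instruct --train --data data --iters 600
# Optionally fuse the adapters back into the base weights:
#   mlx_lm.fuse --model Qwen/Qwen2.5-3B-Instruct --adapter-path adapters
```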