Update README.md

---
license: mit
---

## Model Overview

This model is a fine-tuned version of the Qwen2.5-3B base model, enhanced with Low-Rank Adaptation (LoRA) via the MLX framework. Fine-tuning used the isaiahbjork/chain-of-thought dataset (7,143 examples) for 600 iterations, with the aim of improving the model's performance on tasks that require multi-step reasoning and problem solving.
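
For reference, generation with the fine-tuned model typically looks like the minimal sketch below, using the `mlx_lm` Python API. The model path is a placeholder rather than the actual repository id, and the exact loading step depends on whether the LoRA adapters were fused into the weights or published separately.

```python
# Minimal generation sketch with mlx-lm (pip install mlx-lm).
# "path/to/fused-model" is a placeholder; point it at the fused weights,
# or load the base model with adapter_path= if the adapters ship separately:
#   model, tokenizer = load("Qwen/Qwen2.5-3B", adapter_path="adapters")
from mlx_lm import load, generate

model, tokenizer = load("path/to/fused-model")

prompt = "A train travels 60 km in 45 minutes. What is its average speed in km/h?"
response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
print(response)
```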

## Model Architecture

- Base Model: Qwen2.5-3B
- Model Type: Causal Language Model
- Architecture: Transformer with Rotary Position Embedding (RoPE), SwiGLU activation, RMSNorm normalization, attention QKV bias, and tied word embeddings
- Parameters: 3.09 billion
- Layers: 36
- Attention Heads: 16 for query, 2 for key and value (grouped-query attention; see the sketch below)
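
With grouped-query attention, the 16 query heads share the 2 key/value heads, so each key/value head serves a group of 8 query heads and the KV cache is correspondingly smaller. The sketch below illustrates only that head-sharing arithmetic with mlx.core; the head counts come from the list above, while the head dimension and sequence length are arbitrary example values.

```python
# Illustrative sketch of grouped-query attention (GQA) head sharing.
# 16 query heads and 2 key/value heads are from the list above; the
# head dimension and sequence length are made-up example values.
import mlx.core as mx

n_q_heads, n_kv_heads, head_dim, seq_len = 16, 2, 128, 8
group_size = n_q_heads // n_kv_heads  # 8 query heads per key/value head

q = mx.random.normal((1, n_q_heads, seq_len, head_dim))
k = mx.random.normal((1, n_kv_heads, seq_len, head_dim))
v = mx.random.normal((1, n_kv_heads, seq_len, head_dim))

# Each key/value head is reused by `group_size` query heads, which is what
# shrinks the KV cache relative to standard multi-head attention.
k = mx.repeat(k, group_size, axis=1)  # (1, 16, seq_len, head_dim)
v = mx.repeat(v, group_size, axis=1)

scores = (q @ mx.transpose(k, (0, 1, 3, 2))) / head_dim**0.5
attn_out = mx.softmax(scores, axis=-1) @ v
print(attn_out.shape)  # -> (1, 16, 8, 128)
```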

## Fine-Tuning Details

- Technique: Low-Rank Adaptation (LoRA)
- Framework: MLX
- Dataset: isaiahbjork/chain-of-thought
- Dataset Size: 7,143 examples
- Iterations: 600

LoRA fine-tunes the model efficiently by training small low-rank adapter matrices instead of the full set of weights, which cuts compute and memory requirements while maintaining performance. The MLX framework handled training natively on Apple silicon hardware.
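
Conceptually, LoRA keeps each pretrained weight matrix W frozen and learns a low-rank correction (alpha / r) * B A, so only the small A and B matrices are updated. The sketch below illustrates the idea with mlx.core; the dimensions, rank, and scaling are arbitrary examples, not the values used for this fine-tune.

```python
# Minimal sketch of the LoRA idea: frozen weight W plus a trainable
# low-rank update (alpha / r) * B @ A. Sizes, rank, and scaling here
# are illustrative, not the settings used for this model.
import mlx.core as mx

d_out, d_in, r, alpha = 2048, 2048, 8, 16

W = mx.random.normal((d_out, d_in))     # frozen pretrained weight
A = mx.random.normal((r, d_in)) * 0.01  # trainable, rank r
B = mx.zeros((d_out, r))                # trainable, initialized to zero

def lora_linear(x):
    # Base projection plus the low-rank correction; during fine-tuning
    # only A and B receive gradient updates, W stays frozen.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = mx.random.normal((1, d_in))
print(lora_linear(x).shape)  # -> (1, 2048)
```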