ApatheticWithoutTheA committed
Commit 4506660 · verified · 1 Parent(s): 06c8f69

Update README.md

Files changed (1): README.md (+28 -3)
README.md CHANGED
---
license: mit
---

## Model Overview

This model is a fine-tuned version of the Qwen2.5-3B base model, adapted with Low-Rank Adaptation (LoRA) via the MLX framework. Fine-tuning used the isaiahbjork/chain-of-thought dataset, comprising 7,143 examples, over 600 iterations, with the aim of improving the model's performance on tasks that require multi-step reasoning and problem-solving.
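
As a quick start, the sketch below shows one way to run the model with the mlx-lm package. The model path is a placeholder for this repository, and the prompt is only illustrative.

```python
# Minimal inference sketch with mlx-lm (pip install mlx-lm).
# "path/to/this-model" is a placeholder -- substitute this repository's id
# or a local directory containing the model weights.
from mlx_lm import load, generate

model, tokenizer = load("path/to/this-model")

prompt = "A train travels 60 km in 45 minutes. Think step by step: what is its average speed in km/h?"
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```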

## Model Architecture

- Base Model: Qwen2.5-3B
- Model Type: Causal Language Model
- Architecture: Transformer with Rotary Position Embedding (RoPE), SwiGLU activation, RMSNorm normalization, attention QKV bias, and tied word embeddings
- Parameters: 3.09 billion
- Layers: 36
- Attention Heads: 16 for query, 2 for key and value (grouped-query attention, GQA; see the sketch below)
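
These figures can be checked against the base model's published configuration. The sketch below assumes the transformers package and access to the Qwen/Qwen2.5-3B repository on the Hugging Face Hub.

```python
# Check the architecture figures above against the base model's config
# (assumes `pip install transformers` and Hub access).
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen2.5-3B")
print(cfg.num_hidden_layers)      # 36 layers
print(cfg.num_attention_heads)    # 16 query heads
print(cfg.num_key_value_heads)    # 2 key/value heads (GQA)
# Each KV head is shared by num_attention_heads / num_key_value_heads = 8 query heads.
```

With 16 query heads grouped over 2 key/value heads, each KV head serves 8 query heads, which shrinks the KV cache roughly 8x relative to full multi-head attention.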

## Fine-Tuning Details

- Technique: Low-Rank Adaptation (LoRA)
- Framework: MLX
- Dataset: isaiahbjork/chain-of-thought
- Dataset Size: 7,143 examples
- Iterations: 600
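
A comparable run can be launched with mlx-lm's LoRA trainer. The sketch below is illustrative and starts the trainer from Python for consistency with the other examples; apart from `--iters 600`, the flag values and the data path are assumptions, and the dataset is expected as train.jsonl/valid.jsonl files in the `--data` directory.

```python
# Sketch of a LoRA training launch with mlx_lm.lora.
# Only --iters 600 is documented above; the remaining values are assumptions.
import subprocess

subprocess.run(
    [
        "python", "-m", "mlx_lm.lora",
        "--model", "Qwen/Qwen2.5-3B",
        "--train",
        "--data", "data/chain-of-thought",  # assumed export to train.jsonl/valid.jsonl
        "--iters", "600",
        "--batch-size", "4",                # assumption; not recorded here
        "--learning-rate", "1e-5",          # assumption; not recorded here
    ],
    check=True,
)
```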
27
+
28
+ LoRA was employed to efficiently fine-tune the model by adjusting a subset of parameters, reducing computational requirements while maintaining performance. The MLX framework facilitated this process, leveraging Apple silicon hardware for optimized training.