---
license: mit
datasets:
- isaiahbjork/chain-of-thought
base_model:
- Qwen/Qwen2.5-3B-Instruct
library_name: mlx
language:
- en
pipeline_tag: text-generation
---

## Model Overview

This model is a fine-tuned version of Qwen2.5-3B-Instruct, adapted with Low-Rank Adaptation (LoRA) via the MLX framework. Fine-tuning used the isaiahbjork/chain-of-thought dataset (7,143 examples) for 600 iterations, with the goal of improving performance on tasks that require multi-step reasoning and problem-solving.
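
A minimal usage sketch with the `mlx-lm` package (assumed to be installed; the model path below is a placeholder for this repository's ID or a local checkpoint directory, not a confirmed identifier):

```python
from mlx_lm import load, generate

# Placeholder path: replace with this repository's model ID or a local directory.
model, tokenizer = load("path/to/this-model")

messages = [
    {"role": "user", "content": "A train travels 60 km in 45 minutes. What is its average speed in km/h?"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```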


## Model Architecture

 - Base Model: Qwen/Qwen2.5-3B-Instruct
 - Model Type: Causal Language Model
 - Architecture: Transformer with Rotary Position Embedding (RoPE),
   SwiGLU activation, RMSNorm normalization, attention QKV bias, and tied word embeddings
 - Parameters: 3.09 billion
 - Layers: 36
 - Attention Heads: 16 for query, 2 for key/value (grouped-query attention, GQA), as sketched below
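
For illustration only, the following sketch shows the grouped-query attention pattern implied by the head counts above, written with MLX arrays. RoPE, the QKV bias, the output projection, and causal masking are omitted, and the tensor sizes are assumptions rather than the model's exact configuration:

```python
import mlx.core as mx

def gqa_attention(x, wq, wk, wv, n_q_heads=16, n_kv_heads=2):
    """Grouped-query attention: 16 query heads share 2 key/value heads,
    so each KV head serves a group of 8 query heads."""
    B, L, D = x.shape
    head_dim = D // n_q_heads
    q = (x @ wq).reshape(B, L, n_q_heads, head_dim).transpose(0, 2, 1, 3)
    k = (x @ wk).reshape(B, L, n_kv_heads, head_dim).transpose(0, 2, 1, 3)
    v = (x @ wv).reshape(B, L, n_kv_heads, head_dim).transpose(0, 2, 1, 3)
    rep = n_q_heads // n_kv_heads          # expand each KV head to its query group
    k, v = mx.repeat(k, rep, axis=1), mx.repeat(v, rep, axis=1)
    scores = (q @ k.transpose(0, 1, 3, 2)) * head_dim ** -0.5
    return (mx.softmax(scores, axis=-1) @ v).transpose(0, 2, 1, 3).reshape(B, L, D)

# Illustrative sizes only (not necessarily the model's exact hidden dimension).
D, head_dim = 2048, 128
x = mx.random.normal((1, 8, D))
wq = mx.random.normal((D, 16 * head_dim)) * 0.02
wk = mx.random.normal((D, 2 * head_dim)) * 0.02
wv = mx.random.normal((D, 2 * head_dim)) * 0.02
print(gqa_attention(x, wq, wk, wv).shape)  # (1, 8, 2048)
```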

## Fine-Tuning Details

 - Technique: Low-Rank Adaptation (LoRA)
 - Framework: MLX
 - Dataset: isaiahbjork/chain-of-thought
 - Dataset Size: 7,143 examples
 - Iterations: 600

LoRA fine-tunes the model efficiently by training a small set of low-rank adapter weights while keeping the base weights frozen, reducing compute and memory requirements while maintaining performance. The MLX framework facilitated this process, leveraging Apple silicon hardware for training.
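
As a rough illustration of the idea (with an illustrative rank and scaling, not this model's exact adapter settings), a LoRA layer adds a trainable low-rank update on top of a frozen linear projection. A minimal MLX sketch:

```python
import mlx.core as mx
import mlx.nn as nn

class LoRALinear(nn.Module):
    """y = W x + (alpha / r) * (x A) B, where only A and B are trained.
    Rank r and alpha here are illustrative, not this model's settings."""
    def __init__(self, in_dims, out_dims, r=8, alpha=16.0):
        super().__init__()
        self.linear = nn.Linear(in_dims, out_dims, bias=False)
        self.linear.freeze()  # base weights stay fixed; only the adapter is trained
        scale = in_dims ** -0.5
        self.lora_a = mx.random.uniform(-scale, scale, (in_dims, r))
        self.lora_b = mx.zeros((r, out_dims))  # zero init: adapter starts as a no-op
        self.scale = alpha / r

    def __call__(self, x):
        return self.linear(x) + self.scale * ((x @ self.lora_a) @ self.lora_b)

layer = LoRALinear(64, 64)
print(layer(mx.random.normal((2, 64))).shape)  # (2, 64)
```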