---
license: apache-2.0
base_model:
- Qwen/Qwen1.5-7B-Chat
- deepseek-ai/deepseek-coder-6.7b-instruct
tags:
- merge
- mergekit
- qwen
- deepseek
- coder
- slerp
---
# Qwen15-DeepSeek-Coder-Merge
This is a merge of pre-trained language models created with [mergekit](https://github.com/arcee-ai/mergekit), combining Qwen 1.5's conversational and instruction-following abilities with DeepSeek Coder's programming expertise via SLERP interpolation.

## About Me
I'm David Soeiro-Vuong, a third-year Computer Science student working as an apprentice at TW3 Partners, a company specializing in Generative AI. Passionate about artificial intelligence and language model optimization, I focus on creating efficient model merges that balance performance and capabilities.

🔗 [Connect with me on LinkedIn](https://www.linkedin.com/in/david-soeiro-vuong-a28b582ba/)

## Merge Details
### Merge Method
This model uses SLERP (Spherical Linear Interpolation), which blends the two checkpoints along the arc between their weight vectors rather than along a straight line, preserving the scale of the merged parameters (a minimal sketch of the interpolation follows the list below):

- **Weighted blend**: `t: 0.6` tilts the interpolation slightly toward the DeepSeek Coder model
- **Complete layer merging**: the full `[0, 32]` layer range of both models is merged
- **Format**: bfloat16 precision for efficient memory usage
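
Conceptually, SLERP treats each pair of corresponding weight tensors as two points on a sphere and interpolates along the arc between them. The sketch below is a minimal per-tensor illustration assuming tensors of identical shape; mergekit's actual implementation adds details such as per-layer `t` schedules and other handling:

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherically interpolate between two weight tensors of the same shape.

    t=0.0 returns v0, t=1.0 returns v1; t=0.6 (as in this merge)
    leans the result toward the second model.
    """
    a = v0.flatten().float()
    b = v1.flatten().float()
    # Angle between the two flattened parameter vectors.
    cos_omega = torch.dot(a, b) / (a.norm() * b.norm() + eps)
    omega = torch.arccos(cos_omega.clamp(-1.0, 1.0))
    if omega.abs() < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation
        # to avoid dividing by sin(omega) ~ 0.
        merged = (1.0 - t) * a + t * b
    else:
        sin_omega = torch.sin(omega)
        merged = (torch.sin((1.0 - t) * omega) / sin_omega) * a \
               + (torch.sin(t * omega) / sin_omega) * b
    return merged.reshape(v0.shape).to(v0.dtype)
```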

### Models Merged
* [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) - Alibaba's Qwen 1.5 chat model known for its strong conversational capabilities and instruction following
* [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) - DeepSeek's specialized coding model with excellent programming language understanding and code generation abilities

### Configuration
```yaml
slices:
  - sources:
      - model: Qwen/Qwen1.5-7B-Chat
        layer_range: [0, 32]
      - model: deepseek-ai/deepseek-coder-6.7b-instruct
        layer_range: [0, 32]
merge_method: slerp
base_model: Qwen/Qwen1.5-7B-Chat
parameters:
  t: 0.6
dtype: bfloat16
```
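
Assuming mergekit is installed (`pip install mergekit`), this merge can be reproduced by saving the configuration above to `config.yaml` and running `mergekit-yaml config.yaml ./merged-model` (optionally with `--cuda` to perform the merge on GPU).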

## Model Capabilities
This merge combines:
- Qwen 1.5's strong instruction following and general knowledge capabilities
- DeepSeek Coder's specialized programming expertise and code generation abilities
- Enhanced technical understanding and explanation capabilities
- Openly published merged weights under Apache 2.0 (see the License section for notes on the upstream models' licenses)

The resulting model provides enhanced performance on tasks requiring both conversational fluency and programming expertise, such as:
- Code generation across multiple programming languages
- Technical documentation and explanations
- Algorithm implementation and problem-solving
- Software development assistance with natural language understanding
- Debugging and code optimization suggestions
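
Provided the merged checkpoint loads with a single working tokenizer, it can be used like any other causal language model in 🤗 Transformers. A minimal sketch (the repo id below is a placeholder, not a published path):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id for illustration; replace with the path
# where this merged checkpoint is actually published.
model_id = "your-username/Qwen15-DeepSeek-Coder-Merge"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the merge dtype
    device_map="auto",
)

prompt = "Write a Python function that checks whether a string is a palindrome."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Print only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```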

## Limitations
- Inherits limitations from both base models
- May exhibit inconsistent behavior for certain advanced programming tasks
- No additional alignment or fine-tuning beyond the base models' training
- Model was created through parameter merging without additional training data
- The base models differ in parameter count (7B vs. 6.7B) and architecture, so interpolating between their parameters may introduce artifacts

## License
This merge is released under the Apache 2.0 license. Users should also review the underlying models' own licenses, which may carry additional terms.