---
license: apache-2.0
base_model:
  - Qwen/Qwen1.5-7B-Chat
  - deepseek-ai/deepseek-coder-6.7b-instruct
tags:
  - merge
  - mergekit
  - qwen
  - deepseek
  - coder
  - slerp
---

# Qwen15-DeepSeek-Coder-Merge

This is a merge of pre-trained language models created using MergeKit, combining the foundational capabilities of Qwen 1.5 with DeepSeek Coder's programming expertise through SLERP fusion.

## About Me

I'm David Soeiro-Vuong, a third-year Computer Science student working as an apprentice at TW3 Partners, a company specializing in Generative AI. Passionate about artificial intelligence and language model optimization, I focus on creating efficient model merges that balance performance and capabilities.

[Connect with me on LinkedIn](https://www.linkedin.com/in/david-soeiro-vuong-a28b582ba/)

## Merge Details

### Merge Method

This model uses SLERP (Spherical Linear Interpolation) with tuned parameters to balance the two models' strengths:

- **Weighted blend**: t=0.6 gives a slightly stronger influence to the DeepSeek Coder model
- **Complete layer merging**: full layer-range coverage ensures comprehensive knowledge transfer
- **Format**: bfloat16 precision for efficient memory usage

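For intuition, SLERP interpolates along the arc between two weight vectors rather than along the straight line between them, which better preserves weight magnitudes than plain averaging. A minimal sketch of the operation on flattened tensors (illustrative only; MergeKit's implementation handles per-tensor and edge-case details):

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flattened weight tensors."""
    norm0, norm1 = np.linalg.norm(v0), np.linalg.norm(v1)
    # Cosine of the angle between the normalized vectors
    dot = np.clip(np.dot(v0 / norm0, v1 / norm1), -1.0, 1.0)
    if 1.0 - abs(dot) < eps:
        # Nearly (anti)parallel vectors: fall back to linear interpolation
        return (1 - t) * v0 + t * v1
    theta = np.arccos(dot)
    # Standard SLERP formula: weights follow the arc, not the chord
    return (np.sin((1 - t) * theta) * v0 + np.sin(t * theta) * v1) / np.sin(theta)

# t=0.6 weights the second model (here standing in for DeepSeek Coder) more heavily
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
merged = slerp(0.6, a, b)
```

With t=0 the result is exactly the first model's weights and with t=1 the second's; for unit vectors the interpolated result stays on the unit sphere.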
### Models Merged

* [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) - Alibaba's Qwen 1.5 chat model, known for strong conversational capabilities and instruction following
* [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) - DeepSeek's specialized coding model, with excellent programming language understanding and code generation abilities

### Configuration

```yaml
slices:
  - sources:
      - model: Qwen/Qwen1.5-7B-Chat
        layer_range: [0, 32]
      - model: deepseek-ai/deepseek-coder-6.7b-instruct
        layer_range: [0, 32]
merge_method: slerp
base_model: Qwen/Qwen1.5-7B-Chat
parameters:
  t: 0.6
dtype: bfloat16
```

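As a quick sanity check, a configuration like the one above can be parsed with PyYAML (a third-party library, assumed installed) to confirm its structure before handing it to MergeKit:

```python
import yaml

# The merge configuration, embedded as a string for illustration
MERGE_CONFIG = """\
slices:
  - sources:
      - model: Qwen/Qwen1.5-7B-Chat
        layer_range: [0, 32]
      - model: deepseek-ai/deepseek-coder-6.7b-instruct
        layer_range: [0, 32]
merge_method: slerp
base_model: Qwen/Qwen1.5-7B-Chat
parameters:
  t: 0.6
dtype: bfloat16
"""

cfg = yaml.safe_load(MERGE_CONFIG)
sources = cfg["slices"][0]["sources"]  # the two parent models and their layer ranges
```

Both parents cover layers 0-32, so the full stack of each model participates in the interpolation.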
## Model Capabilities

This merge combines:

- Qwen 1.5's strong instruction following and general knowledge
- DeepSeek Coder's specialized programming expertise and code generation abilities
- Enhanced technical understanding and explanation capabilities
- Openly released weights under a permissive license

The resulting model is intended for tasks that require both conversational fluency and programming expertise, such as:

- Code generation across multiple programming languages
- Technical documentation and explanations
- Algorithm implementation and problem-solving
- Software development assistance with natural language understanding
- Debugging and code optimization suggestions

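Since the base model is Qwen1.5-7B-Chat, the merged model most likely expects Qwen's ChatML prompt format (an assumption; verify against the tokenizer's bundled chat template). A minimal sketch with a hypothetical helper:

```python
def build_chatml_prompt(messages):
    """Format chat messages in ChatML, the template used by Qwen 1.5 chat models."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    parts.append("<|im_start|>assistant\n")  # open the assistant turn for generation
    return "\n".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function that reverses a string."},
])
```

In practice, `tokenizer.apply_chat_template` in `transformers` handles this automatically when the repository ships a chat template.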
## Limitations

- Inherits limitations from both base models
- May behave inconsistently on some advanced programming tasks
- No additional alignment or fine-tuning beyond the base models' training
- Created through parameter merging alone, without additional training data
- The parents differ in size (7B vs 6.7B) and tokenization, which may introduce interpolation artifacts

## License

This model is released under the Apache 2.0 license. Users should also review the license terms of the underlying base models before deployment.