---
license: apache-2.0
base_model:
- Qwen/Qwen1.5-7B-Chat
- deepseek-ai/deepseek-coder-6.7b-instruct
tags:
- merge
- mergekit
- qwen
- deepseek
- coder
- slerp
---
# Qwen15-DeepSeek-Coder-Merge
This is a merge of pre-trained language models created with [mergekit](https://github.com/arcee-ai/mergekit), combining the general conversational capabilities of Qwen 1.5 with DeepSeek Coder's programming expertise through a SLERP merge.
## About Me
I'm David Soeiro-Vuong, a third-year Computer Science student working as an apprentice at TW3 Partners, a company specializing in Generative AI. Passionate about artificial intelligence and language model optimization, I focus on creating efficient model merges that balance performance and capabilities.
🔗 [Connect with me on LinkedIn](https://www.linkedin.com/in/david-soeiro-vuong-a28b582ba/)
## Merge Details
### Merge Method
This model uses SLERP (Spherical Linear Interpolation), which interpolates between the two parents' weights along an arc rather than a straight line; a sketch of the idea appears after this list.
- **Weighted Blend**: t=0.6 weights the interpolation toward DeepSeek Coder, giving it a slightly stronger influence than Qwen 1.5
- **Complete Layer Merging**: all 32 transformer layers of both models are included in the interpolation (`layer_range: [0, 32]`)
- **Format**: bfloat16 precision for efficient memory usage
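For intuition, here is a minimal, self-contained sketch of SLERP applied to two weight tensors. This illustrates the formula only, not mergekit's exact implementation, and the tensor names in the final comment are hypothetical:

```python
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.

    Treats each tensor as a flat vector and interpolates along the arc
    between them; falls back to plain linear interpolation when the
    vectors are nearly parallel, where the SLERP formula is unstable.
    """
    a_flat = a.flatten().float()
    b_flat = b.flatten().float()
    # Angle between the two weight vectors, computed on normalized copies
    dot = torch.clamp(
        (a_flat / (a_flat.norm() + eps)) @ (b_flat / (b_flat.norm() + eps)),
        -1.0, 1.0,
    )
    omega = torch.arccos(dot)
    if omega.abs() < eps:  # nearly parallel: lerp is a safe approximation
        merged = (1 - t) * a_flat + t * b_flat
    else:
        merged = (
            torch.sin((1 - t) * omega) * a_flat + torch.sin(t * omega) * b_flat
        ) / torch.sin(omega)
    return merged.reshape(a.shape).to(a.dtype)

# t=0.6 pulls the result slightly toward the second model (DeepSeek Coder):
# merged_w = slerp(0.6, qwen_layer_weight, deepseek_layer_weight)
```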
### Models Merged
* [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) - Alibaba's Qwen 1.5 chat model known for its strong conversational capabilities and instruction following
* [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) - DeepSeek's specialized coding model with excellent programming language understanding and code generation abilities
### Configuration
```yaml
slices:
  - sources:
      - model: Qwen/Qwen1.5-7B-Chat
        layer_range: [0, 32]
      - model: deepseek-ai/deepseek-coder-6.7b-instruct
        layer_range: [0, 32]
merge_method: slerp
base_model: Qwen/Qwen1.5-7B-Chat
parameters:
  t: 0.6
dtype: bfloat16
```
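To reproduce the merge, the configuration above can be passed to mergekit, either via the `mergekit-yaml` command-line tool or through its Python entry points. The snippet below is a sketch using the `MergeConfiguration` and `run_merge` API from mergekit's documentation; the file and output paths are assumptions:

```python
# Sketch: running the merge with mergekit's Python API.
# Assumes `pip install mergekit` and the YAML above saved as config.yml;
# paths are illustrative.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("config.yml", encoding="utf-8") as f:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    merge_config,
    out_path="./Qwen15-DeepSeek-Coder-Merge",   # directory for merged weights
    options=MergeOptions(copy_tokenizer=True),  # copy the base model's tokenizer
)
```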
## Model Capabilities
This merge combines:
- Qwen 1.5's strong instruction following and general knowledge capabilities
- DeepSeek Coder's specialized programming expertise and code generation abilities
- Enhanced technical understanding and explanation capabilities
- Open-weight release under the permissive Apache 2.0 license
The resulting model targets tasks requiring both conversational fluency and programming expertise (a usage sketch follows the list), such as:
- Code generation across multiple programming languages
- Technical documentation and explanations
- Algorithm implementation and problem-solving
- Software development assistance with natural language understanding
- Debugging and code optimization suggestions
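Below is a minimal inference sketch with Hugging Face `transformers`. The repository id is hypothetical, so substitute the actual published path, and the chat-template call assumes the merged tokenizer carries a chat template over from the base model:

```python
# Hypothetical usage sketch; the repo id below is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Davidsv/Qwen15-DeepSeek-Coder-Merge"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the dtype used for the merge
    device_map="auto",
)

messages = [
    {"role": "user",
     "content": "Write a Python function that checks whether a string is a palindrome."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```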
## Limitations
- Inherits limitations from both base models
- May exhibit inconsistent behavior for certain advanced programming tasks
- No additional alignment or fine-tuning beyond the base models' training
- Model was created through parameter merging without additional training data
- The parent models differ in size (7B vs. 6.7B) and architecture, so interpolating between mismatched parameters may introduce artifacts
## License
This model is released under the Apache 2.0 license, consistent with the underlying models' licenses.