---
license: apache-2.0
base_model:
  - Qwen/Qwen1.5-7B-Chat
  - deepseek-ai/deepseek-coder-6.7b-instruct
tags:
  - merge
  - mergekit
  - qwen
  - deepseek
  - coder
  - slerp
---

# Qwen15-DeepSeek-Coder-Merge

This is a merge of pre-trained language models created with [MergeKit](https://github.com/arcee-ai/mergekit), combining the general instruction-following capabilities of Qwen 1.5 Chat with DeepSeek Coder's programming expertise through SLERP interpolation.

## About Me

I'm David Soeiro-Vuong, a third-year Computer Science student and apprentice at TW3 Partners, a company specializing in Generative AI. I'm passionate about artificial intelligence and language model optimization, and I focus on creating efficient model merges that balance performance and capabilities.

🔗 Connect with me on LinkedIn

## Merge Details

### Merge Method

This model uses SLERP (Spherical Linear Interpolation) with the following parameters (a sketch of the interpolation itself appears below):

- **Weighted blend**: `t: 0.6` weights the DeepSeek Coder model slightly more heavily than the Qwen base (0.6 vs. 0.4)
- **Complete layer merging**: all 32 transformer layers of both models are included in the interpolation
- **Format**: bfloat16 precision for efficient memory usage
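
For intuition, SLERP interpolates along the arc between two weight vectors rather than along the straight line between them, which preserves norm geometry that plain averaging can flatten. Below is a minimal, illustrative sketch of the operation on a single tensor; it is not MergeKit's actual implementation, and the `slerp` helper and toy tensors are assumptions for demonstration only:

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherically interpolate between two weight tensors of the same shape."""
    a, b = v0.flatten().float(), v1.flatten().float()
    a_n = a / (a.norm() + eps)
    b_n = b / (b.norm() + eps)
    dot = torch.clamp(a_n @ b_n, -1.0, 1.0)
    theta = torch.acos(dot)  # angle between the two weight vectors
    if theta < 1e-4:  # nearly colinear: plain LERP is numerically safer
        merged = (1 - t) * a + t * b
    else:
        s = torch.sin(theta)
        merged = (torch.sin((1 - t) * theta) / s) * a + (torch.sin(t * theta) / s) * b
    return merged.reshape(v0.shape).to(v0.dtype)

# Toy tensors standing in for one layer's weights from each model
qwen_w = torch.randn(8, 8)
deepseek_w = torch.randn(8, 8)
merged_w = slerp(0.6, qwen_w, deepseek_w)  # t=0.6 leans toward the second model
```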

### Models Merged

The following models were included in the merge:

- [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat)
- [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
slices:
  - sources:
      - model: Qwen/Qwen1.5-7B-Chat
        layer_range: [0, 32]
      - model: deepseek-ai/deepseek-coder-6.7b-instruct
        layer_range: [0, 32]
merge_method: slerp
base_model: Qwen/Qwen1.5-7B-Chat
parameters:
  t: 0.6
dtype: bfloat16
```
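
To reproduce the merge, a configuration like this is typically saved to a file (e.g. `config.yaml`) and run with MergeKit's `mergekit-yaml` command, e.g. `mergekit-yaml config.yaml ./merged-model`; consult the MergeKit documentation for the options available in your installed version.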

## Model Capabilities

This merge combines:

- Qwen 1.5's strong instruction following and general knowledge
- DeepSeek Coder's specialized programming expertise and code generation abilities
- Enhanced technical understanding and explanation capabilities
- Openly released weights under a permissive license

The resulting model provides enhanced performance on tasks requiring both conversational fluency and programming expertise, such as:

- Code generation across multiple programming languages
- Technical documentation and explanations
- Algorithm implementation and problem-solving
- Software development assistance with natural language understanding
- Debugging and code optimization suggestions
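
As a quick sanity check, the merged checkpoint can be queried like any `transformers` causal LM. A minimal sketch, assuming the merge is published under the hypothetical repo id `Davidsv/Qwen15-DeepSeek-Coder-Merge` and ships a chat template from the Qwen 1.5 base:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; substitute the actual location of this merge.
model_id = "Davidsv/Qwen15-DeepSeek-Coder-Merge"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```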

## Limitations

- Inherits limitations from both base models
- May exhibit inconsistent behavior on certain advanced programming tasks
- No additional alignment or fine-tuning beyond the base models' training
- Created purely by parameter merging, without any additional training data
- The base models differ in size (7B vs. 6.7B), which may introduce parameter interpolation artifacts

## License

This model is released under the Apache 2.0 license. Please also review the license terms of the underlying base models before use.