---
license: apache-2.0
base_model:
  - Qwen/Qwen1.5-7B-Chat
  - deepseek-ai/deepseek-coder-6.7b-instruct
tags:
  - merge
  - mergekit
  - qwen
  - deepseek
  - coder
  - slerp
---

# Qwen15-DeepSeek-Coder-Merge

This is a merge of pre-trained language models created with [MergeKit](https://github.com/arcee-ai/mergekit), combining the general instruction-following capabilities of Qwen 1.5 Chat with DeepSeek Coder's programming expertise through SLERP interpolation.

## About Me

I'm David Soeiro-Vuong, a third-year Computer Science student and apprentice at TW3 Partners, a company specializing in Generative AI. I'm passionate about artificial intelligence and language model optimization, and I focus on creating efficient model merges that balance performance and capabilities.

🔗 Connect with me on LinkedIn

## Merge Details

### Merge Method

This model uses SLERP (Spherical Linear Interpolation) with the following parameters (a sketch of the interpolation itself appears below):

- **Weighted blend**: `t: 0.6` weights the DeepSeek Coder model slightly more heavily than the Qwen base (0.6 vs. 0.4)
- **Complete layer merging**: all 32 transformer layers of both models are included in the interpolation
- **Format**: bfloat16 precision for efficient memory usage
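
For intuition, SLERP interpolates along the arc between two weight vectors rather than along the straight line between them, which preserves norm geometry that plain averaging can flatten. Below is a minimal, illustrative sketch of the operation on a single tensor; it is not MergeKit's actual implementation, and the `slerp` helper and toy tensors are assumptions for demonstration only:

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherically interpolate between two weight tensors of the same shape."""
    a, b = v0.flatten().float(), v1.flatten().float()
    a_n = a / (a.norm() + eps)
    b_n = b / (b.norm() + eps)
    dot = torch.clamp(a_n @ b_n, -1.0, 1.0)
    theta = torch.acos(dot)  # angle between the two weight vectors
    if theta < 1e-4:  # nearly colinear: plain LERP is numerically safer
        merged = (1 - t) * a + t * b
    else:
        s = torch.sin(theta)
        merged = (torch.sin((1 - t) * theta) / s) * a + (torch.sin(t * theta) / s) * b
    return merged.reshape(v0.shape).to(v0.dtype)

# Toy tensors standing in for one layer's weights from each model
qwen_w = torch.randn(8, 8)
deepseek_w = torch.randn(8, 8)
merged_w = slerp(0.6, qwen_w, deepseek_w)  # t=0.6 leans toward the second model
```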

### Models Merged

The following models were included in the merge:

- [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat)
- [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
slices:
  - sources:
      - model: Qwen/Qwen1.5-7B-Chat
        layer_range: [0, 32]
      - model: deepseek-ai/deepseek-coder-6.7b-instruct
        layer_range: [0, 32]
merge_method: slerp
base_model: Qwen/Qwen1.5-7B-Chat
parameters:
  t: 0.6
dtype: bfloat16
```
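
To reproduce the merge, a configuration like this is typically saved to a file (e.g. `config.yaml`) and run with MergeKit's `mergekit-yaml` command, e.g. `mergekit-yaml config.yaml ./merged-model`; consult the MergeKit documentation for the options available in your installed version.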

## Model Capabilities

This merge combines:

- Qwen 1.5's strong instruction following and general knowledge
- DeepSeek Coder's specialized programming expertise and code generation abilities
- Enhanced technical understanding and explanation capabilities
- Openly released weights under a permissive license

The resulting model provides enhanced performance on tasks requiring both conversational fluency and programming expertise, such as:

- Code generation across multiple programming languages
- Technical documentation and explanations
- Algorithm implementation and problem-solving
- Software development assistance with natural language understanding
- Debugging and code optimization suggestions
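
As a quick sanity check, the merged checkpoint can be queried like any `transformers` causal LM. A minimal sketch, assuming the merge is published under the hypothetical repo id `Davidsv/Qwen15-DeepSeek-Coder-Merge` and ships a chat template from the Qwen 1.5 base:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; substitute the actual location of this merge.
model_id = "Davidsv/Qwen15-DeepSeek-Coder-Merge"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```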

## Limitations

- Inherits limitations from both base models
- May exhibit inconsistent behavior on certain advanced programming tasks
- No additional alignment or fine-tuning beyond the base models' training
- Created purely by parameter merging, without any additional training data
- The base models differ in size (7B vs. 6.7B), which may introduce parameter interpolation artifacts

## License

This model is released under the Apache 2.0 license. Please also review the license terms of the underlying base models before use.