Davidsv committed on
Commit c4f19ce · verified · 1 Parent(s): 90c25b8

Update README.md

Files changed (1)
  1. README.md +42 -29
README.md CHANGED
@@ -1,23 +1,37 @@
  ---
  base_model:
  - Qwen/Qwen1.5-7B-Chat
  - deepseek-ai/deepseek-coder-6.7b-instruct
  tags:
  - merge
  - mergekit
- - lazymergekit
- - Qwen/Qwen1.5-7B-Chat
- - deepseek-ai/deepseek-coder-6.7b-instruct
  ---

- # Qwen15-DeepSeekCoder-Merge

- Qwen15-DeepSeekCoder-Merge is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
- * [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat)
- * [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct)

- ## 🧩 Configuration

  ```yaml
  slices:
    - sources:
@@ -32,27 +46,26 @@ parameters:
  dtype: bfloat16
  ```

- ## 💻 Usage
-
- ```python
- !pip install -qU transformers accelerate
-
- from transformers import AutoTokenizer
- import transformers
- import torch
-
- model = "Davidsv/Qwen15-DeepSeekCoder-Merge"
- messages = [{"role": "user", "content": "What is a large language model?"}]
-
- tokenizer = AutoTokenizer.from_pretrained(model)
- prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
- pipeline = transformers.pipeline(
-     "text-generation",
-     model=model,
-     torch_dtype=torch.float16,
-     device_map="auto",
- )
-
- outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
- print(outputs[0]["generated_text"])
- ```
 
  ---
+ license: apache-2.0
  base_model:
  - Qwen/Qwen1.5-7B-Chat
  - deepseek-ai/deepseek-coder-6.7b-instruct
  tags:
  - merge
  - mergekit
+ - qwen
+ - deepseek
+ - coder
+ - slerp
  ---
+ # Qwen15-DeepSeek-Coder-Merge
+ This is a merge of pre-trained language models created using MergeKit, combining the foundational capabilities of Qwen 1.5 with DeepSeek Coder's programming expertise through an efficient SLERP fusion.
+
+ ## About Me
+ I'm David Soeiro-Vuong, a third-year Computer Science student working as an apprentice at TW3 Partners, a company specializing in Generative AI. Passionate about artificial intelligence and language model optimization, I focus on creating efficient model merges that balance performance and capabilities.

+ 🔗 [Connect with me on LinkedIn](https://www.linkedin.com/in/david-soeiro-vuong-a28b582ba/)

+ ## Merge Details
+ ### Merge Method
+ This model uses SLERP (Spherical Linear Interpolation) with carefully tuned parameters to achieve optimal performance balance:

+ - **Weighted Blend**: t=0.6 provides a slightly stronger influence from the DeepSeek Coder model
+ - **Complete Layer Merging**: Full layer-range coverage ensures comprehensive knowledge transfer
+ - **Format**: bfloat16 precision for efficient memory usage
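
SLERP interpolates each pair of corresponding weight tensors along the arc between them rather than along a straight line, so t=0.6 places the merged tensor 60% of the way toward the second model (here DeepSeek Coder). A rough per-tensor sketch of that interpolation in plain NumPy, not MergeKit's actual code, with the common fallback to linear interpolation for near-parallel tensors assumed:

```python
import numpy as np

def slerp(t: float, a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors of the same shape."""
    a_flat, b_flat = a.ravel(), b.ravel()
    # Angle between the two tensors, treated as high-dimensional vectors
    a_unit = a_flat / (np.linalg.norm(a_flat) + eps)
    b_unit = b_flat / (np.linalg.norm(b_flat) + eps)
    omega = np.arccos(np.clip(np.dot(a_unit, b_unit), -1.0, 1.0))
    if omega < eps:
        # Nearly colinear tensors: fall back to ordinary linear interpolation
        return ((1.0 - t) * a_flat + t * b_flat).reshape(a.shape)
    sin_omega = np.sin(omega)
    out = (np.sin((1.0 - t) * omega) / sin_omega) * a_flat \
        + (np.sin(t * omega) / sin_omega) * b_flat
    return out.reshape(a.shape)

# With t=0.6 the second argument (standing in for DeepSeek Coder's weights)
# contributes the larger share of the blended tensor.
blended = slerp(0.6, np.random.randn(4, 4), np.random.randn(4, 4))
```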

+ ### Models Merged
+ * [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) - Alibaba's Qwen 1.5 chat model known for its strong conversational capabilities and instruction following
+ * [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) - DeepSeek's specialized coding model with excellent programming language understanding and code generation abilities
+
+ ### Configuration
  ```yaml
  slices:
    - sources:

  dtype: bfloat16
  ```
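
The diff hunks above show only the edges of the committed YAML. For orientation, a MergeKit SLERP configuration with the settings described under Merge Method would look roughly like the sketch below; the layer ranges, base_model choice, and exact parameter layout are illustrative assumptions, not the contents of this commit:

```yaml
slices:
  - sources:
      - model: Qwen/Qwen1.5-7B-Chat
        layer_range: [0, 32]        # assumed full layer range
      - model: deepseek-ai/deepseek-coder-6.7b-instruct
        layer_range: [0, 32]        # assumed full layer range
merge_method: slerp
base_model: Qwen/Qwen1.5-7B-Chat    # assumed; not shown in the visible hunks
parameters:
  t: 0.6                            # weight toward the second model (DeepSeek Coder)
dtype: bfloat16
```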

+ ## Model Capabilities
+ This merge combines:
+ - Qwen 1.5's strong instruction following and general knowledge capabilities
+ - DeepSeek Coder's specialized programming expertise and code generation abilities
+ - Enhanced technical understanding and explanation capabilities
+ - Fully open architecture with no usage restrictions
+
+ The resulting model provides enhanced performance on tasks requiring both conversational fluency and programming expertise, such as:
+ - Code generation across multiple programming languages
+ - Technical documentation and explanations
+ - Algorithm implementation and problem-solving
+ - Software development assistance with natural language understanding
+ - Debugging and code optimization suggestions
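
A minimal generation example in the spirit of the usage snippet removed by this commit; the repository id below is carried over from the earlier README and is an assumption here, as the updated card does not restate it:

```python
import torch
import transformers
from transformers import AutoTokenizer

# Assumed repository id, taken from the previous README version
model_id = "Davidsv/Qwen15-DeepSeekCoder-Merge"
messages = [{"role": "user", "content": "Write a Python function that reverses a linked list."}]

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Build a chat-formatted prompt, then generate with a text-generation pipeline
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,   # matches the merge's bfloat16 weights
    device_map="auto",
)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(outputs[0]["generated_text"])
```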
+
+ ## Limitations
+ - Inherits limitations from both base models
+ - May exhibit inconsistent behavior for certain advanced programming tasks
+ - No additional alignment or fine-tuning beyond the base models' training
+ - Model was created through parameter merging without additional training data
+ - Slight model size mismatch (7B vs 6.7B) may introduce some parameter interpolation artifacts
+
+ ## License
+ This model is released under the Apache 2.0 license, consistent with the underlying models' licenses.