ZeroXClem/Qwen2.5-7B-DistilPrism
Qwen2.5-7B-DistilPrism is a distillation- and reasoning-focused model merge that combines multiple DeepSeek-R1 distillation variants into a refined, high-performance language model. Using the Model Stock merge method, this fusion captures the best attributes of DeepSeek-R1-Distill-Qwen-7B and its improved derivatives.
🚀 Merged Models
This model is a weighted merge of the following:
- huihui-ai/DeepSeek-R1-Distill-Qwen-7B-abliterated-v2: An uncensored distillation of DeepSeek-R1, optimized to remove refusals and improve usability.
- mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1: A refined distillation that improves accuracy and robustness across various benchmarks.
- Triangle104/DSR1-Distill-Qwen-7B-RP: A composite merge of various distilled DeepSeek variants, serving as an essential ingredient for performance tuning.
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B: The foundation of this merge, representing the distilled form of DeepSeek-R1 optimized for efficiency and strong reasoning capabilities.
🧩 Merge Configuration
The following YAML configuration defines how these models were combined using Model Stock, ensuring balanced contributions from each source:
```yaml
# Merge configuration for ZeroXClem/Qwen2.5-7B-DistilPrism using Model Stock
name: ZeroXClem-Qwen2.5-7B-DistilPrism
merge_method: model_stock
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
tokenizer_source: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
dtype: bfloat16
parameters:
  normalize: true
  rescale: true
models:
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-7B-abliterated-v2
    parameters:
      weight: 0.3
  - model: mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1
    parameters:
      weight: 0.25
  - model: Triangle104/DSR1-Distill-Qwen-7B-RP
    parameters:
      weight: 0.2
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
    parameters:
      weight: 0.25
```
🔑 Key Parameters
- Normalization & Rescaling: Ensures weight distributions remain balanced across all components.
- Model Stock Merge Method: Optimizes contribution from each model to retain the best attributes.
- Weighted Blending: The abliterated and re-distilled models contribute the most, refining both alignment and general usability.
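If you want to reproduce this merge locally, the snippet below is a minimal sketch using mergekit's Python API. It assumes mergekit is installed, that the YAML above has been saved as `distilprism.yaml` (a hypothetical filename), and that the available `MergeOptions` fields may vary slightly between mergekit versions.

```python
import yaml
import torch
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Load the merge configuration shown above (assumed saved as distilprism.yaml)
with open("distilprism.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

# Run the Model Stock merge and write the merged weights to a local directory
run_merge(
    merge_config,
    out_path="./Qwen2.5-7B-DistilPrism",
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # use GPU if one is available
        copy_tokenizer=True,             # carry over the tokenizer from tokenizer_source
        lazy_unpickle=True,              # reduce peak memory while loading checkpoints
        low_cpu_memory=True,
    ),
)
```

Equivalently, the same config can be run through mergekit's command-line entry point if you prefer not to script it.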
🗣️ Inference
You can use the model for text generation as follows:
Ollama
A quickstart guide to Ollama is available here. I recommend Ollama for daily-driver use, as it supports thinking tags.
```bash
ollama run hf.co/ZeroXClem/Qwen2.5-7B-DistilPrism
# If you are using quants, copy the quant repo's URL and replace 'huggingface.co/' with 'hf.co/', followed by the name of the quant.
```
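If you prefer to call the locally served model from code rather than the CLI, here is a minimal sketch using the `ollama` Python client (an assumption on my part: it requires `pip install ollama`, a running Ollama server, and the model already pulled as above).

```python
import ollama

# The model name must match the tag pulled with `ollama run` / `ollama pull` above
response = ollama.chat(
    model="hf.co/ZeroXClem/Qwen2.5-7B-DistilPrism",
    messages=[
        {"role": "user", "content": "Explain the significance of AI in modern healthcare."}
    ],
)

# The reply (which may include <think> reasoning, depending on the template) is in message.content
print(response["message"]["content"])
```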
Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

# Define the model name
model_name = "ZeroXClem/Qwen2.5-7B-DistilPrism"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model in bfloat16 and place it across available devices
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Initialize the pipeline with the already-loaded model and tokenizer
text_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
)

# Define the input prompt
prompt = "Explain the significance of artificial intelligence in modern healthcare."

# Generate the output
outputs = text_generator(
    prompt,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95
)

# Print the generated text
print(outputs[0]["generated_text"])
```
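Since the tokenizer comes from DeepSeek-R1-Distill-Qwen-7B, the model expects that chat template. The sketch below (reusing the `model` and `tokenizer` from above; the prompt is just an illustrative example) formats a message with `apply_chat_template` and generates a reply, which may begin with a `<think>` reasoning block before the final answer.

```python
# Build a chat-formatted prompt using the tokenizer's built-in template
messages = [
    {"role": "user", "content": "A train travels 120 km in 90 minutes. What is its average speed in km/h?"}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Generate; reasoning-style distills may emit a <think>...</think> block first
generated = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    top_p=0.95
)

# Decode only the newly generated tokens
print(tokenizer.decode(generated[0][input_ids.shape[-1]:], skip_special_tokens=True))
```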
🎯 Use Case & Applications
Qwen2.5-7B-DistilPrism is designed for efficient, high-quality text generation with strong reasoning capabilities. It is well-suited for:
- Advanced Reasoning & Problem Solving: Excels in logic-heavy tasks and multi-step reasoning problems.
- Conversational AI: Optimized for fluid, responsive dialogue, reducing refusals and improving engagement.
- Mathematical & Scientific Computation: Enhanced math & code generation abilities compared to standard distillations.
- Content Creation & Summarization: Generates coherent and contextually rich text suitable for various applications.
📜 License
This model is released under the MIT License.
📊 Benchmark Results (Coming Soon)
We are currently in the process of quantizing and benchmarking this model. Stay tuned for performance updates across:
- IFEval (0-Shot)
- BBH (3-Shot)
- MATH (4-Shot)
- GPQA (0-Shot)
- MuSR (0-Shot)
- MMLU-PRO (5-Shot)
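Until official numbers are posted, you can run a rough check yourself. The sketch below uses EleutherAI's lm-evaluation-harness Python API; it assumes `lm-eval` is installed, and the task names (here the Open LLM Leaderboard v2 groupings such as `leaderboard_ifeval` and `leaderboard_bbh`) may differ by harness version.

```python
import lm_eval

# Evaluate a small subset of the leaderboard tasks as a sanity check
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ZeroXClem/Qwen2.5-7B-DistilPrism,dtype=bfloat16",
    tasks=["leaderboard_ifeval", "leaderboard_bbh"],
    batch_size=4,
)

# Per-task metrics live under results["results"]
for task, metrics in results["results"].items():
    print(task, metrics)
```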
💡 Tags
merge
mergekit
model_stock
DeepSeek-R1
Distillation
abliterated
re-distilled
DeepSeek-R1-Distill-Qwen-7B
🙏 Special Thanks
This project wouldn't be possible without the incredible contributions from:
- @huihui-ai – For developing DeepSeek-R1-Distill-Qwen-7B-abliterated-v2, a bold step toward reducing refusals and improving usability.
- @mobiuslabsgmbh – For refining distillation techniques with DeepSeek-R1-ReDistill-Qwen-7B-v1.1.
- @Triangle104 – For crafting innovative merges like DSR1-Distill-Qwen-7B-RP, an essential component in this blend.
- @deepseek-ai – For open-sourcing DeepSeek-R1-Distill-Qwen-7B, a foundation for reasoning advancements.
And a heartfelt thank you to everyone in the 🤗 & Open-Source AI community for their continued research, testing, and support. 💜🚀