ZeroXClem/Qwen2.5-7B-DistilPrism

Qwen2.5-7B-DistilPrism is a reasoning-focused model merge that combines several variants of the DeepSeek-R1 distillations into a single refined language model. It uses the Model Stock merge method, with the aim of capturing the best attributes of DeepSeek-R1-Distill-Qwen-7B and its improved derivatives.

🚀 Merged Models

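This model is a weighted merge of the following:

  • huihui-ai/DeepSeek-R1-Distill-Qwen-7B-abliterated-v2 (weight 0.3)
  • mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1 (weight 0.25)
  • Triangle104/DSR1-Distill-Qwen-7B-RP (weight 0.2)
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-7B (weight 0.25, also the base model)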

🧩 Merge Configuration

The following YAML configuration defines how these models were combined using Model Stock, ensuring balanced contributions from each source:

# Merge configuration for ZeroXClem/Qwen2.5-7B-DistilPrism using Model Stock
name: ZeroXClem-Qwen2.5-7B-DistilPrism
merge_method: model_stock
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
tokenizer_source: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
dtype: bfloat16
parameters:
  normalize: true
  rescale: true
models:
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-7B-abliterated-v2
    parameters:
      weight: 0.3
  - model: mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1
    parameters:
      weight: 0.25
  - model: Triangle104/DSR1-Distill-Qwen-7B-RP
    parameters:
      weight: 0.2
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
    parameters:
      weight: 0.25
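As a quick sanity check, the per-model weights in the configuration above can be summed and ranked. This is a minimal sketch: the weights are copied by hand into a Python dict rather than parsed from the YAML, and the variable names are only illustrative.

```python
import math

# Per-model weights copied by hand from the merge configuration above.
weights = {
    "huihui-ai/DeepSeek-R1-Distill-Qwen-7B-abliterated-v2": 0.3,
    "mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1": 0.25,
    "Triangle104/DSR1-Distill-Qwen-7B-RP": 0.2,
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B": 0.25,
}

total = sum(weights.values())
# Compare with a tolerance, since binary floats cannot represent 0.3 or 0.2 exactly.
weights_sum_to_one = math.isclose(total, 1.0)

# Rank the models by their share of the blend, largest first.
ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
for name, weight in ranked:
    print(f"{weight:.0%}  {name}")
```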

🔑 Key Parameters

  • Normalization & Rescaling: Ensures weight distributions remain balanced across all components.
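  • Model Stock Merge Method: Averages the fine-tuned models and interpolates the result toward the base model, using the angle between their weight deltas to decide how far to move, with the goal of retaining the best attributes of each.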
  • Weighted Blending: The abliterated and re-distilled models contribute the most, refining both alignment and general usability.
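For intuition, the Model Stock idea can be sketched on toy weight vectors. This follows the interpolation rule from the Model Stock paper as I understand it: average the fine-tuned weights, then pull that average toward the base by a ratio derived from the angle between the task vectors. It is a simplified illustration on flat Python lists, not mergekit's implementation. The function names are made up for this example, and the toy version treats all fine-tuned models equally rather than using the per-model weight values.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def model_stock_merge(base, finetuned):
    """Toy Model Stock-style merge of k >= 2 fine-tuned vectors around one base."""
    k = len(finetuned)
    # Task vectors: how far each fine-tuned model moved away from the base.
    deltas = [[w - b for w, b in zip(ft, base)] for ft in finetuned]
    # Average pairwise cosine similarity between the task vectors.
    pairs = list(combinations(deltas, 2))
    cos_theta = sum(cosine(u, v) for u, v in pairs) / len(pairs)
    # Interpolation ratio: t = k*cos / (1 + (k - 1)*cos).
    t = k * cos_theta / (1 + (k - 1) * cos_theta)
    # Plain average of the fine-tuned weights, then blend it with the base.
    avg = [sum(col) / k for col in zip(*finetuned)]
    merged = [t * a + (1 - t) * b for a, b in zip(avg, base)]
    return merged, t


if __name__ == "__main__":
    merged, t = model_stock_merge([0.0, 0.0], [[1.0, 0.2], [0.8, 0.4]])
    print(t, merged)
```

When the task vectors point in the same direction, t approaches 1 and the result is the plain average. When they are orthogonal, t drops to 0 and the merge falls back to the base weights.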

🗣️ Inference

You can use the model for text generation as follows:

Ollama

See the Ollama quickstart guide to get started. I recommend Ollama for daily-driver applications, as it supports thinking tags.

ollama run hf.co/ZeroXClem/Qwen2.5-7B-DistilPrism

# To use a quant, copy its Hugging Face URL, replace 'huggingface.co/' with 'hf.co/', and add the name of the quant.
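The URL rewrite described above can be sketched as a small helper. The `hf.co/{user}/{repo}:{quant}` form is, to the best of my knowledge, the syntax Ollama accepts for GGUF repositories on Hugging Face. The helper name `ollama_run_command`, the GGUF repository in the usage line, and the quant tag are hypothetical placeholders, not real artifacts of this project.

```python
from typing import Optional


def ollama_run_command(hf_url: str, quant: Optional[str] = None) -> str:
    """Turn a Hugging Face repo URL into an `ollama run` command string."""
    ref = hf_url.strip().rstrip("/")
    # Drop the URL scheme if one is present.
    for prefix in ("https://", "http://"):
        if ref.startswith(prefix):
            ref = ref[len(prefix):]
    # Swap the long host name for the short one that Ollama expects.
    if ref.startswith("huggingface.co/"):
        ref = "hf.co/" + ref[len("huggingface.co/"):]
    # Append the quant name as a tag, if one was given.
    if quant:
        ref = f"{ref}:{quant}"
    return f"ollama run {ref}"


if __name__ == "__main__":
    # Hypothetical GGUF repository and quant name, for illustration only.
    print(ollama_run_command(
        "https://huggingface.co/example-user/Qwen2.5-7B-DistilPrism-GGUF",
        quant="Q4_K_M",
    ))
```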

Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

# Define the model name
model_name = "ZeroXClem/Qwen2.5-7B-DistilPrism"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Initialize the pipeline (dtype and device placement were already set when the model was loaded)
text_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
)

# Define the input prompt
prompt = "Explain the significance of artificial intelligence in modern healthcare."

# Generate the output
outputs = text_generator(
    prompt,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95
)

# Print the generated text
print(outputs[0]["generated_text"])
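The DeepSeek-R1 distillations this merge is built from wrap their chain of thought in <think>...</think> tags, so it can be handy to separate the reasoning from the final answer after generation. Below is a minimal sketch, assuming the merged model keeps that format and emits at most one reasoning block. `split_reasoning` is a hypothetical helper, not part of Transformers.

```python
from typing import Tuple


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split generated text into (reasoning, answer) around the </think> tag."""
    head, closing_tag, tail = text.partition("</think>")
    if not closing_tag:
        # No closing tag found: treat the whole text as the answer.
        return "", text.strip()
    # Keep only what follows the last opening tag, if there is one. This also
    # handles outputs where the opening tag was part of the prompt template.
    reasoning = head.rpartition("<think>")[2].strip()
    return reasoning, tail.strip()


if __name__ == "__main__":
    sample = "<think>2 + 2 equals 4.</think>The answer is 4."
    reasoning, answer = split_reasoning(sample)
    print("Reasoning:", reasoning)
    print("Answer:", answer)
```

With the pipeline example above, this would be called as split_reasoning(outputs[0]["generated_text"]). Note that with max_new_tokens=150 the reasoning may be cut off before the closing tag, in which case this helper returns the whole text as the answer.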

🎯 Use Case & Applications

Qwen2.5-7B-DistilPrism is designed for efficient, high-quality text generation with strong reasoning capabilities. It is well-suited for:

  • Advanced Reasoning & Problem Solving: Excels in logic-heavy tasks and multi-step reasoning problems.
  • Conversational AI: Optimized for fluid, responsive dialogue, reducing refusals and improving engagement.
  • Mathematical & Scientific Computation: Enhanced math & code generation abilities compared to standard distillations.
  • Content Creation & Summarization: Generates coherent and contextually rich text suitable for various applications.

📜 License

This model is released under the MIT License.


📊 Benchmark Results (Coming Soon)

We are currently in the process of quantizing and benchmarking this model. Stay tuned for performance updates across:

  • IFEval (0-Shot)
  • BBH (3-Shot)
  • MATH (4-Shot)
  • GPQA (0-Shot)
  • MuSR (0-Shot)
  • MMLU-PRO (5-Shot)

💡 Tags

  • merge
  • mergekit
  • model_stock
  • DeepSeek-R1
  • Distillation
  • abliterated
  • re-distilled
  • DeepSeek-R1-Distill-Qwen-7B

🙏 Special Thanks

This project wouldn't be possible without the incredible contributions from:

  • @huihui-ai – For developing DeepSeek-R1-Distill-Qwen-7B-abliterated-v2, a bold step towards improving model alignment.
  • @mobiuslabsgmbh – For refining distillation techniques with DeepSeek-R1-ReDistill-Qwen-7B-v1.1.
  • @Triangle104 – For crafting innovative merges like DSR1-Distill-Qwen-7B-RP, an essential component in this blend.
  • @deepseek-ai – For open-sourcing DeepSeek-R1-Distill-Qwen-7B, a foundation for reasoning advancements.

And a heartfelt thank you to everyone in the 🤗 & Open-Source AI community for their continued research, testing, and support. 💜🚀


🔗 Additional Resources

Model size: 7.61B params · Tensor type: BF16 · Format: Safetensors