DeepSeek-R1-Distill-SRE-Qwen-32B-INT8
Model Introduction
DeepSeek-R1-Distill-SRE-Qwen-32B-INT8 is the industry's first publicly available operations large model: a mixed-precision, 8-bit quantized large language model fine-tuned from DeepSeek-R1-Distill-Qwen-32B and optimized specifically for operations and Site Reliability Engineering (SRE) scenarios. The model inherits the powerful reasoning capabilities of the DeepSeek-R1 series and has been further fine-tuned on the ahmedgongi/Devops_LLM dataset, significantly improving its performance on the following tasks:
- Automated script generation
- System monitoring and analysis
- Troubleshooting and root cause identification
This model is suitable for enterprise-level system management, cloud-native operations platform development, and similar scenarios, providing an efficient solution that balances performance and cost for intelligent operations. The current version uses 8-bit (INT8) quantization with mixed-precision optimization implemented via bitsandbytes: linear-layer weights are stored as torch.int8, while other components (e.g., Embeddings and LayerNorm) remain in torch.float16.
We welcome community users to test the model and share their experiences, helping us improve the model documentation and application scenarios together!
Model Files and Weights
Model Files:
The model weights are stored in standard formats supported by Hugging Face (e.g., .safetensors or .bin) and are located in the root directory of this repository.
Example file structure:
├── config.json
├── model.safetensors
├── tokenizer.json
└── ...
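To fetch these files locally before serving, a minimal sketch using huggingface_hub is shown below (the repo id matches this model card):

from huggingface_hub import snapshot_download

# Downloads config.json, the model weights, tokenizer files, etc. into the HF cache
local_dir = snapshot_download("Phpcool/DeepSeek-R1-Distill-SRE-Qwen-32B-INT8")
print(local_dir)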
Quantization Details:
The model uses 8-bit (INT8) quantization, with linear-layer weights in torch.int8 and non-quantized parts (e.g., Embeddings, LayerNorm) in torch.float16, using mixed-precision optimization via bitsandbytes.
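For local (non-server) use, the weights can be loaded directly with transformers and bitsandbytes. Below is a minimal sketch, assuming the quantization config is serialized with the checkpoint (otherwise pass quantization_config=BitsAndBytesConfig(load_in_8bit=True)):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Phpcool/DeepSeek-R1-Distill-SRE-Qwen-32B-INT8"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" shards the model across available GPUs; a quantization
# config embedded in config.json is applied automatically by transformers.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,  # non-quantized modules (Embedding, LayerNorm) stay FP16
)

# Linear-layer weights should report torch.int8; everything else float16
print(model.model.layers[0].self_attn.q_proj.weight.dtype)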
How to Use the Model for Inference
This model supports efficient inference and has been verified to be compatible with the vLLM and SGLang frameworks. Below is an example using SGLang (recommended), followed by a brief vLLM sketch.
1. Inference with SGLang
SGLang is a high-performance serving framework suitable for fast inference in complex operations tasks.
Environment Setup
pip install sglang
Start the SGLang Server
python -m sglang.launch_server --model-path Phpcool/DeepSeek-R1-Distill-SRE-Qwen-32B-INT8 --quantization bitsandbytes --port 30000
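Before sending requests, you can optionally confirm the server is up; SGLang exposes an OpenAI-compatible API, so listing the served models works as a quick sanity check (a minimal sketch, assuming the port used above):

import openai

client = openai.Client(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")
# The server should report the model it is serving
for served_model in client.models.list():
    print(served_model.id)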
Python Inference Example
import openai

client = openai.Client(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")

# Chat completion
response = client.chat.completions.create(
    model="default",
    messages=[
        {"role": "system", "content": "You are a senior operations expert."},
        {"role": "user", "content": "Analyze the following log and identify possible failure causes: '2023-10-10 12:00:00 ERROR: Disk I/O timeout'."},
    ],
    temperature=0,
    max_tokens=2048,
)
print(response.choices[0].message.content)
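2. Inference with vLLM
vLLM is also listed above as a verified framework. Below is a minimal offline-inference sketch; the bitsandbytes-specific arguments (quantization, load_format) reflect vLLM's bitsandbytes support and are an assumption to verify against your installed vLLM version.

from vllm import LLM, SamplingParams

# Assumption: your vLLM build supports loading bitsandbytes checkpoints
# via the quantization/load_format arguments below.
llm = LLM(
    model="Phpcool/DeepSeek-R1-Distill-SRE-Qwen-32B-INT8",
    quantization="bitsandbytes",
    load_format="bitsandbytes",
)

sampling = SamplingParams(temperature=0, max_tokens=2048)
outputs = llm.generate(
    ["Analyze the following log and identify possible failure causes: '2023-10-10 12:00:00 ERROR: Disk I/O timeout'."],
    sampling,
)
print(outputs[0].outputs[0].text)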
Model Details
- Base Model: DeepSeek-R1-Distill-Qwen-32B
- Fine-Tuning Dataset: ahmedgongi/Devops_LLM
- Quantization: 8-bit INT8 (linear layer weights), FP16 (Embeddings, LayerNorm, etc.)
- Compatible Frameworks: bitsandbytes, vLLM, SGLang
- Recommended Hardware: NVIDIA GPU with CUDA support; 2 × 48 GB+ of VRAM is recommended to load the full model (the INT8 linear weights of a 32B-parameter model alone occupy roughly 32 GB, before the FP16 components and KV cache)
Use Cases
- Automated Operations: Script generation, configuration management.
- System Monitoring: Metric analysis, alert rule generation.
- Troubleshooting: Log parsing, root cause analysis.
The model excels in SRE and DevOps scenarios, particularly for enterprise applications requiring fast response times and resource optimization.
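As a concrete illustration of the alert-rule use case, the same OpenAI-compatible client from the SGLang example can be reused; the prompt below is hypothetical and only meant to show the pattern:

import openai

client = openai.Client(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")
# Illustrative prompt for alert rule generation
response = client.chat.completions.create(
    model="default",
    messages=[
        {"role": "system", "content": "You are a senior operations expert."},
        {"role": "user", "content": "Write a Prometheus alert rule that fires when disk I/O latency stays above 500ms for 5 minutes."},
    ],
    temperature=0,
    max_tokens=2048,
)
print(response.choices[0].message.content)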
Disclaimer
Due to the nature of language models, the generated content may contain hallucinations or biased statements. Please use the model’s outputs with caution.
If you plan to deploy this model publicly or commercially, note that the service provider is responsible for any adverse effects or harmful statements arising from its use. The developers of this project assume no liability for any damages or losses caused by the use of this project's data, models, code, or other artifacts.
Community Contributions
Due to the limited information in the current documentation, we encourage community participation:
- Raise questions, share use cases, or suggest improvements in the Community tab on Hugging Face.
- Submit Pull Requests to enhance model details, optimize inference code, or share operations-related prompt examples.
Thank you for your support! If you have any questions, feel free to contact us at [email protected].