DeepSeek-R1-Distill-SRE-Qwen-32B-INT8

Model Introduction

DeepSeek-R1-Distill-SRE-Qwen-32B-INT8 is the industry's first publicly available large language model for IT operations. It is a mixed-precision, 8-bit-quantized model fine-tuned from DeepSeek-R1-Distill-Qwen-32B and optimized specifically for operations and Site Reliability Engineering (SRE) scenarios. The model inherits the strong reasoning capabilities of the DeepSeek-R1 series and was further fine-tuned on the ahmedgongi/Devops_LLM dataset, significantly improving its performance on the following tasks:

  • Automated script generation
  • System monitoring and analysis
  • Troubleshooting and root cause identification

This model is suitable for enterprise-level system management, cloud-native operations platform development, and similar scenarios, providing an intelligent-operations solution that balances performance and cost. The current version uses 8-bit (INT8) quantization with mixed-precision optimization via bitsandbytes: linear-layer weights are stored as torch.int8, while other components (e.g., embeddings and LayerNorm) remain in torch.float16.

We welcome community users to test the model and share their experiences, helping us improve the model documentation and application scenarios together!


Model Files and Weights

  • Model Files:
    The model weights are stored in standard formats supported by Hugging Face (e.g., .safetensors or .bin) and are located in the root directory of this repository.
    Example file structure:

    ├── config.json
    ├── model.safetensors
    ├── tokenizer.json
    └── ...
    
  • Quantization Details:
    The model uses 8-bit quantization (INT8), with linear layer weights in torch.int8 and non-quantized parts (e.g., Embeddings, LayerNorm) in torch.float16, optimized for mixed precision using bitsandbytes.
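
  • Loading Example:
    To load the model directly with Hugging Face transformers (rather than through a serving framework), a minimal sketch follows; it assumes transformers, accelerate, and bitsandbytes are installed, and mirrors the quantization scheme above.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "Phpcool/DeepSeek-R1-Distill-SRE-Qwen-32B-INT8"

    # 8-bit config mirroring the scheme above: linear weights in int8,
    # non-quantized modules (embeddings, LayerNorm) kept in float16.
    bnb_config = BitsAndBytesConfig(load_in_8bit=True)

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        torch_dtype=torch.float16,
        device_map="auto",  # shard across available GPUs
    )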


How to Use the Model for Inference

This model supports efficient inference and has been verified to work with both the vLLM and SGLang frameworks. An SGLang example (recommended) is shown first, followed by a vLLM sketch.


1. Inference with SGLang

SGLang is a high-performance serving framework suitable for fast inference in complex operations tasks.

Environment Setup

pip install sglang

Start the SGLang Server

python -m sglang.launch_server --model-path Phpcool/DeepSeek-R1-Distill-SRE-Qwen-32B-INT8 --quant bitsandbytes --port 30000
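
Once the server reports ready, you can optionally verify the OpenAI-compatible endpoint (this assumes SGLang exposes the standard /v1/models route on the port chosen above):

import openai

client = openai.Client(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")
print([m.id for m in client.models.list().data])  # should list the served model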

Python Inference Example

import openai

# Point the OpenAI-compatible client at the local SGLang server.
client = openai.Client(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")
# Chat completion
response = client.chat.completions.create(
    model="default",
    messages=[
        {"role": "system", "content": "You are a senior operations expert."},
        {"role": "user", "content": "Analyze the following log and identify possible failure causes: '2023-10-10 12:00:00 ERROR: Disk I/O timeout'."},
    ],
    temperature=0,
    max_tokens=2048,
)
print(response.choices[0].message.content)
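

2. Inference with vLLM

vLLM is also verified as compatible. The snippet below is a minimal offline-inference sketch, assuming vLLM is installed with bitsandbytes support; exact argument names can vary between vLLM releases.

from vllm import LLM, SamplingParams

# Older vLLM releases may additionally require load_format="bitsandbytes".
llm = LLM(
    model="Phpcool/DeepSeek-R1-Distill-SRE-Qwen-32B-INT8",
    quantization="bitsandbytes",
)

params = SamplingParams(temperature=0, max_tokens=2048)
outputs = llm.generate(
    ["Analyze the following log and identify possible failure causes: "
     "'2023-10-10 12:00:00 ERROR: Disk I/O timeout'."],
    params,
)
print(outputs[0].outputs[0].text)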

Model Details

  • Base Model: DeepSeek-R1-Distill-Qwen-32B
  • Fine-Tuning Dataset: ahmedgongi/Devops_LLM
  • Quantization: 8-bit INT8 (linear layer weights), FP16 (Embeddings, LayerNorm, etc.)
  • Compatible Frameworks: bitsandbytes, vLLM, SGLang
  • Recommended Hardware: NVIDIA GPUs with CUDA support; two 48 GB GPUs (96 GB+ total VRAM) are recommended to load the full model

Use Cases

  • Automated Operations: Script generation, configuration management.
  • System Monitoring: Metric analysis, alert rule generation.
  • Troubleshooting: Log parsing, root cause analysis.

The model excels in SRE and DevOps scenarios, particularly for enterprise applications requiring fast response times and resource optimization.
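
For illustration, here is a hypothetical alert-rule-generation prompt; it reuses the OpenAI-compatible client pattern from the SGLang example above and assumes that server is running.

import openai

client = openai.Client(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="default",
    messages=[
        {"role": "system", "content": "You are a senior operations expert."},
        {"role": "user", "content": "Write a Prometheus alerting rule that "
                                    "fires when disk usage on any node stays "
                                    "above 90% for five minutes."},
    ],
    temperature=0,
    max_tokens=1024,
)
print(response.choices[0].message.content)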


Disclaimer

Due to the nature of language models, the generated content may contain hallucinations or biased statements. Please use the model’s outputs with caution.
If you plan to use this model publicly or commercially, note that the deploying service provider is responsible for any adverse effects or harmful statements arising from its use. The developers of this project accept no liability for any damages or losses caused by the use of this project (including but not limited to data, models, and code).


Community Contributions

Because the current documentation is still limited, we encourage community participation:

  • Raise questions, use cases, or improvement suggestions in the Community section on Hugging Face.
  • Submit Pull Requests to enhance model details, optimize inference code, or share operations-related prompt examples.

Thank you for your use and support! If you have any questions, feel free to contact us. Email: [email protected]
