---
license: apache-2.0
language:
- ru
base_model:
- RefalMachine/RuadaptQwen2.5-32B-Pro-Beta
datasets:
- pomelk1n/RuadaptQwen-Quantization-Dataset
pipeline_tag: text-generation
tags:
- AWQ
- GEMM
---
This model is an AWQ-quantized (4-bit, GEMM) version of RefalMachine/RuadaptQwen2.5-32B-Pro-Beta.
To reproduce the quantization, install AutoAWQ:

```shell
pip install autoawq==0.2.8
```
The model was quantized with the following script. Note that the tokenizer must be loaded before calling `quantize`, and saving it alongside the quantized weights keeps the output directory self-contained:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
import torch

# Paths and hyperparameters for quantization
model_path = "/data/models/RuadaptQwen2.5-32B-Pro-Beta"
quant_path = "/data/models/RuadaptQwen2.5-32B-Pro-Beta-AWQ"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the tokenizer and the full-precision model with AutoAWQ
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoAWQForCausalLM.from_pretrained(
    model_path, safetensors=True, torch_dtype=torch.bfloat16
)

# Quantize on the calibration dataset
model.quantize(
    tokenizer,
    quant_config=quant_config,
    calib_data="/data/scripts/RuadaptQwen-Quantization-Dataset",
    text_column="text",
)

# Save the quantized weights (sharded) together with the tokenizer
model.save_quantized(quant_path, safetensors=True, shard_size="5GB")
tokenizer.save_pretrained(quant_path)
```
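To give an intuition for what the `quant_config` above means, the sketch below implements asymmetric (zero-point) group quantization by hand for a single group of `q_group_size = 128` weights at `w_bit = 4`. This is an illustrative NumPy toy, not AutoAWQ's actual kernel or its activation-aware scale search; the function names are hypothetical.

```python
import numpy as np

def quantize_group(w, n_bit=4):
    """Asymmetric (zero-point) quantization of one weight group.

    Maps float weights to integers in [0, 2**n_bit - 1] with a
    per-group scale and an integer zero-point (zero_point=True).
    """
    qmax = 2 ** n_bit - 1                       # 15 for 4-bit
    scale = (w.max() - w.min()) / qmax          # step size for this group
    zero = np.round(-w.min() / scale)           # integer zero-point
    q = np.clip(np.round(w / scale) + zero, 0, qmax)
    return q.astype(np.uint8), scale, zero

def dequantize_group(q, scale, zero):
    """Reconstruct approximate float weights from the 4-bit codes."""
    return (q.astype(np.float32) - zero) * scale

# One group of 128 weights, as in q_group_size=128
rng = np.random.default_rng(0)
w = rng.normal(size=128).astype(np.float32)

q, scale, zero = quantize_group(w)
w_hat = dequantize_group(q, scale, zero)

# Rounding error is bounded by half a quantization step per weight
print("codes in [%d, %d], max abs error %.4f" % (q.min(), q.max(), np.abs(w - w_hat).max()))
```

The per-group scale and zero-point are what AWQ stores alongside the packed 4-bit codes; the `version: "GEMM"` field only selects the inference kernel layout and does not change this arithmetic.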