This is an FP8-Dynamic quantization of Qwen2.5-14B-Instruct-1M.

Qwen2.5-14B-Instruct-1M, developed by Alibaba Cloud, is a causal language model built for ultra-long contexts of up to 1 million tokens. It is significantly more effective on long-context tasks than previous versions while maintaining strong performance on shorter tasks. Its architecture uses RoPE, SwiGLU, and RMSNorm, and the model generates coherent, context-aware text.

Evaluations

The quantized model achieves an accuracy recovery of 100.09% relative to the original model, based on the average scores below.

| English | Qwen2.5-14B-Instruct-1M | Qwen2.5-14B-Instruct-1M-FP8-Dynamic (this) |
| --- | --- | --- |
| Avg. | 74.79 | 74.86 |
| ARC | 70 | 70.3 |
| Hellaswag | 74.6 | 74.5 |
| MMLU | 79.77 | 79.77 |

We did not check for data contamination. Evaluation was run with the Eval. Harness (lm-evaluation-harness) using limit=1000.
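
The accuracy recovery figure can be reproduced from the per-benchmark scores. The sketch below assumes that "accuracy recovery" means the quantized model's average score as a percentage of the original model's average; the card does not define the term, and the function name `accuracy_recovery` is only illustrative.

```python
# Scores copied from the evaluation table: (original, FP8-Dynamic).
scores = {
    "ARC": (70.0, 70.3),
    "Hellaswag": (74.6, 74.5),
    "MMLU": (79.77, 79.77),
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def accuracy_recovery(scores):
    """Quantized average as a percentage of the original average.

    Assumed definition: the model card does not spell out the formula.
    """
    base_avg = mean(base for base, _ in scores.values())
    quant_avg = mean(quant for _, quant in scores.values())
    return 100.0 * quant_avg / base_avg


recovery = accuracy_recovery(scores)
print(f"{recovery:.2f}%")  # prints 100.09%
```

Under this assumed definition the result matches the reported 100.09%. Differences of this size are small, so a recovery slightly above 100% is best read as "no measurable loss" rather than as an improvement.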

Usage

Install vLLM and run the server:

python -m vllm.entrypoints.openai.api_server --model cortecs/Qwen2.5-14B-Instruct-1M-FP8-Dynamic --max-model-len 42000 --gpu-memory-utilization 0.95

Access the model:

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/Qwen2.5-14B-Instruct-1M-FP8-Dynamic",
        "prompt": "San Francisco is a"
    }'
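
The same request can be sent from Python. This is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on localhost:8000 and returns an OpenAI-style completions response. The helper names `build_request` and `complete` and the `max_tokens` value are illustrative choices, not part of the model card.

```python
import json
import urllib.request

# Assumption: the vLLM server from the command above runs on localhost:8000.
BASE_URL = "http://localhost:8000/v1/completions"
MODEL = "cortecs/Qwen2.5-14B-Instruct-1M-FP8-Dynamic"


def build_request(prompt, max_tokens=32):
    """Build the POST request for the completions endpoint (hypothetical helper)."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, max_tokens=32, timeout=60):
    """Send the prompt to the server and return the generated text."""
    request = build_request(prompt, max_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style completions responses carry the text in choices[0].text.
    return body["choices"][0]["text"]
```

With the server running, `print(complete("San Francisco is a"))` prints the model's continuation of the prompt.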

⚡ This model is optimized to handle heavy workloads, providing a total throughput of 4497 tokens per second on one NVIDIA L40S ⚡
