Update README.md
README.md CHANGED
@@ -1,3 +1,23 @@
----
-license: apache-2.0
----
+---
+license: apache-2.0
+---
+
+* Quantization of Qwen2.5 14B for edge devices: 7.3 GB footprint.
+* One of the best models I have tried in Spanish.
+* Original model: https://huggingface.co/djuna/Q2.5-Veltha-14B-0.5
+* Merged models:
+  * huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2
+  * allura-org/TQ2.5-14B-Aletheia-v1
+  * EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
+  * v000000/Qwen2.5-Lumen-14B
+
+* All quants were made using the imatrix option, with the calibration dataset from here (example commands below).
+* Using llama.cpp compiled with CUDA support for quantization and inference:
+
+```
+ggml_cuda_init: found 2 CUDA devices:
+  Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
+  Device 1: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
+version: 3982 (cc2983d3)
+built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
+```
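
A minimal sketch of how imatrix quants like these are typically produced with llama.cpp; the file names, calibration text, and quant type below are placeholders, not necessarily the exact ones used for this repo:

```sh
# Build llama.cpp with CUDA support.
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release

# 1. Compute an importance matrix from a calibration text file
#    (calibration.txt stands in for the dataset linked in the README).
./build/bin/llama-imatrix -m Q2.5-Veltha-14B-0.5-F16.gguf \
    -f calibration.txt -o imatrix.dat -ngl 99

# 2. Quantize, weighting the tensors with the importance matrix.
#    Q4_K_M is only an example quant type.
./build/bin/llama-quantize --imatrix imatrix.dat \
    Q2.5-Veltha-14B-0.5-F16.gguf Q2.5-Veltha-14B-0.5-Q4_K_M.gguf Q4_K_M
```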
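Inference with a resulting GGUF can use the same CUDA build; the model file name and prompt are illustrative, and `-ngl 99` offloads all layers to the GPUs:

```sh
# Run an interactive generation, splitting layers across the available CUDA devices.
./build/bin/llama-cli -m Q2.5-Veltha-14B-0.5-Q4_K_M.gguf \
    -ngl 99 -c 4096 -p "Hola, ¿cómo estás?"
```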