When used with Ollama, are kv_cache_type=q4_0 and flash_attention=1 supported?
I got some error messages when starting Ollama with OLLAMA_FLASH_ATTENTION=true and OLLAMA_KV_CACHE_TYPE=q8_0:
[GIN] 2025/02/11 - 17:12:44 | 200 | 38.958µs | 127.0.0.1 | HEAD "/"
[GIN] 2025/02/11 - 17:12:44 | 200 | 14.20775ms | 127.0.0.1 | POST "/api/show"
time=2025-02-11T17:12:44.051+08:00 level=INFO source=sched.go:714 msg="new model will fit in available VRAM in single GPU, loading" model=/Users/bigmodel/.ollama/models/blobs/sha256-8d0774696673bc32468922d072a7658fd4883ec77f5035329f0626dad6df0340 gpu=0 parallel=1 available=154618822656 required="141.0 GiB"
time=2025-02-11T17:12:44.051+08:00 level=INFO source=server.go:104 msg="system memory" total="192.0 GiB" free="182.3 GiB" free_swap="0 B"
time=2025-02-11T17:12:44.052+08:00 level=INFO source=memory.go:356 msg="offload to metal" layers.requested=36 layers.model=62 layers.offload=36 layers.split="" memory.available="[144.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="238.6 GiB" memory.required.partial="141.0 GiB" memory.required.kv="76.2 GiB" memory.required.allocations="[141.0 GiB]" memory.weights.total="232.4 GiB" memory.weights.repeating="231.7 GiB" memory.weights.nonrepeating="725.0 MiB" memory.graph.full="4.2 GiB" memory.graph.partial="4.2 GiB"
time=2025-02-11T17:12:44.052+08:00 level=WARN source=server.go:216 msg="flash attention enabled but not supported by model"
time=2025-02-11T17:12:44.052+08:00 level=WARN source=server.go:234 msg="quantized kv cache requested but flash attention disabled" type=q4_0
time=2025-02-11T17:12:44.053+08:00 level=INFO source=server.go:376 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /Users/bigmodel/.ollama/models/blobs/sha256-8d0774696673bc32468922d072a7658fd4883ec77f5035329f0626dad6df0340 --ctx-size 16384 --batch-size 512 --n-gpu-layers 36 --threads 16 --no-mmap --parallel 1 --port 64978"
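For reference, this is roughly how the server was launched with the variables described above (a minimal sketch assuming ollama serve is started from a shell; on the macOS app the variables would typically be set differently):

# set the KV cache and flash attention options before starting the server
export OLLAMA_FLASH_ATTENTION=true
export OLLAMA_KV_CACHE_TYPE=q8_0   # the question above also asks about q4_0
ollama serve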