---
license: apache-2.0
---

This is a llamafile for Mixtral-8x7B-Instruct-v0.1.

These were converted and quantized from the source safetensors using llama.cpp on April 3, 2024. That date matters: there are several GGUF files on HF that were created before llama.cpp's support for MoE quantization was fully debugged, even though the tooling appeared to produce working files at the time.
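For reference, the general llama.cpp workflow at the time looked roughly like the sketch below. The script and binary names match llama.cpp as of early April 2024 (they have since been renamed in newer releases), and the paths and quantization type are placeholders, not necessarily what was used for this file:

```sh
# Convert the original safetensors checkpoint to an f16 GGUF
# (llama.cpp's conversion script; the name varies by release).
python convert.py /path/to/Mixtral-8x7B-Instruct-v0.1 \
    --outfile mixtral-8x7b-instruct-f16.gguf --outtype f16

# Quantize the f16 GGUF down to a smaller format, e.g. Q5_K_M.
./quantize mixtral-8x7b-instruct-f16.gguf \
    mixtral-8x7b-instruct-Q5_K_M.gguf Q5_K_M
```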

I'll also be uploading the quantized .gguf files I created, in case anyone wants them as a reference or for further work.

It's over 4 GB, and Windows can't execute files that large, so if you want to use it on Windows you'll have to run it from WSL.

WSL note: If you get the error about APE, and the recommended command

```sh
sudo sh -c 'echo -1 > /proc/sys/fs/binfmt_misc/WSLInterop'
```

doesn't work, the file might be named something else; I had success with

```sh
sudo sh -c 'echo -1 > /proc/sys/fs/binfmt_misc/WSLInterop-late'
```

If that fails too, look in /proc/sys/fs/binfmt_misc for whatever file looks like WSLInterop and echo a -1 to it by changing that part of the recommended command, as sketched below.
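A minimal way to do that, assuming only that some WSLInterop-like entry exists on your system (the exact name is the unknown here):

```sh
# List the registered binfmt_misc handlers and find the WSLInterop one.
ls /proc/sys/fs/binfmt_misc/ | grep -i wslinterop

# Disable it, substituting the name the command above printed.
sudo sh -c 'echo -1 > /proc/sys/fs/binfmt_misc/<name-you-found>'
```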

Llamafiles are standalone executables that run an LLM server locally on a variety of operating systems, including FreeBSD, Windows, Windows via WSL, Linux, and macOS. The same file works everywhere: you just download it, run it, open the chat interface in a browser, and interact. Options can be passed in to expose the API, etc. See their docs for details.
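A quick sketch of that flow (the filename below is a placeholder for whatever this repo's file is called; the `--host` and `--port` flags are the embedded llama.cpp server options that llamafile accepts, but check the llamafile docs for your version):

```sh
# Mark the downloaded llamafile executable (Linux/macOS/WSL), then run it.
chmod +x mixtral-8x7b-instruct-v0.1.llamafile
./mixtral-8x7b-instruct-v0.1.llamafile

# By default this starts a local server with a chat UI at
# http://localhost:8080 in your browser.

# To expose the API on your network, pass server options through:
./mixtral-8x7b-instruct-v0.1.llamafile --host 0.0.0.0 --port 8080
```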