---
license: apache-2.0
---

This is a llamafile for Mixtral-8x7B-Instruct-v0.1.

I'm adding both q4-k-m and q5-k-m this time since it's a big model. On my 4090, q4-k-m is twice as fast as q5-k-m with no noticeable difference in chat or information quality. The speed of q5-k-m on my desktop is unusable, so q4-k-m is recommended.

The quantized gguf was downloaded straight from TheBloke this time, then packed into a llamafile using Mozilla's awesome llamafile project.
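For reference, here's a rough sketch of what that packing step looks like with the `llamafile` and `zipalign` binaries from Mozilla's releases; the gguf filename below is just a placeholder, and the `.args` format (one token per line) is as documented in their repo:

```sh
# Copy the llamafile runtime as the base of the new executable.
cp llamafile mixtral-8x7b-instruct.llamafile

# .args holds the default arguments baked into the executable,
# one token per line; -m must match the embedded gguf name.
cat > .args <<EOF
-m
mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf
EOF

# Embed the weights and args into the executable without compression.
zipalign -j0 mixtral-8x7b-instruct.llamafile \
    mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf .args
```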

It's over 4 GB, so if you want to use it on Windows you'll have to run it from WSL (Windows can't execute files larger than 4 GB).

WSL note: If you get the error about APE, and the recommended command

```sh
sudo sh -c 'echo -1 > /proc/sys/fs/binfmt_misc/WSLInterop'
```

doesn't work, the file might be named something else. I had success with

```sh
sudo sh -c 'echo -1 > /proc/sys/fs/binfmt_misc/WSLInterop-late'
```

If that fails too, just navigate to /proc/sys/fs/binfmt_misc, see which files look like WSLInterop, and echo a -1 to whatever the file is called by changing that part of the recommended command.
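A quick way to do all of that in one go, assuming the entry's name at least starts with WSLInterop:

```sh
# Disable every binfmt_misc entry whose name starts with WSLInterop,
# whatever it happens to be called on this install.
for f in /proc/sys/fs/binfmt_misc/WSLInterop*; do
    sudo sh -c "echo -1 > $f"
done
```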

A llamafile is a standalone executable that runs an LLM server locally on a variety of operating systems. You just run it, open the chat interface in a browser, and interact. Options can be passed in to expose the API, etc.
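As a sketch of typical usage (the filename here is a placeholder; the --host and --port flags come from the underlying llama.cpp server):

```sh
# Make the llamafile executable and start the local server.
chmod +x mixtral-8x7b-instruct-v0.1.Q4_K_M.llamafile
./mixtral-8x7b-instruct-v0.1.Q4_K_M.llamafile

# The chat interface is then available in a browser at:
#   http://localhost:8080

# To expose the API on the network instead of just localhost:
./mixtral-8x7b-instruct-v0.1.Q4_K_M.llamafile --host 0.0.0.0 --port 8080
```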