
CPU Details

Details of running on CPU that are the same whether the OS is Linux, Windows, or macOS.

LLaMa.cpp

The default llama.cpp model is the LLaMa2 GGML model from TheBloke:

  • Run the LLaMa.cpp LLaMa2 model:

    With documents in the user_path folder, run:

    # if you don't have wget, download the file at the link below into the repo folder
    wget https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin
    python generate.py --base_model='llama' --prompt_type=llama2 --score_model=None --langchain_mode='UserData' --user_path=user_path
    

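If wget is not available, the same file can also be fetched from Python with the huggingface_hub package. This is a minimal sketch, assuming a recent huggingface_hub release; the repo and filename come from the wget URL above, while saving into the current folder is an assumption:

    # sketch: fetch the GGML file without wget, via huggingface_hub
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id="TheBloke/Llama-2-7B-Chat-GGML",
        filename="llama-2-7b-chat.ggmlv3.q8_0.bin",
        local_dir=".",  # assumption: place the file in the repo folder, as wget would
    )
    print(path)  # location of the downloaded model file
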
For another llama.cpp model:

  • Download from TheBloke, for example, the 13B WizardLM Quantized or 7B WizardLM Quantized model. TheBloke offers a variety of model types, quantization bit depths, and memory footprints; choose what best fits your system's specs. For the 7B case, download WizardLM-7B-uncensored.ggmlv3.q8_0.bin into a local path:
    wget https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q8_0.bin
    
  • With documents in the user_path folder, run:
     python generate.py --base_model=llama --model_path_llama=WizardLM-7B-uncensored.ggmlv3.q8_0.bin --score_model=None --langchain_mode='UserData' --user_path=user_path
    
    For llama.cpp-based models on computers with low system RAM or slow CPUs, we recommend running with the low-memory settings below (see the sketch after this list for what they mean):
     python generate.py --base_model=llama --model_path_llama=WizardLM-7B-uncensored.ggmlv3.q8_0.bin --llamacpp_dict="{'use_mlock':False,'n_batch':256}" --max_seq_len=512 --score_model=None --langchain_mode='UserData' --user_path=user_path
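
The low-memory flags above roughly correspond to settings of the underlying llama.cpp loader. Here is a minimal sketch of what they mean, using the llama-cpp-python package directly; the parameter names are real llama-cpp-python arguments, while the prompt is an assumption (note that ggmlv3 files require an older llama-cpp-python release, since newer releases expect GGUF):

    # sketch: roughly what the low-memory options above configure
    from llama_cpp import Llama

    llm = Llama(
        model_path="WizardLM-7B-uncensored.ggmlv3.q8_0.bin",
        n_ctx=512,        # like --max_seq_len=512: shorter context uses less RAM
        n_batch=256,      # like 'n_batch': 256: smaller batches for slow CPUs
        use_mlock=False,  # like 'use_mlock': False: don't pin weights in RAM
    )
    out = llm("Q: What is llama.cpp? A:", max_tokens=64)
    print(out["choices"][0]["text"])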
    

GPT4All

  • Choose a GPT4All-J-compatible model from the GPT4All Model Explorer. There is no need to download it manually; the GPT4All package downloads the model at runtime and puts it into .cache, as Hugging Face would.

  • With documents in the user_path folder, run:

     python generate.py --base_model=gptj --model_path_gptj=ggml-gpt4all-j-v1.3-groovy.bin --score_model=None --langchain_mode='UserData' --user_path=user_path
    

or

 python generate.py --base_model=gpt4all_llama --model_name_gpt4all_llama=ggml-wizardLM-7B.q4_2.bin --score_model=None --langchain_mode='UserData' --user_path=user_path

However, the gptj model often gives no output, even outside of h2oGPT. See GPT4All for installation instructions if you encounter any issues.
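
The runtime download behavior described above can be reproduced directly with the gpt4all Python package. A minimal sketch, assuming a gpt4all 1.x-style API; the model filename matches the gptj command above, and the prompt is an assumption:

    # sketch: GPT4All fetches the model on first use and caches it
    from gpt4all import GPT4All

    model = GPT4All("ggml-gpt4all-j-v1.3-groovy.bin")  # downloads if not cached
    print(model.generate("What is a large language model?", max_tokens=64))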

Low-memory

See Low Memory for recommendations when running with limited system RAM.