rajabmondal committed
Commit 8781adf · verified · 1 Parent(s): 58a9fbb

Update README.md

Files changed (1):
  1. README.md +0 -122
README.md CHANGED
@@ -75,127 +75,5 @@ Refer to the Provided Files table below to see what files use which methods, and
 
 
 
- <!-- README_GGUF.md-provided-files end -->
-
- <!-- README_GGUF.md-how-to-download start -->
- ## How to download GGUF files
-
- **Note for manual downloaders:** You almost never want to clone the entire repo! Multiple different quantisation formats are provided, and most users only want to pick and download a single file.
-
- The following clients/libraries will automatically download models for you, providing a list of available models to choose from:
- - LM Studio
- - LoLLMS Web UI
- - Faraday.dev
-
- ### In `text-generation-webui`
-
- Under Download Model, you can enter the model repo: TheBloke/CodeLlama-7B-GGUF and below it, a specific filename to download, such as: codellama-7b.q4_K_M.gguf.
-
- Then click Download.
-
- ### On the command line, including multiple files at once
-
- I recommend using the `huggingface-hub` Python library:
-
- ```shell
- pip3 install 'huggingface-hub>=0.17.1'
- ```
-
- Then you can download any individual model file to the current directory, at high speed, with a command like this:
-
- ```shell
- huggingface-cli download TheBloke/CodeLlama-7B-GGUF codellama-7b.q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
- ```
-
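- If you prefer to stay in Python rather than shell out to the CLI, the same file can be fetched with `hf_hub_download` from the `huggingface-hub` library installed above. A minimal sketch, assuming the same repo and filename as the CLI example:
-
- ```python
- from huggingface_hub import hf_hub_download
-
- # Download a single GGUF file into the current directory and return its local path.
- model_path = hf_hub_download(
-     repo_id="TheBloke/CodeLlama-7B-GGUF",
-     filename="codellama-7b.q4_K_M.gguf",
-     local_dir=".",
-     local_dir_use_symlinks=False,
- )
- print(model_path)
- ```
-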
- <details>
- <summary>More advanced huggingface-cli download usage</summary>
-
- You can also download multiple files at once with a pattern:
-
- ```shell
- huggingface-cli download TheBloke/CodeLlama-7B-GGUF --local-dir . --local-dir-use-symlinks False --include='*Q4_K*gguf'
- ```
-
- For more documentation on downloading with `huggingface-cli`, please see: [HF -> Hub Python Library -> Download files -> Download from the CLI](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli).
-
- To accelerate downloads on fast connections (1Gbit/s or higher), install `hf_transfer`:
-
- ```shell
- pip3 install hf_transfer
- ```
-
- Then set the environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `1`:
-
- ```shell
- HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download TheBloke/CodeLlama-7B-GGUF codellama-7b.q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
- ```
-
- Windows CLI users: run `set HF_HUB_ENABLE_HF_TRANSFER=1` before running the download command.
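-
- The same accelerated download can be driven from Python. A minimal sketch, assuming `hf_transfer` is installed as above; the variable must be set before `huggingface_hub` is imported, since it is read at import time:
-
- ```python
- import os
-
- # Enable hf_transfer before importing huggingface_hub.
- os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
-
- from huggingface_hub import hf_hub_download
-
- hf_hub_download(
-     repo_id="TheBloke/CodeLlama-7B-GGUF",
-     filename="codellama-7b.q4_K_M.gguf",
-     local_dir=".",
-     local_dir_use_symlinks=False,
- )
- ```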
- </details>
- <!-- README_GGUF.md-how-to-download end -->
-
- <!-- README_GGUF.md-how-to-run start -->
- ## Example `llama.cpp` command
-
- Make sure you are using `llama.cpp` from commit [d0cee0d36d5be95a0d9088b674dbb27354107221](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.
-
- ```shell
- ./main -ngl 32 -m codellama-7b.q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "{prompt}"
- ```
-
- Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
-
- Change `-c 4096` to the desired sequence length. For extended-sequence models - e.g. 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically.
-
- If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`.
-
- For other parameters and how to use them, please refer to [the llama.cpp documentation](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md).
-
- ## How to run in `text-generation-webui`
-
- Further instructions can be found here: [text-generation-webui/docs/llama.cpp.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp.md).
-
158
- ## How to run from Python code
159
-
160
- You can use GGUF models from Python using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) or [ctransformers](https://github.com/marella/ctransformers) libraries.
161
-
162
- ### How to load this model from Python using ctransformers
163
-
164
- #### First install the package
165
-
166
- ```bash
167
- # Base ctransformers with no GPU acceleration
168
- pip install ctransformers>=0.2.24
169
- # Or with CUDA GPU acceleration
170
- pip install ctransformers[cuda]>=0.2.24
171
- # Or with ROCm GPU acceleration
172
- CT_HIPBLAS=1 pip install ctransformers>=0.2.24 --no-binary ctransformers
173
- # Or with Metal GPU acceleration for macOS systems
174
- CT_METAL=1 pip install ctransformers>=0.2.24 --no-binary ctransformers
175
- ```
176
-
177
- #### Simple example code to load one of these GGUF models
178
-
179
- ```python
180
- from ctransformers import AutoModelForCausalLM
181
-
182
- # Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
183
- llm = AutoModelForCausalLM.from_pretrained("TheBloke/CodeLlama-7B-GGUF", model_file="codellama-7b.q4_K_M.gguf", model_type="llama", gpu_layers=50)
184
-
185
- print(llm("AI is going to"))
186
- ```
187
-
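- The section above covers ctransformers only. For llama-cpp-python, the other library linked above, a minimal sketch looks like the following; the parameter names (`n_ctx`, `n_gpu_layers`) come from that library's `Llama` class and mirror the `-c` and `-ngl` flags of the llama.cpp command earlier:
-
- ```python
- from llama_cpp import Llama
-
- # Point model_path at a GGUF file downloaded with one of the methods above.
- llm = Llama(
-     model_path="codellama-7b.q4_K_M.gguf",
-     n_ctx=4096,        # sequence length, like -c 4096
-     n_gpu_layers=32,   # layers to offload to GPU, like -ngl 32; use 0 for CPU only
- )
-
- output = llm("AI is going to", max_tokens=128, temperature=0.7)
- print(output["choices"][0]["text"])
- ```
-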
- ## How to use with LangChain
-
- Here are guides on using llama-cpp-python or ctransformers with LangChain, with a short sketch after the links:
-
- * [LangChain + llama-cpp-python](https://python.langchain.com/docs/integrations/llms/llamacpp)
- * [LangChain + ctransformers](https://python.langchain.com/docs/integrations/providers/ctransformers)
-
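- As a rough illustration, this is how the ctransformers example above could be wrapped in LangChain. It is a sketch only: the import path is the one used by older LangChain releases (newer releases move these classes to `langchain_community`), and the `config` keys are passed straight through to ctransformers:
-
- ```python
- from langchain.llms import CTransformers
- from langchain.prompts import PromptTemplate
-
- # Wrap the GGUF model in a LangChain LLM object.
- llm = CTransformers(
-     model="TheBloke/CodeLlama-7B-GGUF",
-     model_file="codellama-7b.q4_K_M.gguf",
-     model_type="llama",
-     config={"gpu_layers": 50},  # use 0 if you have no GPU acceleration
- )
-
- prompt = PromptTemplate.from_template("Write a Python function that {task}.")
- print(llm(prompt.format(task="reverses a string")))
- ```
-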
- <!-- README_GGUF.md-how-to-run end -->
-
- <!-- footer start -->
- <!-- 200823 -->
-