Ibraaheem committed on
Commit c0ae39c · 1 Parent(s): afe3376

Update docs/description.md

Files changed (1): docs/description.md (+1 / -454)

docs/description.md CHANGED

## Introduction

PrivateGPT provides an **API** containing all the building blocks required to build
**private, context-aware AI applications**. The API follows and extends the OpenAI API standard, and supports
both normal and streaming responses.

The API is divided into two logical blocks:
- Contextual chunks retrieval: given a query, returns the most relevant chunks of text from the ingested
  documents.

> A working **Gradio UI client** is provided to test the API, together with a set of
> useful tools such as a bulk model download script, an ingestion script, a documents folder
> watch, etc.

## Quick Local Installation steps

The steps in the `Installation and Settings` section are explained in more detail and cover more
setup scenarios. But if you are looking for a quick setup guide, here it is:

```bash
# Clone the repo
git clone https://github.com/imartinez/privateGPT
cd privateGPT

# Install Python 3.11
pyenv install 3.11
pyenv local 3.11

# Install dependencies
poetry install --with ui,local

# Download Embedding and LLM models
poetry run python scripts/setup

# (Optional) For Mac with Metal GPU, enable it. Check the Installation and Settings section
# to learn how to enable GPU on other platforms
CMAKE_ARGS="-DLLAMA_METAL=on" pip install --force-reinstall --no-cache-dir llama-cpp-python

# Run the local server
PGPT_PROFILES=local make run

# Note: on Mac with Metal you should see a ggml_metal_add_buffer log, stating the GPU is
# being used

# Navigate to the UI and try it out!
# http://localhost:8001/
```

## Installation and Settings

### Base requirements to run PrivateGPT

* Clone the PrivateGPT repository and navigate to it:

```bash
git clone https://github.com/imartinez/privateGPT
cd privateGPT
```

* Install Python 3.11, ideally through a Python version manager like `pyenv`. Python 3.12
should work too. Earlier Python versions are not supported.
  * osx/linux: [pyenv](https://github.com/pyenv/pyenv)
  * windows: [pyenv-win](https://github.com/pyenv-win/pyenv-win)

```bash
pyenv install 3.11
pyenv local 3.11
```

* Install [Poetry](https://python-poetry.org/docs/#installing-with-the-official-installer) for dependency management.

* Have a valid C++ compiler like gcc. See [Troubleshooting: C++ Compiler](#troubleshooting-c-compiler) for more details.

* Install `make` for scripts:
  * osx (using Homebrew): `brew install make`
  * windows (using Chocolatey): `choco install make`

### Install dependencies

Install the dependencies:

```bash
poetry install --with ui
```

Verify everything is working by running `make run` (or `poetry run python -m private_gpt`) and navigating to
http://localhost:8001. You should see a [Gradio UI](https://gradio.app/) **configured with a mock LLM** that will
echo back the input. Later we'll see how to configure a real LLM.

### Settings

> Note: the default settings of PrivateGPT work out-of-the-box for a 100% local setup. Skip this section if you just
> want to test PrivateGPT locally, and come back later to learn about more configuration options.

PrivateGPT is configured through *profiles* that are defined using yaml files and selected through env variables.
The full list of configurable properties can be found in `settings.yaml`.

#### env var `PGPT_SETTINGS_FOLDER`

The location of the settings folder. Defaults to the root of the project.
Should contain the default `settings.yaml` and any other `settings-{profile}.yaml`.
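
For example, a minimal sketch of pointing PrivateGPT at a custom settings folder (the `./config` path is purely illustrative):

```bash
# Illustrative only: keep settings.yaml and settings-{profile}.yaml in ./config
export PGPT_SETTINGS_FOLDER=./config
make run
```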

#### env var `PGPT_PROFILES`

By default, the profile definition in `settings.yaml` is loaded.
Using this env var you can load additional profiles; the format is a comma-separated list of profile names.
This will merge `settings-{profile}.yaml` on top of the base settings file.

For example, `PGPT_PROFILES=local,cuda` will load `settings-local.yaml`
and `settings-cuda.yaml`; their contents will be merged, with
later profiles' properties overriding values of earlier ones like `settings.yaml`.
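
As a sketch of how this is typically combined with the run commands shown later in this document:

```bash
# Load settings.yaml, then settings-local.yaml, then settings-cuda.yaml (later files win)
PGPT_PROFILES=local,cuda make run
```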

During testing, the `test` profile will be active along with the default, therefore the `settings-test.yaml`
file is required.

#### Environment variables expansion

Configuration files can contain environment variables,
which will be expanded at runtime.

Expansion must follow the pattern `${VARIABLE_NAME:default_value}`.

For example, the following configuration will use the value of the `PORT`
environment variable, or `8001` if it's not set.
Missing variables with no default will produce an error.

```yaml
server:
  port: ${PORT:8001}
```
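
With the configuration above, the port can then be overridden at launch time; a sketch reusing the run command from the sections below:

```bash
# PORT is expanded into the settings at runtime; 8001 remains the default
PORT=8002 make run
```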

### Local LLM requirements

Install extra dependencies for local execution:

```bash
poetry install --with local
```

For PrivateGPT to run fully locally, GPU acceleration is required
(CPU execution is possible, but very slow). However,
typical MacBook laptops or Windows desktops with mid-range GPUs lack the VRAM to run
even the smallest LLMs. For that reason,
**local execution is only supported for models compatible with [llama.cpp](https://github.com/ggerganov/llama.cpp)**.

These two models are known to work well:

* https://huggingface.co/TheBloke/Llama-2-7B-chat-GGUF
* https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF (recommended)

To ease the installation process, use the `setup` script, which will download both
the embedding and the LLM model and place them in the correct location (under the `models` folder):

```bash
poetry run python scripts/setup
```

If you are ok with CPU execution, you can skip the rest of this section.

As stated before, llama.cpp is required, and in
particular [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
is used.

> It's highly encouraged that you fully read the llama-cpp and llama-cpp-python documentation relevant to your platform.
> Running into installation issues is very likely, and you'll need to troubleshoot them yourself.

#### Customizing low level parameters

Currently, not all the parameters of llama-cpp and llama-cpp-python are available in PrivateGPT's `settings.yaml` file. In case you need to customize parameters such as the number of layers loaded into the GPU, you can change them in `private_gpt/components/llm/llm_component.py`. If you are getting an out-of-memory error, you might also try a smaller model or stick to the proposed recommended models, instead of custom tuning the parameters.

#### OSX GPU support

You will need to build [llama.cpp](https://github.com/ggerganov/llama.cpp) with
Metal support. To do that, run:

```bash
CMAKE_ARGS="-DLLAMA_METAL=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```

#### Windows NVIDIA GPU support

Windows GPU support is done through CUDA.
Follow the instructions on the original [llama.cpp](https://github.com/ggerganov/llama.cpp) repo to install the required
dependencies.

Some tips to get it working with an NVIDIA card and CUDA (tested on Windows 10 with CUDA 11.5 and an RTX 3070):

* Install the latest VS2022 (and build tools): https://visualstudio.microsoft.com/vs/community/
* Install the CUDA toolkit: https://developer.nvidia.com/cuda-downloads
* Verify your installation is correct by running `nvcc --version` and `nvidia-smi`; ensure your CUDA version is up to
  date and your GPU is detected.
* [Optional] Install CMake to troubleshoot building issues by compiling llama.cpp directly: https://cmake.org/download/

If you have all required dependencies properly configured, running the
following PowerShell command should succeed:

```powershell
$env:CMAKE_ARGS='-DLLAMA_CUBLAS=on'; poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
```

If your installation was correct, you should see a message similar to the following the next
time you start the server, stating `BLAS = 1`:

```
llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, context: 762.87 MB)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
```

Note that llama.cpp offloads matrix calculations to the GPU, but performance is
still hit heavily due to the latency of CPU-GPU communication. You might need to tweak
batch sizes and other parameters to get the best performance for your particular system.

#### Linux NVIDIA GPU support and Windows-WSL

Linux GPU support is done through CUDA.
Follow the instructions on the original [llama.cpp](https://github.com/ggerganov/llama.cpp) repo to install the required
external dependencies.

Some tips (a quick sanity-check snippet follows this list):

* Make sure you have an up-to-date C++ compiler
* Install the CUDA toolkit: https://developer.nvidia.com/cuda-downloads
* Verify your installation is correct by running `nvcc --version` and `nvidia-smi`; ensure your CUDA version is up to
  date and your GPU is detected.
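
For example, straight from the tips above:

```bash
# Verify the CUDA compiler and driver are installed and the GPU is detected
nvcc --version
nvidia-smi
```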

After that, running the following command in the repository will install llama.cpp with GPU support:

```bash
CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
```

If your installation was correct, you should see a message similar to the following the next
time you start the server, stating `BLAS = 1`:

```
llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, context: 762.87 MB)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
```

#### Vectorstores

PrivateGPT supports [Chroma](https://www.trychroma.com/) and [Qdrant](https://qdrant.tech/) as vectorstore providers, Chroma being the default.

To enable Qdrant, set the `vectorstore.database` property in the `settings.yaml` file to `qdrant` and install the `qdrant` extra:

```bash
poetry install --extras qdrant
```

By default, Qdrant tries to connect to an instance at `http://localhost:3000`.

Qdrant settings can be configured by setting values to the `qdrant` property in the `settings.yaml` file.

The available configuration options are:

| Field | Description |
|--------------|-------------|
| location | If `:memory:` - use an in-memory Qdrant instance.<br>If `str` - use it as a `url` parameter. |
| url | Either host or str of 'Optional[scheme], host, Optional[port], Optional[prefix]'.<br>Eg. `http://localhost:6333` |
| port | Port of the REST API interface. Default: `6333` |
| grpc_port | Port of the gRPC interface. Default: `6334` |
| prefer_grpc | If `true` - use the gRPC interface whenever possible in custom methods. |
| https | If `true` - use the HTTPS (SSL) protocol. |
| api_key | API key for authentication in Qdrant Cloud. |
| prefix | If set, add `prefix` to the REST URL path.<br>Example: `service/v1` will result in `http://localhost:6333/service/v1/{qdrant-endpoint}` for the REST API. |
| timeout | Timeout for REST and gRPC API requests.<br>Default: 5.0 seconds for REST and unlimited for gRPC |
| host | Host name of the Qdrant service. If url and host are not set, defaults to 'localhost'. |
| path | Persistence path for QdrantLocal. Eg. `local_data/private_gpt/qdrant` |
| force_disable_check_same_thread | Force disable check_same_thread for the QdrantLocal sqlite connection. |
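
For illustration, a minimal sketch of what such a profile could look like, assuming the `vectorstore.database` and `qdrant` properties described above (values are placeholders, not a verified configuration):

```yaml
vectorstore:
  database: qdrant

qdrant:
  # Connect to a Qdrant server over REST; see the table above for the other fields
  url: http://localhost:6333
  prefer_grpc: false
```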

#### Known issues and Troubleshooting

Execution of LLMs locally still has a lot of sharp edges, especially when running on non-Linux platforms.
You might encounter several issues:

* Performance: RAM or VRAM usage is very high; your computer might experience slowdowns or even crashes.
* GPU virtualization on Windows and OSX: simply not possible with Docker Desktop; you have to run the server directly on
  the host.
* Building errors: some of PrivateGPT's dependencies need to build native code, and they might fail on some platforms.
  Most likely you are missing some dev tools on your machine (updated C++ compiler, CUDA not on PATH, etc.).

If you encounter any of these issues, please open an issue and we'll try to help.

#### Troubleshooting: C++ Compiler

If you encounter an error while building a wheel during the `pip install` process, you may need to install a C++
compiler on your computer.

**For Windows 10/11**

To install a C++ compiler on Windows 10/11, follow these steps:

1. Install Visual Studio 2022.
2. Make sure the following components are selected:
   * Universal Windows Platform development
   * C++ CMake tools for Windows
3. Download the MinGW installer from the [MinGW website](https://sourceforge.net/projects/mingw/).
4. Run the installer and select the `gcc` component.

**For OSX**

1. Check if you have a C++ compiler installed; Xcode might have done it for you, for example by running `gcc`.
2. If not, you can install clang or gcc with Homebrew: `brew install gcc`

#### Troubleshooting: Mac Running Intel

When running a Mac with Intel hardware (not M1), you may run into _clang: error: the clang compiler does not support '-march=native'_ during pip install.

If so, set your archflags during pip install, e.g.: `ARCHFLAGS="-arch x86_64" pip3 install -r requirements.txt`

## Running the Server

After following the installation steps you should be ready to go. Here are some common run setups:

### Running 100% locally

Make sure you have followed the *Local LLM requirements* section before moving on.

This command will start PrivateGPT using the `settings.yaml` (default profile) together with the `settings-local.yaml`
configuration files. By default, it will enable both the API and the Gradio UI. Run:

```bash
PGPT_PROFILES=local make run
```

or

```bash
PGPT_PROFILES=local poetry run python -m private_gpt
```

When the server is started it will print a log *Application startup complete*.
Navigate to http://localhost:8001 to use the Gradio UI or to http://localhost:8001/docs (API section) to try the API
using Swagger UI.

### Local server using OpenAI as LLM

If you cannot run a local model (because you don't have a GPU, for example) or for testing purposes, you may
decide to run PrivateGPT using OpenAI as the LLM.

In order to do so, create a profile `settings-openai.yaml` with the following contents:

```yaml
llm:
  mode: openai

openai:
  api_key: <your_openai_api_key> # You could skip this configuration and use the OPENAI_API_KEY env var instead
```

And run PrivateGPT loading the profile you just created:

```bash
PGPT_PROFILES=openai make run
```

or

```bash
PGPT_PROFILES=openai poetry run python -m private_gpt
```

> Note this will still use the local Embeddings model, as it is ok to run it on a CPU.
> We'll support using OpenAI embeddings in a future release.

When the server is started it will print a log *Application startup complete*.
Navigate to http://localhost:8001 to use the Gradio UI or to http://localhost:8001/docs (API section) to try the API.
You'll notice the speed and quality of responses are higher, given you are using OpenAI's servers for the heavy
computations.

### Use AWS's Sagemaker

🚧 Under construction 🚧

## Gradio UI user manual

Gradio UI is a ready-to-use way of testing most of the PrivateGPT API functionalities.

![Gradio PrivateGPT](https://lh3.googleusercontent.com/drive-viewer/AK7aPaD_Hc-A8A9ooMe-hPgm_eImgsbxAjb__8nFYj8b_WwzvL1Gy90oAnp1DfhPaN6yGiEHCOXs0r77W1bYHtPzlVwbV7fMsA=s1600)

### Execution Modes

It has 3 modes of execution (you can select them in the top-left); a request sketch follows this list:

* Query Docs: uses the context from the
  ingested documents to answer the questions posted in the chat. It also takes
  into account previous chat messages as context.
  * Makes use of the `/chat/completions` API with `use_context=true` and no
    `context_filter`.
* Search in Docs: fast search that returns the 4 most related text
  chunks, together with their source document and page.
  * Makes use of the `/chunks` API with no `context_filter`, `limit=4` and
    `prev_next_chunks=0`.
* LLM Chat: simple, non-contextual chat with the LLM. The ingested documents won't
  be taken into account, only the previous messages.
  * Makes use of the `/chat/completions` API with `use_context=false`.
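
As a sketch only, a chat request in the spirit of the "Query Docs" mode might look like the following. The route prefix and exact body fields are assumptions based on the OpenAI-style API and the `use_context` flag mentioned above; check the Swagger UI at http://localhost:8001/docs for the authoritative schema.

```bash
# Hypothetical request: OpenAI-style chat completion with document context enabled
curl -X POST http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "What is PrivateGPT?"}],
        "use_context": true,
        "stream": false
      }'
```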

### Document Ingestion

Ingest documents by using the `Upload a File` button. You can check the progress of
the ingestion in the console logs of the server.

The list of ingested files is shown below the button.

If you want to delete the ingested documents, refer to the *Reset Local documents
database* section in the documentation.

### Chat

Normal chat interface, self-explanatory ;)

You can check the actual prompt being passed to the LLM by looking at the logs of
the server. We'll add better observability in future releases.

## Deployment options

🚧 We are working on Dockerized deployment guidelines 🚧

## Observability

Basic logs are enabled using LlamaIndex
basic logging (for example ingestion progress or LLM prompts and answers).

🚧 We are working on improved Observability. 🚧

## Ingesting & Managing Documents

🚧 Document Update and Delete are still WIP. 🚧

The ingestion of documents can be done in different ways:

* Using the `/ingest` API (see the sketch after this list)
* Using the Gradio UI
* Using the Bulk Local Ingestion functionality (check the next section)
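
As a sketch only, an upload through the ingestion API might look like the following. The `/v1/ingest` path and multipart field name are assumptions, not confirmed by this document; the Swagger UI at http://localhost:8001/docs describes the actual endpoint.

```bash
# Hypothetical: ingest a single PDF through the API instead of the Gradio UI
curl -X POST http://localhost:8001/v1/ingest \
  -F "file=@/path/to/document.pdf"
```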

### Bulk Local Ingestion

When you are running PrivateGPT in a fully local setup, you can ingest a complete folder for convenience (containing
pdf, text files, etc.)
and optionally watch for changes on it with the command:

```bash
make ingest /path/to/folder -- --watch
```

To log the processed and failed files to an additional file, use:

```bash
make ingest /path/to/folder -- --watch --log-file /path/to/log/file.log
```

After ingestion is complete, you should be able to chat with your documents
by navigating to http://localhost:8001 and using the option `Query documents`,
or using the completions / chat API.

### Reset Local documents database

When running in a local setup, you can remove all ingested documents by simply
deleting all contents of the `local_data` folder (except .gitignore).

To simplify this process, you can use the command:

```bash
make wipe
```

## API

As explained in the introduction, the API contains high level APIs (ingestion and chat/completions) and low level APIs
 