rajabmondal committed
Commit 8781adf · verified · 1 Parent(s): 58a9fbb

Update README.md

Files changed (1):
  1. README.md +0 -122
README.md CHANGED
@@ -75,127 +75,5 @@ Refer to the Provided Files table below to see what files use which methods, and
 
 
 
- <!-- README_GGUF.md-provided-files end -->
-
- <!-- README_GGUF.md-how-to-download start -->
- ## How to download GGUF files
-
- **Note for manual downloaders:** You almost never want to clone the entire repo! Multiple different quantisation formats are provided, and most users only want to pick and download a single file.
-
- The following clients/libraries will automatically download models for you, providing a list of available models to choose from:
- - LM Studio
- - LoLLMS Web UI
- - Faraday.dev
-
- ### In `text-generation-webui`
-
- Under Download Model, you can enter the model repo: TheBloke/CodeLlama-7B-GGUF and below it, a specific filename to download, such as: codellama-7b.q4_K_M.gguf.
-
- Then click Download.
-
- ### On the command line, including multiple files at once
-
- I recommend using the `huggingface-hub` Python library:
-
- ```shell
- pip3 install 'huggingface-hub>=0.17.1'
- ```
-
- Then you can download any individual model file to the current directory, at high speed, with a command like this:
-
- ```shell
- huggingface-cli download TheBloke/CodeLlama-7B-GGUF codellama-7b.q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
- ```
-
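- If you prefer to stay in Python rather than shell out to the CLI, the same file can be fetched with `hf_hub_download` from the `huggingface-hub` library installed above. A minimal sketch, assuming the same repo and filename as the CLI example:
-
- ```python
- from huggingface_hub import hf_hub_download
-
- # Download a single GGUF file into the current directory and return its local path.
- model_path = hf_hub_download(
-     repo_id="TheBloke/CodeLlama-7B-GGUF",
-     filename="codellama-7b.q4_K_M.gguf",
-     local_dir=".",
-     local_dir_use_symlinks=False,
- )
- print(model_path)
- ```
-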
- <details>
- <summary>More advanced huggingface-cli download usage</summary>
-
- You can also download multiple files at once with a pattern:
-
- ```shell
- huggingface-cli download TheBloke/CodeLlama-7B-GGUF --local-dir . --local-dir-use-symlinks False --include='*Q4_K*gguf'
- ```
-
- For more documentation on downloading with `huggingface-cli`, please see: [HF -> Hub Python Library -> Download files -> Download from the CLI](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli).
-
- To accelerate downloads on fast connections (1Gbit/s or higher), install `hf_transfer`:
-
- ```shell
- pip3 install hf_transfer
- ```
-
- Then set the environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `1`:
-
- ```shell
- HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download TheBloke/CodeLlama-7B-GGUF codellama-7b.q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
- ```
-
- Windows CLI users: run `set HF_HUB_ENABLE_HF_TRANSFER=1` before running the download command.
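-
- The same accelerated download can be driven from Python. A minimal sketch, assuming `hf_transfer` is installed as above; the variable must be set before `huggingface_hub` is imported, since it is read at import time:
-
- ```python
- import os
-
- # Enable hf_transfer before importing huggingface_hub.
- os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
-
- from huggingface_hub import hf_hub_download
-
- hf_hub_download(
-     repo_id="TheBloke/CodeLlama-7B-GGUF",
-     filename="codellama-7b.q4_K_M.gguf",
-     local_dir=".",
-     local_dir_use_symlinks=False,
- )
- ```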
- </details>
- <!-- README_GGUF.md-how-to-download end -->
-
- <!-- README_GGUF.md-how-to-run start -->
- ## Example `llama.cpp` command
-
- Make sure you are using `llama.cpp` from commit [d0cee0d36d5be95a0d9088b674dbb27354107221](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.
-
- ```shell
- ./main -ngl 32 -m codellama-7b.q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "{prompt}"
- ```
-
- Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
-
- Change `-c 4096` to the desired sequence length. For extended-sequence models - e.g. 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically.
-
- If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`.
-
- For other parameters and how to use them, please refer to [the llama.cpp documentation](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md).
-
- ## How to run in `text-generation-webui`
-
- Further instructions can be found here: [text-generation-webui/docs/llama.cpp.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp.md).
-
158
- ## How to run from Python code
159
-
160
- You can use GGUF models from Python using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) or [ctransformers](https://github.com/marella/ctransformers) libraries.
161
-
162
- ### How to load this model from Python using ctransformers
163
-
164
- #### First install the package
165
-
166
- ```bash
167
- # Base ctransformers with no GPU acceleration
168
- pip install ctransformers>=0.2.24
169
- # Or with CUDA GPU acceleration
170
- pip install ctransformers[cuda]>=0.2.24
171
- # Or with ROCm GPU acceleration
172
- CT_HIPBLAS=1 pip install ctransformers>=0.2.24 --no-binary ctransformers
173
- # Or with Metal GPU acceleration for macOS systems
174
- CT_METAL=1 pip install ctransformers>=0.2.24 --no-binary ctransformers
175
- ```
176
-
177
- #### Simple example code to load one of these GGUF models
178
-
179
- ```python
180
- from ctransformers import AutoModelForCausalLM
181
-
182
- # Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
183
- llm = AutoModelForCausalLM.from_pretrained("TheBloke/CodeLlama-7B-GGUF", model_file="codellama-7b.q4_K_M.gguf", model_type="llama", gpu_layers=50)
184
-
185
- print(llm("AI is going to"))
186
- ```
187
-
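- The section above covers ctransformers only. For llama-cpp-python, the other library linked above, a minimal sketch looks like the following; the parameter names (`n_ctx`, `n_gpu_layers`) come from that library's `Llama` class and mirror the `-c` and `-ngl` flags of the llama.cpp command earlier:
-
- ```python
- from llama_cpp import Llama
-
- # Point model_path at a GGUF file downloaded with one of the methods above.
- llm = Llama(
-     model_path="codellama-7b.q4_K_M.gguf",
-     n_ctx=4096,        # sequence length, like -c 4096
-     n_gpu_layers=32,   # layers to offload to GPU, like -ngl 32; use 0 for CPU only
- )
-
- output = llm("AI is going to", max_tokens=128, temperature=0.7)
- print(output["choices"][0]["text"])
- ```
-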
- ## How to use with LangChain
-
- Here are guides on using llama-cpp-python or ctransformers with LangChain, with a short sketch after the links:
-
- * [LangChain + llama-cpp-python](https://python.langchain.com/docs/integrations/llms/llamacpp)
- * [LangChain + ctransformers](https://python.langchain.com/docs/integrations/providers/ctransformers)
-
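- As a rough illustration, this is how the ctransformers example above could be wrapped in LangChain. It is a sketch only: the import path is the one used by older LangChain releases (newer releases move these classes to `langchain_community`), and the `config` keys are passed straight through to ctransformers:
-
- ```python
- from langchain.llms import CTransformers
- from langchain.prompts import PromptTemplate
-
- # Wrap the GGUF model in a LangChain LLM object.
- llm = CTransformers(
-     model="TheBloke/CodeLlama-7B-GGUF",
-     model_file="codellama-7b.q4_K_M.gguf",
-     model_type="llama",
-     config={"gpu_layers": 50},  # use 0 if you have no GPU acceleration
- )
-
- prompt = PromptTemplate.from_template("Write a Python function that {task}.")
- print(llm(prompt.format(task="reverses a string")))
- ```
-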
- <!-- README_GGUF.md-how-to-run end -->
-
- <!-- footer start -->
- <!-- 200823 -->
-