# llama.cpp/example/run

This example demonstrates minimal usage of llama.cpp for running models.

```bash
llama-run granite3-moe
```

```bash
Description:
  Runs a llm

Usage:
  llama-run [options] model [prompt]

Options:
  -c, --context-size <value>
      Context size (default: 2048)
  -n, -ngl, --ngl <value>
      Number of GPU layers (default: 0)
  --temp <value>
      Temperature (default: 0.8)
  -v, --verbose, --log-verbose
      Set verbosity level to infinity (i.e. log all messages, useful for debugging)
  -h, --help
      Show help message

Commands:
  model
      Model is a string with an optional prefix of
      huggingface:// (hf://), ollama://, https:// or file://.
      If no protocol is specified and a file exists in the specified
      path, file:// is assumed, otherwise if a file does not exist in
      the specified path, ollama:// is assumed. Models that are being
      pulled are downloaded with .partial extension while being
      downloaded and then renamed as the file without the .partial
      extension when complete.

Examples:
  llama-run llama3
  llama-run ollama://granite-code
  llama-run ollama://smollm:135m
  llama-run hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
  llama-run huggingface://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
  llama-run https://example.com/some-file1.gguf
  llama-run some-file2.gguf
  llama-run file://some-file3.gguf
  llama-run --ngl 999 some-file4.gguf
  llama-run --ngl 999 some-file5.gguf Hello World
```
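As a quick sketch of how the documented flags and the model-resolution rules above combine, the invocations below are illustrative only: the local path and the prompt are placeholders, and the `hf://` model is taken from the examples in the help text.

```bash
# A bare name that does not match an existing file is resolved as ollama://
llama-run smollm:135m

# An existing local path is resolved as file://
llama-run ./SmolLM-135M.Q2_K.gguf

# Combine the documented options with a model and a trailing prompt
llama-run --context-size 4096 --temp 0.5 --ngl 999 \
    hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf \
    "Write a short greeting"
```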