---
license: llama3
---
# Meta-Llama-3-8B-Instruct-ct2-int8
This is a [ctranslate2](https://github.com/OpenNMT/CTranslate2) v4.5.0 int8 conversion of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/tree/main) created with:
```
ct2-transformers-converter --model meta-llama/Meta-Llama-3-8B-Instruct --output_dir Meta-Llama-3-8B-Instruct-ct2-int8 --quantization int8
```
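For context, int8 quantization stores each weight as an 8-bit integer plus a floating-point scale, roughly quartering the size of a float32 model. A minimal sketch of symmetric int8 quantization in plain Python (illustrative only; CTranslate2's actual scheme may differ in details such as per-channel scaling):

```python
def quantize_int8(row):
    """Symmetric quantization: map the largest |weight| in the row to 127."""
    scale = max(abs(x) for x in row) / 127.0
    q = [max(-127, min(127, round(x / scale))) for x in row]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

row = [0.5, -1.0, 0.25]
q, scale = quantize_int8(row)
row_hat = dequantize_int8(q, scale)
# Reconstruction error is bounded by roughly half a quantization step.
```

At inference time the int8 weights are used directly in integer matrix multiplies, which is where the speed and memory savings come from.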
## Downloading
ct2 doesn't have Hugging Face Hub integration, so you'll need to download the model files manually:
```
huggingface-cli download mike-ravkine/Meta-Llama-3-8B-Instruct-ct2-int8 --local-dir Meta-Llama-3-8B-Instruct-ct2-int8/
```
## Using
Install dependencies:
```
pip install transformers[torch] ctranslate2
```
Sample inference code:
```python
import sys

import ctranslate2
from transformers import AutoTokenizer

model_dir = sys.argv[1]  # local download dir
tokenizer_dir = "meta-llama/Meta-Llama-3-8B-Instruct"

print("Loading the model...")
generator = ctranslate2.Generator(model_dir, device="cuda")
tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir)

dialog = [{"role": "user", "content": "What is the meaning of life, the universe and everything?"}]
max_generation_length = 512

prompt_string = tokenizer.apply_chat_template(dialog, add_generation_prompt=True, tokenize=False)
# tokenize=True would return only ids; generate_tokens expects the token strings themselves
prompt_tokens = tokenizer.tokenize(prompt_string)

step_results = generator.generate_tokens(
    prompt_tokens,
    max_length=max_generation_length,
    sampling_temperature=0.6,
    sampling_topk=20,
    sampling_topp=1,
)

for step_result in step_results:
    word = tokenizer.decode([step_result.token_id])
    print(word, end="", flush=True)
```
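`generate_tokens` streams until `max_length` is reached, but Llama 3 Instruct models emit an end-of-turn token (`<|eot_id|>`) that you'll usually want to stop on. A minimal sketch of that stop logic, using stand-in objects so it runs without loading the model (`stream_until` and `FakeStep` are illustrative names, not part of the ctranslate2 API):

```python
def stream_until(step_results, stop_ids, decode, on_token=None):
    """Collect decoded pieces from a step-result iterator, stopping early
    when a token id in stop_ids is generated."""
    pieces = []
    for step in step_results:
        if step.token_id in stop_ids:
            break
        piece = decode([step.token_id])
        pieces.append(piece)
        if on_token:
            on_token(piece)  # e.g. print(piece, end="", flush=True)
    return "".join(pieces)

# Stand-in objects so the sketch runs without loading the 8B model:
class FakeStep:
    def __init__(self, token_id):
        self.token_id = token_id

fake_vocab = {1: "Forty", 2: "-", 3: "two", 99: "<|eot_id|>"}
steps = [FakeStep(i) for i in (1, 2, 3, 99)]
text = stream_until(steps, stop_ids={99}, decode=lambda ids: fake_vocab[ids[0]])
```

With the real tokenizer, the stop id can be obtained with `tokenizer.convert_tokens_to_ids("<|eot_id|>")` and the decode callable is just `tokenizer.decode`.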