This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

Example inference code:

```python
import os

import torch
from unsloth import FastLanguageModel

model_name = "Kohsaku/llm-jp-3-13b-finetune-5"

max_seq_length = 2048
dtype = None  # auto-detect (float16 or bfloat16)
load_in_4bit = True

# Hugging Face access token, read from the environment
HF_TOKEN = os.environ.get("HF_TOKEN")

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    token=HF_TOKEN,
)
FastLanguageModel.for_inference(model)

text = "自然言語処理とは何か"  # "What is natural language processing?"
tokenized_input = tokenizer.encode(
    text, add_special_tokens=False, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(
        tokenized_input,
        max_new_tokens=512,
        use_cache=True,
        do_sample=False,
        repetition_penalty=1.2,
    )[0]

print(tokenizer.decode(output))
```