DavidLanz committed (verified) · Commit d3c4b72 · Parent: 0b08d1c

Update README.md

Files changed (1): README.md (+43 -3)
@@ -1,3 +1,43 @@
- ---
- license: llama2
- ---
+ ---
+ language:
+ - zh
+ license: apache-2.0
+ datasets:
+ - DavidLanz/TaiwanChat
+ pipeline_tag: text-generation
+ widget:
+ - example_title: Example 1
+   messages:
+   - role: user
+     content: 你好
+ ---
+ ## Model Card for Model ID
+
+ This model is the instruction-finetuned version of [benchang1110/Taiwan-tinyllama-v1.0-base](https://huggingface.co/benchang1110/Taiwan-tinyllama-v1.0-base).
+
+ ## Usage
+ ```python
+ import torch
+ import transformers
+
+ MODEL_ID = "DavidLanz/Taiwan-tinyllama-v1.0-chat"
+
+ def generate_response(device):
+     # flash_attention_2 needs the optional flash-attn package and a CUDA GPU;
+     # remove the argument to fall back to the default attention implementation.
+     model = transformers.AutoModelForCausalLM.from_pretrained(
+         MODEL_ID,
+         torch_dtype=torch.bfloat16,
+         device_map=device,
+         attn_implementation="flash_attention_2",
+     )
+     tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_ID)
+     streamer = transformers.TextStreamer(tokenizer, skip_prompt=True)
+     while True:
+         prompt = input("USER: ")
+         if prompt == "exit":
+             break
+         print("Assistant: ")
+         messages = [{"role": "user", "content": prompt}]
+         # Render the chat template and append the generation prompt so the
+         # model continues with the assistant turn.
+         chat = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+         inputs = tokenizer(chat, return_tensors="pt").to(device)
+         model.generate(
+             inputs["input_ids"],
+             attention_mask=inputs["attention_mask"],
+             streamer=streamer,
+             use_cache=True,
+             max_new_tokens=512,
+             do_sample=True,
+             temperature=0.1,
+             repetition_penalty=1.2,
+         )
+
+ if __name__ == "__main__":
+     device = "cuda" if torch.cuda.is_available() else "cpu"
+     generate_response(device)
+ ```
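+
+ If flash-attn is not installed (it is an optional dependency that requires a CUDA GPU), the model still loads with the default attention backend. Below is a minimal single-turn sketch under that assumption; the sampling parameters are illustrative, not tuned:
+
+ ```python
+ import torch
+ import transformers
+
+ model_id = "DavidLanz/Taiwan-tinyllama-v1.0-chat"
+ tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
+ # No attn_implementation override: the default attention backend is used,
+ # so this also runs without flash-attn (e.g. on CPU, just slowly).
+ model = transformers.AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
+
+ messages = [{"role": "user", "content": "你好"}]
+ input_ids = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ )
+ output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.1)
+ # Decode only the newly generated tokens, skipping the echoed prompt.
+ print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
+ ```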