---
base_model: AI-Sweden-Models/gpt-sw3-6.7b-v2-instruct
language:
- sv
- da
- 'no'
- en
pipeline_tag: text-generation
inference:
  parameters:
    temperature: 0.7
tags:
- translation
---
# Model Card for gpt-sw3-6.7b-v2-translator-gguf
`gpt-sw3-6.7b-v2-translator` is a fine-tuned version of `gpt-sw3-6.7b-v2-instruct`, trained on a carefully selected dataset of translation pairs gathered by AI Sweden.


## Intended usage:
Translate text from English to Swedish, or from Swedish to English.

## How to use:
```python
import torch
from transformers import pipeline, StoppingCriteriaList, StoppingCriteria

device = "cuda" if torch.cuda.is_available() else "cpu"


# (Optional) define a stopping criterion
# We ideally want the model to stop generating once the Bot's response is complete
class StopOnTokenCriteria(StoppingCriteria):
    def __init__(self, stop_token_id):
        self.stop_token_id = stop_token_id

    def __call__(self, input_ids, scores, **kwargs):
        return input_ids[0, -1] == self.stop_token_id


pipe = pipeline(
    task="text-generation",
    model="AI-Sweden-Models/gpt-sw3-6.7b-v2-translator",
    device=device
)

stop_on_token_criteria = StopOnTokenCriteria(stop_token_id=pipe.tokenizer.bos_token_id)
text = "I like to eat ice cream in the summer."

# This will translate English to Swedish
# To translate from Swedish to English, the prompt would be:
# prompt = f"<|endoftext|><s>User: Översätt till Engelska från Svenska\n{text}<s>Bot:"

prompt = f"<|endoftext|><s>User: Översätt till Svenska från Engelska\n{text}<s>Bot:"

# Leave room for the prompt within the model's 2048-token context
input_tokens = pipe.tokenizer(prompt, return_tensors="pt").input_ids.to(device)
max_model_length = 2048
dynamic_max_length = max_model_length - input_tokens.shape[1]

response = pipe(
    prompt,
    max_length=dynamic_max_length,
    truncation=True,
    stopping_criteria=StoppingCriteriaList([stop_on_token_criteria])
)

print(response[0]["generated_text"].split("<s>Bot: ")[-1])
```
```python
>>> "Jag tycker om att äta glass på sommaren."
```

## Training & Data:
Training was done on a single NVIDIA DGX using DeepSpeed ZeRO 3, for three epochs on roughly 4 GB of carefully selected translation data. It is a full fine-tune of all model parameters.

| Epoch | Training Loss | Evaluation Loss |
|-------|---------------|-----------------|
| 1     | 1.309         | 1.281           |
| 2     | 1.161         | 1.242           |
| 3     | 1.053         | 1.219           |