---
base_model: AI-Sweden-Models/gpt-sw3-6.7b-v2-instruct
language:
- sv
- da
- 'no'
- en
pipeline_tag: text-generation
tags:
- translation
---
# Model Card for gpt-sw3-6.7b-v2-translator-gguf
The `gpt-sw3-6.7b-v2-translator` model is a fine-tuned version of `gpt-sw3-6.7b-v2-instruct`, trained on a carefully curated dataset of translation pairs gathered by AI Sweden. This repository contains GGUF conversions of that model.


## Intended usage:
Translating text from English to Swedish, or from Swedish to English.


## How to use:
The snippets below are Ollama Modelfiles. Replace `SIZE` in the `FROM` line with one of the quantization suffixes listed under Versions (`q4`, `q8`, `f16`, or `f32`).

Modelfile for translating from English to Swedish:
```
FROM ./gpt-sw3-6-7b-v2-translator-SIZE.gguf
TEMPLATE "<|endoftext|><s>User: Översätt till Svenska från Engelska\n{{ .Prompt }}<s>Bot:"
PARAMETER stop <s>
```
Modelfile for translating from Swedish to English:
```
FROM ./gpt-sw3-6-7b-v2-translator-SIZE.gguf
TEMPLATE "<|endoftext|><s>User: Översätt till Engelska från Svenska\n{{ .Prompt }}<s>Bot:"
PARAMETER stop <s>
```
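
A minimal sketch of creating and querying the model with Ollama, assuming the English-to-Swedish Modelfile above has been saved as `Modelfile.en-sv` next to the `.gguf` file; the model name `gpt-sw3-translator-en-sv` is illustrative, not part of this repository:
```bash
# Register the model with Ollama using the Modelfile above
# (file and model names are assumptions, pick your own).
ollama create gpt-sw3-translator-en-sv -f Modelfile.en-sv

# One-shot translation; the TEMPLATE already wraps the prompt,
# so the input is just the text to translate.
ollama run gpt-sw3-translator-en-sv "Machine translation is useful."

# The same call through Ollama's HTTP API (default port 11434).
curl http://localhost:11434/api/generate -d '{
  "model": "gpt-sw3-translator-en-sv",
  "prompt": "Machine translation is useful.",
  "stream": false
}'
```
Note that `PARAMETER stop <s>` halts generation at the `<s>` token, matching the turn separator used in the TEMPLATE.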

## Versions:
```
gpt-sw3-6-7b-v2-translator-q4.gguf
gpt-sw3-6-7b-v2-translator-q8.gguf
gpt-sw3-6-7b-v2-translator-f16.gguf
gpt-sw3-6-7b-v2-translator-f32.gguf
```

## Training & Data:
Training was done on a single NVIDIA DGX node using DeepSpeed ZeRO 3, for three epochs on roughly 4 GB of carefully selected translation data. It is a full fine-tune of all of the model parameters (a sketch of such a setup follows the loss table below).

| Epoch | Training Loss | Evaluation Loss |
|-------|---------------|-----------------|
| 1     | 1.309         | 1.281           |
| 2     | 1.161         | 1.242           |
| 3     | 1.053         | 1.219           |
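
For reference, a minimal sketch of a DeepSpeed ZeRO 3 launch like the one described above. The config keys are standard DeepSpeed options, but the values, the 8-GPU launcher argument, and `train.py` are illustrative assumptions, not the actual training recipe:
```bash
# Illustrative only: minimal ZeRO stage-3 config. The "auto" values
# defer batch-size settings to the Hugging Face Trainer integration.
cat > ds_config.json <<'EOF'
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "bf16": { "enabled": true },
  "zero_optimization": { "stage": 3 }
}
EOF

# Full fine-tune across the 8 GPUs of a DGX node;
# train.py and its flags are hypothetical.
deepspeed --num_gpus 8 train.py --deepspeed ds_config.json
```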