---
license: apache-2.0
datasets:
- uonlp/CulturaX
language:
- de
tags:
- german
- electra
- teams
- culturax
- gerturax-1
---

# 🇩🇪 GERTuraX-1

This repository hosts the GERTuraX-1 model:

* GERTuraX-1 is a pretrained German encoder-only model, based on ELECTRA and pretrained with the [TEAMS](https://aclanthology.org/2021.findings-acl.219/) approach.
* It was trained on 147GB of plain text from the [CulturaX](https://huggingface.co/datasets/uonlp/CulturaX) corpus.
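As an encoder-only model, GERTuraX-1 should be loadable with the Hugging Face `transformers` `Auto*` classes to produce contextualized token embeddings. A minimal sketch (the example sentence and the assumption that the default `AutoModel` head is the bare encoder are ours):

```python
from transformers import AutoModel, AutoTokenizer

# Load tokenizer and bare encoder from the Hub repository.
tokenizer = AutoTokenizer.from_pretrained("gerturax/gerturax-1")
model = AutoModel.from_pretrained("gerturax/gerturax-1")

# Encode a German example sentence and run a forward pass.
inputs = tokenizer("Die Hauptstadt von Bayern ist München.", return_tensors="pt")
outputs = model(**inputs)

# One hidden vector per (sub)token: shape (batch, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)
```

For downstream tasks such as NER or text classification, the corresponding `AutoModelFor*` class can be used instead of the bare encoder.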

# Pretraining

The [TensorFlow Model Garden LMs](https://github.com/stefan-it/model-garden-lms) repo was used to train an ELECTRA
model using the very efficient [TEAMS](https://aclanthology.org/2021.findings-acl.219/) approach.

As pretraining corpus, 147GB of plain text was extracted from the [CulturaX](https://huggingface.co/datasets/uonlp/CulturaX) corpus.

GERTuraX-1 uses a 64k cased vocabulary and was trained for 1M steps with a batch size of 256 and a sequence length of 512 on a v3-32 TPU Pod.

The pretraining took 2.6 days and the TensorBoard can be found [here](../../tensorboard).
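For a rough sense of scale, the hyper-parameters above imply how many tokens the model consumed during pretraining (ignoring padding and any input repetition):

```python
# Pretraining hyper-parameters from the section above.
steps = 1_000_000
batch_size = 256
seq_len = 512

# Tokens processed = steps × batch size × sequence length.
total_tokens = steps * batch_size * seq_len
print(f"{total_tokens:,}")  # → 131,072,000,000, i.e. roughly 131B tokens
```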

# Evaluation

GERTuraX-1 was evaluated on GermEval 2014 (NER), GermEval 2018 (offensive language detection), CoNLL-2003 (NER) and on the ScandEval benchmark.

For GermEval 2014, GermEval 2018 and CoNLL-2003 we use the same hyper-parameters as the [GeBERTa](https://arxiv.org/abs/2310.07321) paper (cf. Table 5). Fine-tuning is conducted with the awesome Flair library; we perform five runs with different seeds and report the averaged score.

The fine-tuning code repository can be found [here](https://github.com/stefan-it/gerturax-fine-tuner).
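The numbers in the tables below are the mean and sample standard deviation over the five seeded runs. A minimal sketch of that aggregation (the example scores are made up):

```python
from statistics import mean, stdev

def aggregate(scores):
    """Return (mean, sample standard deviation) over per-seed scores."""
    return mean(scores), stdev(scores)

# Hypothetical dev F1-scores from five runs with different seeds.
runs = [88.1, 88.4, 88.2, 88.5, 88.4]
avg, sd = aggregate(runs)
print(f"{avg:.2f} ± {sd:.2f}")  # → 88.32 ± 0.16
```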

## GermEval 2014

### GermEval 2014 - Original version

| Model Name                                                                          | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base)                             | 87.53 ± 0.22              | 86.81 ± 0.16       |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB)                    | 88.32 ± 0.21              | 87.18 ± 0.12       |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB)                    | 88.58 ± 0.32              | 87.58 ± 0.15       |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB)                    | 88.90 ± 0.06              | 87.84 ± 0.18       |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base)                   | 88.79 ± 0.16              | 88.03 ± 0.16       |

### GermEval 2014 - [Without Wikipedia](https://huggingface.co/datasets/stefan-it/germeval14_no_wikipedia)

| Model Name                                                                          | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base)                             | 90.48 ± 0.34              | 89.05 ± 0.21       |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB)                    | 91.27 ± 0.11              | 89.73 ± 0.27       |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB)                    | 91.70 ± 0.28              | 89.98 ± 0.22       |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB)                    | 91.75 ± 0.17              | 90.24 ± 0.27       |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base)                   | 91.74 ± 0.23              | 90.28 ± 0.21       |

## GermEval 2018

### GermEval 2018 - Fine Grained

| Model Name                                                                          | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base)                             | 63.66 ± 4.08              | 51.86 ± 1.31       |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB)                    | 62.87 ± 1.95              | 50.61 ± 0.36       |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB)                    | 64.37 ± 1.31              | 51.02 ± 0.90       |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB)                    | 66.39 ± 0.85              | 49.94 ± 2.06       |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base)                   | 65.81 ± 3.29              | 52.45 ± 0.57       |

### GermEval 2018 - Coarse Grained

| Model Name                                                                          | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base)                             | 83.15 ± 1.83              | 76.39 ± 0.64       |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB)                    | 83.72 ± 0.68              | 77.11 ± 0.59       |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB)                    | 84.51 ± 0.88              | 78.07 ± 0.91       |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB)                    | 84.33 ± 1.48              | 78.44 ± 0.74       |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base)                   | 83.54 ± 1.27              | 78.36 ± 0.79       |

## CoNLL-2003 - German, Revised

| Model Name                                                                          | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base)                             | 92.15 ± 0.10              | 88.73 ± 0.21       |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB)                    | 92.32 ± 0.14              | 90.09 ± 0.12       |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB)                    | 92.75 ± 0.20              | 90.15 ± 0.14       |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB)                    | 92.77 ± 0.28              | 90.83 ± 0.16       |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base)                   | 92.87 ± 0.21              | 90.94 ± 0.24       |

## ScandEval

We use v12.10.5 of [ScandEval](https://github.com/ScandEval/ScandEval) to evaluate on the following tasks:

* SB10k
* ScaLA-De
* GermanQuAD

The package can be installed via:

```bash
$ pip3 install "scandeval[all]==12.10.5"
```

### Results

#### SB10k

Evaluations on the SB10k dataset can be started as follows:

```bash
$ scandeval --model "deepset/gbert-base" --task sentiment-classification --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-1" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-2" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-3" --task sentiment-classification --language de
```

| Model Name                                                                          | Matthews CC               | Macro F1-Score     |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base)                             | 59.58 ± 1.80              | 72.98 ± 1.20       |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB)                    | 61.56 ± 2.58              | 74.18 ± 1.77       |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB)                    | 65.24 ± 1.77              | 76.55 ± 1.22       |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB)                    | 64.33 ± 2.17              | 75.99 ± 1.40       |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base)                   | 59.52 ± 2.14              | 72.76 ± 1.50       |

#### ScaLA-De

Evaluations on the ScaLA-De dataset can be started as follows:

```bash
$ scandeval --model "deepset/gbert-base" --task linguistic-acceptability --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-1" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-2" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-3" --task linguistic-acceptability --language de
```

| Model Name                                                                          | Matthews CC               | Macro F1-Score     |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base)                             | 52.23 ± 4.34              | 73.90 ± 2.68       |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB)                    | 74.55 ± 1.28              | 86.88 ± 0.75       |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB)                    | 75.83 ± 2.85              | 87.59 ± 1.57       |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB)                    | 78.24 ± 1.25              | 88.83 ± 0.63       |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base)                   | 59.70 ± 11.64             | 78.44 ± 6.12       |

#### GermanQuAD

Evaluations on the GermanQuAD dataset can be started as follows:

```bash
$ scandeval --model "deepset/gbert-base" --task question-answering --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-1" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-2" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-3" --task question-answering --language de
```

| Model Name                                                                          | EM                        | F1-Score           |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base)                             | 12.62 ± 2.20              | 29.62 ± 3.86       |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB)                    | 27.24 ± 1.05              | 52.01 ± 1.10       |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB)                    | 29.54 ± 1.05              | 55.12 ± 0.92       |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB)                    | 28.49 ± 1.21              | 54.83 ± 1.26       |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base)                   | 28.81 ± 1.77              | 53.27 ± 1.92       |

# ❤️ Acknowledgements

GERTuraX is the outcome of the last 12 months of working with TPUs from the awesome [TRC program](https://sites.research.google/trc/about/)
and the [TensorFlow Model Garden](https://github.com/tensorflow/models) library.

Many thanks for providing TPUs!

Made from Bavarian Oberland with ❤️ and 🥨.