Dani committed · commit 6663a43 · 1 parent: efb0a99

fixed several vocab issues

Files changed:
- README.md          +4 -13
- config.json         +1 -1
- pytorch_model.bin   +2 -2
- vocab.txt          +13 -0
README.md CHANGED
@@ -4,24 +4,15 @@ license: apache-2.0
 datasets:
 - wikipedia
 widget:
-- text: "
+- text: "Mi nombre es Juan y vivo en [MASK]."
 ---
 
 # DistilBERT base multilingual model Spanish subset (cased)
 
 This model is the Spanish extract of `distilbert-base-multilingual-cased` (https://huggingface.co/distilbert-base-multilingual-cased), a distilled version of the [BERT base multilingual model](bert-base-multilingual-cased). This model is cased: it does make a difference between english and English.
 
-It uses the extraction method proposed by Geotrend
-Specifically, we've ran the following script:
+It uses the extraction method proposed by Geotrend described in https://github.com/Geotrend-research/smaller-transformers.
 
-
-python reduce_model.py \
-  --source_model distilbert-base-multilingual-cased \
-  --vocab_file notebooks/selected_tokens/selected_es_tokens.txt \
-  --output_model distilbert-base-es-multilingual-cased \
-  --convert_to_tf False
-```
+The resulting model has the same architecture as DistilmBERT: 6 layers, 768 dimension and 12 heads, with a total of **63M parameters** (compared to 134M parameters for DistilmBERT).
 
-The
-
-The goal of this model is to reduce even further the size of the `distilbert-base-multilingual` multilingual model by selecting only most frequent tokens for Spanish, reducing the size of the embedding layer. For more details visit the paper from the Geotrend team: Load What You Need: Smaller Versions of Multilingual BERT.
+The goal of this model is to reduce even further the size of the `distilbert-base-multilingual` multilingual model by selecting only most frequent tokens for Spanish, reducing the size of the embedding layer. For more details visit the paper from the Geotrend team: Load What You Need: Smaller Versions of Multilingual BERT.
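The widget prompt added above can also be tried locally with the standard `transformers` fill-mask pipeline. The sketch below is illustrative only: the model id is taken from the `--output_model` name used in the extraction script and may differ from the actual Hub repo path of this checkpoint.

```python
from transformers import pipeline

# Model id assumed from the --output_model name above; replace it with the
# actual Hub repo path where this checkpoint is published.
fill_mask = pipeline("fill-mask", model="distilbert-base-es-multilingual-cased")

# Same prompt as the widget example added in this commit.
for pred in fill_mask("Mi nombre es Juan y vivo en [MASK]."):
    print(f"{pred['score']:.3f}  {pred['sequence']}")
```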
config.json CHANGED
@@ -18,5 +18,5 @@
   "seq_classif_dropout": 0.2,
   "sinusoidal_pos_embds": false,
   "tie_weights_": true,
-  "vocab_size":
+  "vocab_size": 26360
 }
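The new `vocab_size` of 26360 is also what makes the ~63M parameter figure in the README plausible. A back-of-the-envelope check, assuming 768-dimensional embeddings and the 119,547-token vocabulary of `distilbert-base-multilingual-cased`:

```python
# Rough check of the parameter counts quoted in the README.
# Assumptions: hidden size 768 and an original vocabulary of 119,547 tokens
# (distilbert-base-multilingual-cased); 26,360 comes from this config change.
hidden_size = 768
full_vocab = 119_547
reduced_vocab = 26_360

saved = (full_vocab - reduced_vocab) * hidden_size  # embedding rows dropped
print(f"embedding parameters removed: {saved / 1e6:.1f}M")          # ~71.6M
print(f"estimated total: {(134e6 - saved) / 1e6:.1f}M parameters")  # ~62.4M, i.e. the ~63M quoted
```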
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:02e8562d1e4f7f2fe58e9970fa28b3544b066591bc475777c823ab10adcd9af2
+size 255182217
vocab.txt CHANGED
@@ -1,4 +1,17 @@
+[PAD]
+[unused1]
+[unused2]
+[unused3]
+[unused4]
+[unused5]
+[unused6]
+[unused7]
+[unused8]
+[unused9]
 [UNK]
+[CLS]
+[SEP]
+[MASK]
 !
 "
 #
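The rows added at the top of vocab.txt are the special tokens a BERT-style WordPiece tokenizer expects ([PAD], the reserved [unusedN] slots, [CLS], [SEP], [MASK]); without them the tokenizer cannot resolve its special-token ids, which is presumably part of the "vocab issues" this commit fixes. A minimal sanity check, assuming the fixed vocab.txt is available locally:

```python
from transformers import DistilBertTokenizer

# Build a tokenizer directly from the fixed vocab file (local path is an assumption);
# do_lower_case=False because this is a cased model.
tokenizer = DistilBertTokenizer(vocab_file="vocab.txt", do_lower_case=False)

# With the new header rows, the special-token ids map to the positions shown in the diff.
print(tokenizer.pad_token_id)   # 0  -> [PAD]
print(tokenizer.unk_token_id)   # 10 -> [UNK]
print(tokenizer.cls_token_id)   # 11 -> [CLS]
print(tokenizer.sep_token_id)   # 12 -> [SEP]
print(tokenizer.mask_token_id)  # 13 -> [MASK]

# Encodings are now wrapped in [CLS] ... [SEP] as DistilBERT expects.
print(tokenizer("Mi nombre es Juan.")["input_ids"])
```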