A pretrained model on the Kinyarwanda language dataset using a masked language modeling objective.

# How to use:

1) The model can be used directly with the pipeline for masked language modeling as follows:

```
from transformers import pipeline

the_mask_pipe = pipeline("fill-mask", model="jean-paul/KinyaBERT-large", tokenizer="jean-paul/KinyaBERT-large")
the_mask_pipe("Ejo ndikwiga nagize [MASK] baje kunsura.")

[{'sequence': 'ejo ndikwiga nagize agahinda baje kunsura.', 'score': 0.0638100653886795, 'token': 3917, 'token_str': 'agahinda'},
{'sequence': 'ejo ndikwiga nagize ubwoba baje kunsura.', 'score': 0.04934622719883919, 'token': 2387, 'token_str': 'ubwoba'},
{'sequence': 'ejo ndikwiga nagizengo baje kunsura.', 'score': 0.02243402972817421, 'token': 455, 'token_str': '##ngo'}]
```
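
The pipeline returns the highest-scoring completions (five by default in current transformers releases). As a small optional tweak that is not part of the original card, you can ask for a different number of candidates with `top_k`:

```
# Assumption: a recent transformers release, where the fill-mask pipeline accepts `top_k`
# (older releases spelled this keyword `topk`).
the_mask_pipe("Ejo ndikwiga nagize [MASK] baje kunsura.", top_k=10)
```
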
2) Direct use from the transformers library to get features using AutoModel:
```
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("jean-paul/KinyaBERT-large")
model = AutoModelForMaskedLM.from_pretrained("jean-paul/KinyaBERT-large")
input_text = "Ejo ndikwiga nagize abashyitsi baje kunsura."
encoded_input = tokenizer(input_text, return_tensors='pt')
output = model(**encoded_input)
```
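
The snippet above loads the masked-language-modeling head, so `output` holds vocabulary logits rather than plain features. If what you are after are token or sentence features, one common approach, sketched here as an assumption rather than something stated in the original card, is to load the bare encoder with `AutoModel` and read its last hidden state:

```
from transformers import AutoTokenizer, AutoModel

# Bare BERT encoder (no MLM head); its output exposes the hidden states directly.
tokenizer = AutoTokenizer.from_pretrained("jean-paul/KinyaBERT-large")
model = AutoModel.from_pretrained("jean-paul/KinyaBERT-large")

encoded_input = tokenizer("Ejo ndikwiga nagize abashyitsi baje kunsura.", return_tensors='pt')
features = model(**encoded_input).last_hidden_state  # shape: (batch, sequence_length, hidden_size)
```
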
__Note__: We used the Hugging Face implementations to pretrain BERT from scratch, both the BERT model itself and the classes needed to do it.
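
For readers who want to reproduce that kind of setup, here is a minimal sketch of from-scratch masked-language-model pretraining with the Hugging Face classes. The corpus file name, the reuse of this model's published tokenizer, and all hyperparameters are placeholders for illustration, not the actual configuration used for KinyaBERT-large:

```
from datasets import load_dataset
from transformers import (BertConfig, BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Placeholder tokenizer and corpus; the real training used its own Kinyarwanda vocabulary and data.
tokenizer = BertTokenizerFast.from_pretrained("jean-paul/KinyaBERT-large")
dataset = load_dataset("text", data_files={"train": "kinyarwanda_corpus.txt"})["train"]
dataset = dataset.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
                      batched=True, remove_columns=["text"])

# A fresh BERT initialised from a config, i.e. pretraining from scratch rather than fine-tuning.
model = BertForMaskedLM(BertConfig(vocab_size=tokenizer.vocab_size))

# The data collator applies the random 15% masking that defines the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="kinyabert-from-scratch", num_train_epochs=1),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()
```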