A pretrained model on the Kinyarwanda language dataset using a masked language modeling objective.

# How to use:

1) The model can be used directly with the pipeline for masked language modeling as follows:

```
from transformers import pipeline

the_mask_pipe = pipeline("fill-mask", model="jean-paul/KinyaBERT-large", tokenizer="jean-paul/KinyaBERT-large")
the_mask_pipe("Ejo ndikwiga nagize [MASK] baje kunsura.")

[{'sequence': 'ejo ndikwiga nagize agahinda baje kunsura.', 'score': 0.0638100653886795, 'token': 3917, 'token_str': 'agahinda'},
{'sequence': 'ejo ndikwiga nagize ubwoba baje kunsura.', 'score': 0.04934622719883919, 'token': 2387, 'token_str': 'ubwoba'},
{'sequence': 'ejo ndikwiga nagizengo baje kunsura.', 'score': 0.02243402972817421, 'token': 455, 'token_str': '##ngo'}]
```
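
The pipeline returns the highest-scoring completions (five by default in current transformers releases). As a small optional tweak that is not part of the original card, you can ask for a different number of candidates with `top_k`:

```
# Assumption: a recent transformers release, where the fill-mask pipeline accepts `top_k`
# (older releases spelled this keyword `topk`).
the_mask_pipe("Ejo ndikwiga nagize [MASK] baje kunsura.", top_k=10)
```
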
2) Direct use from the transformers library to get features using AutoModel:
```
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("jean-paul/KinyaBERT-large")
model = AutoModelForMaskedLM.from_pretrained("jean-paul/KinyaBERT-large")
input_text = "Ejo ndikwiga nagize abashyitsi baje kunsura."
encoded_input = tokenizer(input_text, return_tensors='pt')
output = model(**encoded_input)
```
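
The snippet above loads the masked-language-modeling head, so `output` holds vocabulary logits rather than plain features. If what you are after are token or sentence features, one common approach, sketched here as an assumption rather than something stated in the original card, is to load the bare encoder with `AutoModel` and read its last hidden state:

```
from transformers import AutoTokenizer, AutoModel

# Bare BERT encoder (no MLM head); its output exposes the hidden states directly.
tokenizer = AutoTokenizer.from_pretrained("jean-paul/KinyaBERT-large")
model = AutoModel.from_pretrained("jean-paul/KinyaBERT-large")

encoded_input = tokenizer("Ejo ndikwiga nagize abashyitsi baje kunsura.", return_tensors='pt')
features = model(**encoded_input).last_hidden_state  # shape: (batch, sequence_length, hidden_size)
```
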
__Note__: We used the Hugging Face implementations to pretrain BERT from scratch, both the BERT model itself and the classes needed to do it.
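
For readers who want to reproduce that kind of setup, here is a minimal sketch of from-scratch masked-language-model pretraining with the Hugging Face classes. The corpus file name, the reuse of this model's published tokenizer, and all hyperparameters are placeholders for illustration, not the actual configuration used for KinyaBERT-large:

```
from datasets import load_dataset
from transformers import (BertConfig, BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Placeholder tokenizer and corpus; the real training used its own Kinyarwanda vocabulary and data.
tokenizer = BertTokenizerFast.from_pretrained("jean-paul/KinyaBERT-large")
dataset = load_dataset("text", data_files={"train": "kinyarwanda_corpus.txt"})["train"]
dataset = dataset.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
                      batched=True, remove_columns=["text"])

# A fresh BERT initialised from a config, i.e. pretraining from scratch rather than fine-tuning.
model = BertForMaskedLM(BertConfig(vocab_size=tokenizer.vocab_size))

# The data collator applies the random 15% masking that defines the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="kinyabert-from-scratch", num_train_epochs=1),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()
```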