Upload README.md with huggingface_hub
README.md
---
language: ky
datasets:
- wikiann
examples:
widget:
- text: "Бириккен Улуттар Уюму"
  example_title: "Sentence_1"
- text: "Жусуп Мамай"
  example_title: "Sentence_2"
---

<h1>Kyrgyz Named Entity Recognition</h1>

`bert-base-multilingual-cased` fine-tuned on the WikiANN dataset for named entity recognition (NER) in Kyrgyz.

WARNING: this model is not usable yet (see the metrics below). I'll update the model after cleaning up the WikiANN dataset and re-training.

## Label IDs and their corresponding label names

| Label ID | Label Name |
| -------- | ---------- |
| 0 | O |
| 1 | B-PER |
| 2 | I-PER |
| 3 | B-ORG |
| 4 | I-ORG |
| 5 | B-LOC |
| 6 | I-LOC |
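
If the model returns raw label IDs, the table above can be turned into a small lookup. The sketch below is only an illustration: the helper name `label_name` is hypothetical, and the handling of generic `LABEL_<n>` strings assumes the checkpoint config carries no custom label names (in that case `transformers` falls back to names of that form).

```python
# Mapping copied from the label table in this README.
ID2LABEL = {
    0: "O",
    1: "B-PER",
    2: "I-PER",
    3: "B-ORG",
    4: "I-ORG",
    5: "B-LOC",
    6: "I-LOC",
}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def label_name(raw):
    """Return the label name for an integer ID or a generic 'LABEL_<n>' string.

    Hypothetical helper. The 'LABEL_<n>' branch is an assumption about how the
    checkpoint was saved, not something this README confirms.
    """
    if isinstance(raw, str):
        if raw in LABEL2ID:  # already a readable name such as "B-PER"
            return raw
        raw = int(raw.rsplit("_", 1)[-1])  # "LABEL_5" -> 5
    return ID2LABEL[raw]


print([label_name(x) for x in (1, 2, 0, "LABEL_5")])
# ['B-PER', 'I-PER', 'O', 'B-LOC']
```

The `B-` prefix marks the first token of an entity and `I-` a continuation, so `B-PER I-PER` reads as a single two-token person name.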

<h1>Results</h1>

| Split | Overall F1 | LOC F1 | ORG F1 | PER F1 |
| ----- | ---------- | ------ | ------ | ------ |
| Train set | 0.595683 | 0.570312 | 0.687179 | 0.549180 |
| Validation set | 0.461333 | 0.551181 | 0.401913 | 0.425087 |
| Test set | 0.442622 | 0.456852 | 0.469565 | 0.413114 |
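
This README does not say how the F1 scores were computed. As a rough guide to reading them, here is a minimal sketch of entity-level F1 under an exact-match assumption: a predicted entity counts as correct only if its start, end and type all match a gold entity. The function name `f1_score` and the sample spans are made up for illustration and are not taken from the actual evaluation.

```python
def f1_score(gold, pred):
    """Entity-level F1 over sets of (start, end, type) spans, exact match only."""
    gold, pred = set(gold), set(pred)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical spans: one exact match, one wrong type, one missed entity.
gold = {(0, 2, "PER"), (5, 6, "LOC"), (9, 11, "ORG")}
pred = {(0, 2, "PER"), (5, 6, "ORG")}
print(round(f1_score(gold, pred), 3))  # 0.4
```

Under this reading, a test F1 of about 0.44 means that well under half of the entities are recovered exactly, which is consistent with the warning that the model is not usable yet.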

## Example

```py
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub.
model_id = "murat/kyrgyz_language_NER"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# Build a token-classification (NER) pipeline.
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

# Tag a sample input; each result is a dict with the predicted label and score.
example = "Жусуп Мамай"
ner_results = nlp(example)
print(ner_results)
```
```
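
The pipeline returns one dict per word piece, so a name split into several pieces comes back as several entries. The sketch below shows one way to merge such entries into whole entities. It assumes the entries carry readable BIO tags (`B-PER`, `I-PER`, ...) in `entity`, plus the `index` and `word` keys; the helper `group_entities` and the sample input are hypothetical, not real output from this model.

```python
def group_entities(tokens):
    """Merge per-token BIO predictions into (entity_type, text) pairs."""
    spans = []                          # finished (entity_type, text) pairs
    etype, words, last = None, [], None

    for tok in tokens:
        prefix, _, tok_type = tok["entity"].partition("-")
        word = tok["word"]
        # Only merge tokens that sit next to each other in the input.
        adjacent = last is not None and tok["index"] == last + 1
        if etype is not None and adjacent and word.startswith("##"):
            words[-1] += word[2:]       # WordPiece continuation, glued on regardless of tag
        elif etype is not None and adjacent and prefix == "I" and tok_type == etype:
            words.append(word)          # the same entity continues
        else:
            if etype is not None:
                spans.append((etype, " ".join(words)))
            if prefix in ("B", "I"):
                etype, words = tok_type, [word]
            else:                       # "O" or an unrecognised tag
                etype, words = None, []
        last = tok["index"]

    if etype is not None:
        spans.append((etype, " ".join(words)))
    return spans


# Made-up input in the shape of the pipeline output.
sample = [
    {"entity": "B-PER", "index": 1, "word": "Жусуп"},
    {"entity": "I-PER", "index": 2, "word": "Мама"},
    {"entity": "I-PER", "index": 3, "word": "##й"},
]
print(group_entities(sample))  # [('PER', 'Жусуп Мамай')]
```

`transformers` can also do this grouping itself: passing `aggregation_strategy="simple"` to `pipeline(...)` returns merged entities, which is usually preferable to hand-rolled post-processing.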