---
language: bn
tags:
- collaborative
- bengali
- NER
license: apache-2.0
datasets: xtreme 
metrics:
- Loss
- Accuracy
- Precision
- Recall
---

# sahajBERT Named Entity Recognition

## Model description

[sahajBERT](https://huggingface.co/neuropark/sahajBERT) fine-tuned for NER using the Bengali split of [WikiANN](https://huggingface.co/datasets/wikiann).

Named Entities predicted by the model:

| Label id | Label |
|:--------:|:----:|
|0 |O|
|1 |B-PER|
|2 |I-PER|
|3 |B-ORG|
|4 |I-ORG|
|5 |B-LOC|
|6 |I-LOC|
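
The same mapping ships with the checkpoint configuration, so you can read it programmatically instead of hard-coding the table above. A minimal sketch, assuming the standard `transformers` config attributes:

```python
from transformers import AlbertForTokenClassification

# Load the fine-tuned checkpoint; its config carries the label mapping.
model = AlbertForTokenClassification.from_pretrained("neuropark/sahajBERT-NER")

# id2label should reproduce the table above, e.g. {0: "O", 1: "B-PER", ...}
print(model.config.id2label)
```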

## Intended uses & limitations

#### How to use

You can use this model directly with a pipeline for token classification:
```python
from transformers import AlbertForTokenClassification, TokenClassificationPipeline, PreTrainedTokenizerFast

# Initialize tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NER")

# Initialize model
model = AlbertForTokenClassification.from_pretrained("neuropark/sahajBERT-NER")

# Initialize pipeline
pipeline = TokenClassificationPipeline(tokenizer=tokenizer, model=model)

raw_text = "এই ইউনিয়নে ৩ টি মৌজা ও ১০ টি গ্রাম আছে ।" # Change me
output = pipeline(raw_text)
```
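
Each item in `output` is a dict carrying the predicted tag and its score. A quick way to inspect the predictions, using the standard `TokenClassificationPipeline` output fields:

```python
# Print one line per tagged token: surface form, predicted label, confidence.
for entity in output:
    print(entity["word"], entity["entity"], round(entity["score"], 3))
```

In recent versions of `transformers` you can also pass `aggregation_strategy="simple"` when building the pipeline to merge sub-word tokens into whole entities.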

#### Limitations and bias

<!-- Provide examples of latent issues and potential remediations. -->
WIP

## Training data

The model was initialized with the pre-trained weights of [sahajBERT](https://huggingface.co/neuropark/sahajBERT) at step 19519 and trained on the Bengali split of [WikiANN](https://huggingface.co/datasets/wikiann).
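
If you want to reproduce or extend the fine-tuning data, the split can be loaded with the `datasets` library (a sketch; the `wikiann` dataset exposes Bengali under the `bn` configuration):

```python
from datasets import load_dataset

# Bengali portion of WikiANN, with train/validation/test splits.
wikiann_bn = load_dataset("wikiann", "bn")
print(wikiann_bn)
```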

## Training procedure

Coming soon! 

## Eval results

| Metric | Value |
|:---------:|:------:|
| Accuracy | 0.9756540697674418 |
| F1 | 0.9570102589154861 |
| Loss | 0.13705264031887054 |
| Precision | 0.9518950437317785 |
| Recall | 0.962180746561886 |
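
The card does not state how these scores were computed; for WikiANN-style NER they are typically entity-level metrics, as computed for example by `seqeval` (a hypothetical sketch, not the authors' evaluation code):

```python
from seqeval.metrics import f1_score, precision_score, recall_score

# Toy gold/predicted tag sequences in the same IOB scheme as the label table.
y_true = [["B-PER", "I-PER", "O", "B-LOC"]]
y_pred = [["B-PER", "I-PER", "O", "B-LOC"]]

print(precision_score(y_true, y_pred))  # 1.0 on this toy example
print(recall_score(y_true, y_pred))     # 1.0
print(f1_score(y_true, y_pred))         # 1.0
```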



### BibTeX entry and citation info

Coming soon! 
<!-- ```bibtex
@inproceedings{...,
  year={2020}
}
``` -->