Update README.md
README.md
CHANGED
@@ -8,28 +8,6 @@ pipeline_tag: token-classification
|
|
8 |
# Bangla-Person-Name-Extractor
|
9 |
This repository contains the implementation of a Bangla Person Name Extractor, a model that extracts person-name entities from a given sentence. We approached it as a token classification task, i.e. tagging each token as either part of a person's name or not. We leveraged the [BanglaBERT](https://github.com/csebuetnlp/banglabert) model, finetuning it for binary classification on a custom-prepared dataset. We have deployed the model on Hugging Face for easier access and use.
|
10 |
|
11 |
-
|
12 |
-
# Datasets
|
13 |
-
We used two datasets to train and evaluate our pipeline.
|
14 |
-
1. [Bengali-NER/annotated data at master · Rifat1493/Bengali-NER](http://https://github.com/Rifat1493/Bengali-NER/tree/master/annotated%20data)
|
15 |
-
2. [banglakit/bengali-ner-data](http://https://raw.githubusercontent.com/banglakit/bengali-ner-data/master/main.jsonl)
|
16 |
-
|
17 |
-
The annotation formats for both datasets were quite different, so we had to preprocess both of them before merging them. Please refer to [this notebook](https://github.com/MBMMurad/Bangla-Person-Name-Extractor/blob/main/prepare-dataset.ipynb) for preparing the dataset as required.
|
18 |
-
|
19 |
-
# Training and Evaluation
|
20 |
-
We treated this problem as a token classification task.So it seemed perfect to finetune BanglaBERT model for our purpose. [BanglaBERT ](https://huggingface.co/csebuetnlp/banglabert)is an [ELECTRA](https://openreview.net/pdf?id=r1xMH1BtvB) discriminator model pretrained with the Replaced Token Detection (RTD) objective. Finetuned models using this checkpoint achieve state-of-the-art results on many of the NLP tasks in bengali.
|
21 |
-
We mainly finetuned two checkpoints of BanglaBERT.
|
22 |
-
1. [BanglaBERT](https://huggingface.co/csebuetnlp/banglabert)
|
23 |
-
2. [BanglaEERT small](https://huggingface.co/csebuetnlp/banglabert_small)
|
24 |
-
|
25 |
-
BanglaBERT performed better than BanglaBERT small ( 83% F1 score vs 79% F1 score on the test set) .
|
26 |
-
Please refer to [this notebook](https://github.com/MBMMurad/Bangla-Person-Name-Extractor/blob/main/Training%20Notebook%20%3A%20Person%20Name%20Extractor%20using%20BanglaBERT.ipynb) to see the training process.
|
27 |
-
|
28 |
-
**Quantitative results**
|
29 |
-
Please refer to [this notebook](https://github.com/MBMMurad/Bangla-Person-Name-Extractor/blob/main/Inference%20and%20Evaluation%20Notebook.ipynb) to see the evaluation process.
|
30 |
-
<br></br>
|
31 |
-

|
32 |
-
|
33 |
# How to use it?
|
34 |
[This Notebook](https://github.com/MBMMurad/Bangla-Person-Name-Extractor/blob/main/Inference_template.ipynb) contains an inference template for running the model on a sentence.
|
35 |
<br></br>
|
@@ -71,7 +49,7 @@ print(f"Input Sentence : {sentence}")
|
|
71 |
print(f"Person Name Entities : {pred}")
|
72 |
```
|
73 |
|
74 |
-
**Output
|
75 |
```
|
76 |
Input Sentence : আব্দুর রহিম নামের কাস্টমারকে একশ টাকা বাকি দিলাম।
|
77 |
Person Name Entities : ['আব্দুর' 'রহিম']
|
@@ -83,4 +61,24 @@ Person Name Entities : ['দেলোয়ার' 'হোসেন' 'মজু
|
|
83 |
|
84 |
Input Sentence : দলীয় নেতারা তাঁর বাসভবনে যেতে চাইলে আটক হন।
|
85 |
Person Name Entities : []
|
86 |
-
```
|
8 |
# Bangla-Person-Name-Extractor
|
9 |
This repository contains the implementation of a Bangla Person Name Extractor, a model that extracts person-name entities from a given sentence. We approached it as a token classification task, i.e. tagging each token as either part of a person's name or not. We leveraged the [BanglaBERT](https://github.com/csebuetnlp/banglabert) model, finetuning it for binary classification on a custom-prepared dataset. We have deployed the model on Hugging Face for easier access and use.
|
10 |
|
11 |
# How to use it?
|
12 |
[This Notebook](https://github.com/MBMMurad/Bangla-Person-Name-Extractor/blob/main/Inference_template.ipynb) contains an inference template for running the model on a sentence.
|
13 |
<br></br>
|
|
|
49 |
print(f"Person Name Entities : {pred}")
|
50 |
```
|
51 |
|
52 |
+
**Output:**
|
53 |
```
|
54 |
Input Sentence : আব্দুর রহিম নামের কাস্টমারকে একশ টাকা বাকি দিলাম।
|
55 |
Person Name Entities : ['আব্দুর' 'রহিম']
|
|
|
61 |
|
62 |
Input Sentence : দলীয় নেতারা তাঁর বাসভবনে যেতে চাইলে আটক হন।
|
63 |
Person Name Entities : []
|
64 |
+
```
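The outputs above list person-name tokens one by one. As a minimal sketch of how per-token binary labels could be turned into such a list, and optionally grouped into full names, consider the following. The function names and the hand-written `labels` are illustrative assumptions, not part of this repository's inference template or actual model output.

```python
from typing import List


def extract_person_tokens(tokens: List[str], labels: List[int]) -> List[str]:
    """Keep only the tokens whose (hypothetical) label is 1, i.e. person-name tokens."""
    return [token for token, label in zip(tokens, labels) if label == 1]


def group_names(tokens: List[str], labels: List[int]) -> List[str]:
    """Join runs of consecutive person-name tokens into full names."""
    names, current = [], []
    for token, label in zip(tokens, labels):
        if label == 1:
            current.append(token)
        elif current:
            names.append(" ".join(current))
            current = []
    if current:  # flush a name that ends the sentence
        names.append(" ".join(current))
    return names


# Hand-written labels for illustration only; a real run would take them from the model.
tokens = "আব্দুর রহিম নামের কাস্টমারকে একশ টাকা বাকি দিলাম।".split()
labels = [1, 1, 0, 0, 0, 0, 0, 0]

person_tokens = extract_person_tokens(tokens, labels)
full_names = group_names(tokens, labels)
print(person_tokens)
print(full_names)
```

Note that a purely binary tagging scheme cannot separate two different names that appear directly next to each other; the grouping step would merge them into one string.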
|
65 |
+
# Datasets
|
66 |
+
We used two datasets to train and evaluate our pipeline.
|
67 |
+
1. [Bengali-NER/annotated data at master · Rifat1493/Bengali-NER](https://github.com/Rifat1493/Bengali-NER/tree/master/annotated%20data)
|
68 |
+
2. [banglakit/bengali-ner-data](https://raw.githubusercontent.com/banglakit/bengali-ner-data/master/main.jsonl)
|
69 |
+
|
70 |
+
The annotation formats of the two datasets were quite different, so we had to preprocess each of them before merging. Please refer to [this notebook](https://github.com/MBMMurad/Bangla-Person-Name-Extractor/blob/main/prepare-dataset.ipynb) for preparing the dataset as required.
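To illustrate the kind of preprocessing described above, here is a rough sketch that normalizes two differently annotated sources into one list of `(tokens, binary labels)` pairs. Both input formats, the tag names, and all function names are assumptions made for the example; the actual formats and steps are in the linked notebook.

```python
from typing import List, Tuple

Example = Tuple[List[str], List[int]]  # (tokens, binary person labels)


def to_binary(tags: List[str]) -> List[int]:
    """Map any person tag (e.g. B-PER, I-PER, PER) to 1 and everything else to 0."""
    return [1 if tag.split("-")[-1] == "PER" else 0 for tag in tags]


def from_token_tag_lines(lines: List[str]) -> Example:
    """Hypothetical format A: one 'token<TAB>tag' pair per line."""
    tokens, tags = [], []
    for line in lines:
        token, tag = line.rstrip("\n").split("\t")
        tokens.append(token)
        tags.append(tag)
    return tokens, to_binary(tags)


def from_sentence_record(record) -> Example:
    """Hypothetical format B: [sentence, [tag, ...]] with whitespace-separated tokens."""
    sentence, tags = record
    tokens = sentence.split()
    if len(tokens) != len(tags):
        raise ValueError("token/tag length mismatch")
    return tokens, to_binary(tags)


# Merge both sources into a single list with a shared label scheme.
merged = [
    from_token_tag_lines(["আব্দুর\tB-PER", "রহিম\tI-PER", "এসেছেন\tO"]),
    from_sentence_record(["রহিম ঢাকায় থাকেন", ["U-PER", "B-LOC", "O"]]),
]
```

Collapsing every tag to person versus non-person is what makes two datasets with different tag sets mergeable for a binary token classifier.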
|
71 |
+
|
72 |
+
# Training and Evaluation
|
73 |
+
We treated this problem as a token classification task, so finetuning the BanglaBERT model was a natural fit. [BanglaBERT](https://huggingface.co/csebuetnlp/banglabert) is an [ELECTRA](https://openreview.net/pdf?id=r1xMH1BtvB) discriminator model pretrained with the Replaced Token Detection (RTD) objective. Models finetuned from this checkpoint achieve state-of-the-art results on many NLP tasks in Bengali.
|
74 |
+
We mainly finetuned two checkpoints of BanglaBERT.
|
75 |
+
1. [BanglaBERT](https://huggingface.co/csebuetnlp/banglabert)
|
76 |
+
2. [BanglaBERT small](https://huggingface.co/csebuetnlp/banglabert_small)
|
77 |
+
|
78 |
+
BanglaBERT performed better than BanglaBERT small (83% vs. 79% F1 score on the test set).
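For readers unfamiliar with the metric, the sketch below computes precision, recall, and F1 for the positive (person-name) class from binary token labels. It assumes token-level scoring; whether the reported numbers were computed per token or per entity is not stated here, and the sample labels are made up for illustration.

```python
from typing import List, Tuple


def binary_f1(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up gold and predicted labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
precision, recall, f1 = binary_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```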
|
79 |
+
Please refer to [this notebook](https://github.com/MBMMurad/Bangla-Person-Name-Extractor/blob/main/Training%20Notebook%20%3A%20Person%20Name%20Extractor%20using%20BanglaBERT.ipynb) to see the training process.
|
80 |
+
|
81 |
+
**Quantitative results**
|
82 |
+
Please refer to [this notebook](https://github.com/MBMMurad/Bangla-Person-Name-Extractor/blob/main/Inference%20and%20Evaluation%20Notebook.ipynb) to see the evaluation process.
|
83 |
+
<br></br>
|
84 |
+

|