pascalrai/large-BERT-NER-email

Token Classification


	---
	library_name: transformers
	tags: []
	widget:
	- text: 'Thank you for approaching me about the collaboration. You can talk to my manager, Kritik at 9874512563 or [email protected]'
	  example_title: Email 1
	- text: 'Call me on 9874569874'
	  example_title: Email 2
	- text: 'You can email me at [email protected] or call directly on 9999988888. The point of contact would be my manager Manish Neupane'
	  example_title: Email 3
	---
	Overview:

	The model is fine-tuned for token classification over 3 entity classes plus a "0" class.<br>
	The dataset is custom-annotated and contains 400 texts. It was split into train, validation, and test sets in proportions of 0.76, 0.12, and 0.12.
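
A minimal usage sketch, assuming the checkpoint loads through the standard `transformers` token-classification pipeline. The repository name is taken from this page. The helper names `load_tagger` and `group_by_label` are hypothetical, and the label strings the model returns depend on its config, which this card does not document.

```python
from collections import defaultdict

MODEL_ID = "pascalrai/large-BERT-NER-email"  # repository name from this page


def load_tagger(model_id=MODEL_ID):
    """Build a token-classification pipeline (downloads weights on first use)."""
    # Imported lazily so group_by_label can be used without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def group_by_label(entities):
    """Collect predicted spans per label from the pipeline output."""
    grouped = defaultdict(list)
    for ent in entities:
        # Aggregated output uses "entity_group"; raw output uses "entity".
        label = ent.get("entity_group", ent.get("entity"))
        grouped[label].append(ent["word"])
    return dict(grouped)
```

With the model downloaded, `group_by_label(load_tagger()("Call me on 9874569874"))` would return a dictionary keyed by whatever label names the checkpoint defines.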

	The validation classification report is as follows:

	| Class | Precision | Recall | F1 |
	|-----|----------|:-------------:|------:|
	| 0 | 1.00 | 1.00 | 1.00 |
	| 1 | 0.98 | 1.00 | 0.99 |
	| 2 | 0.95 | 0.89 | 0.92 |
	| 3 | 0.80 | 0.88 | 0.84 |
	| macro-avg | 0.93 | 0.94 | 0.94 |

	The test classification report is as follows:

	| Class | Precision | Recall | F1 |
	|-----|----------|:-------------:|------:|
	| 0 | 1.00 | 1.00 | 1.00 |
	| 1 | 0.98 | 1.00 | 0.99 |
	| 2 | 0.66 | 0.97 | 0.79 |
	| 3 | 0.84 | 0.78 | 0.81 |
	| macro-avg | 0.87 | 0.94 | 0.90 |
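
The F1 and macro-average columns follow from the per-class precision and recall. The sketch below recomputes them for the test report as a sanity check. It assumes F1 is the harmonic mean of precision and recall and that the macro average is the unweighted mean over the four classes. The function and variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(values):
    """Unweighted mean over classes."""
    values = list(values)
    return sum(values) / len(values)


# (precision, recall) per class, copied from the test report above.
test_report = {
    "0": (1.00, 1.00),
    "1": (0.98, 1.00),
    "2": (0.66, 0.97),
    "3": (0.84, 0.78),
}

f1_per_class = {label: f1_score(p, r) for label, (p, r) in test_report.items()}
macro_precision = macro_average(p for p, _ in test_report.values())
macro_recall = macro_average(r for _, r in test_report.values())
macro_f1 = macro_average(f1_per_class.values())

for label, score in f1_per_class.items():
    print(f"class {label}: F1 = {score:.2f}")
print(f"macro-avg: P = {macro_precision:.2f}, R = {macro_recall:.2f}, F1 = {macro_f1:.2f}")
```

Because the inputs are already rounded to two decimals, recomputed values can differ from a report in the last digit.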

	Possible future direction:

	1. Clean the data into as consistent a format as possible.
	2. Collect more data, prioritizing texts that resemble real use cases.
	3. Ponder: could a tool like Grammarly clean the sentences before tokenization, so that proper nouns are capitalized and the grammar is correct, giving the model a more regular pattern to learn?
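
As a very rough illustration of the third idea, the sketch below applies a light, rule-based cleanup before tokenization. It is a stand-in under stated assumptions, not a grammar corrector: it only collapses whitespace and capitalizes sentence-initial letters, and it does not detect proper nouns. The function name `normalize_text` is hypothetical.

```python
import re


def normalize_text(text):
    """Collapse whitespace and capitalize the first letter of each sentence."""
    # Collapse runs of whitespace (including newlines) and trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    # Uppercase a lowercase letter at the start of the text or after ". ", "! ", "? ".
    # A period inside an email address or number is not followed by whitespace,
    # so those tokens are left untouched.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )
```

Any such step would need to be applied identically at training and inference time. Collapsing whitespace also shifts character offsets, so span annotations would have to be realigned after cleaning.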