---
library_name: transformers
tags: []
widget:
- text: 'Thank you for approaching me about the collaboration. You can talk to my manager, Kritik at 9874512563 or [email protected]'
  example_title: Email 1
- text: 'Call me on 9874569874'
  example_title: Email 2
- text: 'You can email me at [email protected] or call directly on 9999988888. The point of contact would be my manager Manish Neupane'
  example_title: Email 3
---
Overview:

The model is fine-tuned for 3 classes plus a "0" class.<br>
The dataset is custom-annotated and contains 400 texts. The model was trained on a 0.76 / 0.12 / 0.12 split into train, validation, and test sets.
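For concreteness, a 0.76 / 0.12 / 0.12 split of 400 texts gives 304 training, 48 validation, and 48 test texts. The sketch below shows one way to produce such a split. It assumes a simple random shuffle with a fixed seed; the card does not say how the split was actually made (e.g. whether it was stratified), and `split_dataset` is a hypothetical helper, not part of this repository.

```python
import random


def split_dataset(texts, ratios=(0.76, 0.12, 0.12), seed=42):
    """Hypothetical helper: shuffle `texts` and cut them into train/val/test by `ratios`."""
    assert abs(sum(ratios) - 1.0) < 1e-9, "ratios must sum to 1"
    items = list(texts)
    random.Random(seed).shuffle(items)  # fixed seed so the split is reproducible
    n = len(items)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to test so nothing is dropped
    return train, val, test


texts = [f"text {i}" for i in range(400)]
train, val, test = split_dataset(texts)
print(len(train), len(val), len(test))  # 304 48 48
```

Assigning the remainder to the test set guarantees every text lands in exactly one split even when the ratios do not divide the dataset size evenly.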

The validation classification report is as follows:

| Class | Precision | Recall | F1 |
|:-----:|----------:|-------:|---:|
| 0 | 1.00 | 1.00 | 1.00 |
| 1 | 0.98 | 1.00 | 0.99 |
| 2 | 0.95 | 0.89 | 0.92 |
| 3 | 0.80 | 0.88 | 0.84 |
| Macro avg | 0.93 | 0.94 | 0.94 |

The test classification report is as follows:

| Class | Precision | Recall | F1 |
|:-----:|----------:|-------:|---:|
| 0 | 1.00 | 1.00 | 1.00 |
| 1 | 0.98 | 1.00 | 0.99 |
| 2 | 0.66 | 0.97 | 0.79 |
| 3 | 0.84 | 0.78 | 0.81 |
| Macro avg | 0.87 | 0.94 | 0.90 |
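The F1 and macro-average columns can be cross-checked from the precision and recall columns. The sketch below assumes the reports follow the usual convention (as in scikit-learn's `classification_report`): per-class F1 is the harmonic mean 2PR / (P + R), and the macro average is the unweighted mean over the four classes. Because it starts from the rounded two-decimal values in the tables, small rounding differences are possible in general; `report` is a hypothetical helper.

```python
def f1(p, r):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def report(rows):
    """Hypothetical helper: {class: (precision, recall)} -> per-class F1 and macro averages."""
    per_class = {c: round(f1(p, r), 2) for c, (p, r) in rows.items()}
    n = len(rows)
    # Macro averages are unweighted means over classes, regardless of class support.
    macro_p = sum(p for p, _ in rows.values()) / n
    macro_r = sum(r for _, r in rows.values()) / n
    macro_f1 = sum(f1(p, r) for p, r in rows.values()) / n
    return per_class, (round(macro_p, 2), round(macro_r, 2), round(macro_f1, 2))


validation = {0: (1.00, 1.00), 1: (0.98, 1.00), 2: (0.95, 0.89), 3: (0.80, 0.88)}
test_split = {0: (1.00, 1.00), 1: (0.98, 1.00), 2: (0.66, 0.97), 3: (0.84, 0.78)}

print(report(validation))  # ({0: 1.0, 1: 0.99, 2: 0.92, 3: 0.84}, (0.93, 0.94, 0.94))
print(report(test_split))  # ({0: 1.0, 1: 0.99, 2: 0.79, 3: 0.81}, (0.87, 0.94, 0.9))
```

Note that the macro F1 is the mean of the per-class F1 scores, not the harmonic mean of the macro precision and macro recall; the two generally differ.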

Possible future directions:

1. Clean the data into a consistent format wherever possible.
2. Collect more data, prioritizing texts that resemble real use cases.
3. Open question: could a tool such as Grammarly clean the sentences before tokenization, so that proper nouns are capitalized and the grammar is correct, giving the model a more consistent pattern to learn?