Commit 57299e8 · 1 parent: d2262e3
updated readme

README.md CHANGED
```diff
@@ -51,13 +51,12 @@ This research project builds a machine learning model that can **automatically a

 ## 📊 Dataset Statistics

 The dataset used for training consists of real-world accessibility bug reports, each labeled with one of four priority levels. The distribution of labels is imbalanced, and label-aware preprocessing steps were taken to improve model performance.
-
 | Label | Priority Level | Count |
 |-------|----------------|-------|
-| 1 |
-| 2 |
-| 0 |
-| 3 |
+| 1 | Critical | 2035 |
+| 2 | Major | 1465 |
+| 0 | Blocker | 804 |
+| 3 | Minor | 756 |

 **Total Samples**: 5,060

@@ -75,10 +74,10 @@ The dataset consists of short bug report texts labeled with one of four priority

 | Label | Meaning |
 |-------|-------------|
-| 0 |
-| 1 |
-| 2 |
-| 3 |
+| 0 | Blocker |
+| 1 | Critical |
+| 2 | Major |
+| 3 | Minor |

 ### ✏️ Sample Entries:

@@ -124,8 +123,8 @@ This model fine-tunes `roberta-base` using a 4-class custom dataset to classify

 from transformers import RobertaTokenizer, RobertaForSequenceClassification
 import torch

-model = RobertaForSequenceClassification.from_pretrained("
-tokenizer = RobertaTokenizer.from_pretrained("
+model = RobertaForSequenceClassification.from_pretrained("shivamjadhav/roberta-priority-multiclass")
+tokenizer = RobertaTokenizer.from_pretrained("shivamjadhav/roberta-priority-multiclass")

 inputs = tokenizer("VoiceOver skips over text with <strong> tags", return_tensors="pt")
 outputs = model(**inputs)
```
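The updated snippet stops at raw logits. A minimal sketch of turning the argmax class id into one of the README's priority names; the `ID2LABEL` mapping below is an assumption read off the label table, not something the commit defines:

```python
# Hypothetical mapping: assumes the model's output indices follow the
# README's label table (0 = Blocker, 1 = Critical, 2 = Major, 3 = Minor).
ID2LABEL = {0: "Blocker", 1: "Critical", 2: "Major", 3: "Minor"}

def predict_label(logits):
    """Argmax over a flat list of four logits, returning the priority name."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return ID2LABEL[best]

# e.g. feed outputs.logits[0].tolist() from the snippet above
print(predict_label([0.1, 2.3, 0.5, -1.0]))  # → Critical
```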
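The statistics table notes the label distribution is imbalanced. The README does not say which label-aware steps the project used; one common option is inverse-frequency class weights, sketched here from the counts in the table as an illustration only:

```python
# Counts from the dataset statistics table (label id -> sample count)
counts = {0: 804, 1: 2035, 2: 1465, 3: 756}
total = sum(counts.values())  # 5060, matching the README's total

# Inverse-frequency weights: rarer classes get proportionally larger weights
weights = {label: total / (len(counts) * n) for label, n in counts.items()}
# These could be passed to a weighted loss (e.g. CrossEntropyLoss's weight
# argument) during fine-tuning -- an assumption, not the project's method.
```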