---
title: Bug Priority Multiclass
emoji: πŸ’»
colorFrom: red
colorTo: gray
sdk: docker
pinned: false
short_description: This is a Multiclass Bug Priority Model
tags:
- text-classification
- accessibility
- bug-triage
- transformers
- roberta
- pytorch-lightning
license: apache-2.0
datasets:
- custom
language:
- en
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# RoBERTa Base Model for Accessibility Bug Priority Classification

This model is `roberta-base` fine-tuned on a labeled dataset of accessibility-related bug descriptions to automatically classify each report's **priority level**. It helps automate the triage of bugs affecting users of screen readers and other assistive technologies.


## 🧠 Problem Statement

Modern applications often suffer from accessibility issues that impact users with disabilities, such as content not being read properly by screen readers like **VoiceOver**, **NVDA**, or **JAWS**. These bugs are often reported via issue trackers or user forums in the form of short text summaries.

Due to the unstructured and domain-specific nature of these reports, manual triage is:
- Time-consuming
- Inconsistent
- Prone to delayed resolution

There is a critical need to **prioritize accessibility bugs quickly and accurately** to ensure inclusive user experiences.


## 🎯 Research Objective

This research project builds a machine learning model that can **automatically assign a priority level** to an accessibility bug report. The goal is to:

- Streamline accessibility QA workflows
- Accelerate high-impact fixes
- Empower developers and testers with ML-assisted tooling

## πŸ“Š Dataset Statistics

The dataset used for training consists of real-world accessibility bug reports, each labeled with one of four priority levels. The label distribution is imbalanced, so label-aware preprocessing steps were taken to improve model performance.

| Label | Priority Level | Count |
|-------|----------------|-------|
| 1     | Critical       | 2035  |
| 2     | Major          | 1465  |
| 0     | Blocker        | 804   |
| 3     | Minor          | 756   |

**Total Samples**: 5,060
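
For reference, the counts above can be recomputed from the raw data with a few lines of pandas. This is a minimal sketch: the file name `bugs.csv` is a hypothetical placeholder, while the `Text`/`Label` columns match the sample entries shown later in this card.

```python
# Sketch: tally the priority labels in the raw CSV.
# "bugs.csv" is an assumed file name, not part of the released artifacts.
import pandas as pd

priority_names = {0: "Blocker", 1: "Critical", 2: "Major", 3: "Minor"}

df = pd.read_csv("bugs.csv")
counts = df["Label"].value_counts()

for label, count in counts.items():
    print(f"{label} ({priority_names[label]}): {count}")
print("Total samples:", len(df))
```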

### 🧹 Preprocessing

- Text normalization and cleanup  
- Length filtering based on token count (see the sketch after this list)  
- Label frequency normalization for class-weighted loss  
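
The exact cleanup rules are not documented in this card, so the following is only a sketch of the kind of normalization and token-length filtering listed above; the whitespace rule and the `min_tokens`/`max_tokens` thresholds are illustrative assumptions.

```python
# Illustrative preprocessing sketch; the exact rules used for training are assumptions.
import re
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

def normalize(text: str) -> str:
    # Collapse runs of whitespace and trim the ends; markup such as <strong>
    # is kept, since it carries signal in accessibility reports.
    return re.sub(r"\s+", " ", text).strip()

def keep_by_length(text: str, min_tokens: int = 3, max_tokens: int = 256) -> bool:
    # Drop reports that are too short to be informative or too long for the input window.
    n = len(tokenizer.encode(text, add_special_tokens=True))
    return min_tokens <= n <= max_tokens

reports = [
    "mac voiceover screen reader",
    "VoiceOver skips over text in paragraphs which contain <strong> or <em> tags",
]
cleaned = [normalize(t) for t in reports if keep_by_length(normalize(t))]
```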

To address class imbalance, class weights were computed as inverse label frequency and used in the cross-entropy loss during training.
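
A minimal sketch of that weighting scheme follows; the exact normalization used during training is not documented here, so the rescaling choice (weights summing to the number of classes) is an assumption.

```python
# Sketch: inverse-frequency class weights plugged into a weighted cross-entropy loss.
import torch
import torch.nn as nn

# Counts from the table above, indexed by label: 0 Blocker, 1 Critical, 2 Major, 3 Minor.
label_counts = torch.tensor([804.0, 2035.0, 1465.0, 756.0])

weights = 1.0 / label_counts
weights = weights / weights.sum() * len(label_counts)  # rescaling choice is an assumption

criterion = nn.CrossEntropyLoss(weight=weights)
# During training: loss = criterion(logits, labels), with logits of shape
# (batch_size, 4) and labels holding class indices in {0, 1, 2, 3}.
```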

## πŸ§ͺ Dataset Description

The dataset consists of short bug report texts labeled with one of four priority levels:

| Label | Meaning     |
|-------|-------------|
| 0     | Blocker     |
| 1     | Critical    |
| 2     | Major       |
| 3     | Minor       |

### ✏️ Sample Entries:

```csv
Text,Label
"mac voiceover screen reader",3
"Firefox crashes when interacting with some MathML content using Voiceover on Mac",0
"VoiceOver skips over text in paragraphs which contain <strong> or <em> tags",2
```
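
Entries like these can be turned into model inputs along the following lines. This is again a sketch: the file name and the stratified 80/20 split are assumptions rather than details taken from the training code.

```python
# Sketch: load the labeled reports and tokenize them for fine-tuning.
import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from transformers import RobertaTokenizer

df = pd.read_csv("bugs.csv")  # assumed file name
train_df, val_df = train_test_split(
    df, test_size=0.2, stratify=df["Label"], random_state=42
)

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
train_enc = tokenizer(
    list(train_df["Text"]), truncation=True, padding=True, return_tensors="pt"
)
train_labels = torch.tensor(train_df["Label"].values)
```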


## πŸ“Š Model Comparison

We fine-tuned and evaluated three transformer models under identical training conditions using PyTorch Lightning (multi-GPU, mixed precision, and weighted loss). The validation accuracy and F1 scores are as follows:

| Model           | Base Architecture          | Validation Accuracy | Weighted F1 Score |
|-----------------|----------------------------|---------------------|-------------------|
| DeBERTa-v3 Base | microsoft/deberta-v3-base  | **69%**             | **0.69**          |
| ALBERT Base     | albert-base-v2             | 68%                 | 0.68              |
| RoBERTa Base    | roberta-base               | 66%                 | 0.67              |
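
The training conditions mentioned above roughly correspond to a Lightning setup like the one below. This is a hedged sketch (PyTorch Lightning 2.x assumed), not the actual training script; the learning rate, epoch count, and device count are placeholders.

```python
# Rough sketch of the shared fine-tuning setup: multi-GPU, mixed precision, weighted loss.
import pytorch_lightning as pl
import torch
import torch.nn as nn
from transformers import AutoModelForSequenceClassification

class PriorityClassifier(pl.LightningModule):
    def __init__(self, model_name="roberta-base", class_weights=None, lr=2e-5):
        super().__init__()
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=4)
        self.criterion = nn.CrossEntropyLoss(weight=class_weights)
        self.lr = lr

    def training_step(self, batch, batch_idx):
        outputs = self.model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
        loss = self.criterion(outputs.logits, batch["labels"])
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.lr)

# Identical Trainer settings for all three backbones (placeholder values).
trainer = pl.Trainer(accelerator="gpu", devices=2, precision="16-mixed", max_epochs=5)
# trainer.fit(PriorityClassifier(class_weights=weights), train_dataloaders=train_loader)
```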

### πŸ“ Observations

- **DeBERTa** delivered the best performance, likely due to its *disentangled attention*, which models content and relative position separately.
- **ALBERT** performed surprisingly well despite having fewer parameters, showcasing its efficiency.
- **RoBERTa** provided stable and reliable results but slightly underperformed compared to the others.


## πŸ“Œ Model Details

The released checkpoint fine-tunes `roberta-base` on this 4-class dataset to classify accessibility issues by priority. It was trained with PyTorch Lightning and optimized with mixed precision on multiple GPUs.

- **Model**: roberta-base
- **Framework**: PyTorch Lightning
- **Labels**: 0 (Blocker), 1 (Critical), 2 (Major), 3 (Minor)
- **Validation F1**: 0.71 (weighted)

## Usage

```python
from transformers import RobertaTokenizer, RobertaForSequenceClassification
import torch

model = RobertaForSequenceClassification.from_pretrained("shivamjadhav/roberta-priority-multiclass")
tokenizer = RobertaTokenizer.from_pretrained("shivamjadhav/roberta-priority-multiclass")
model.eval()  # inference mode: disables dropout

# Class indices map to the priority labels listed above.
id2label = {0: "Blocker", 1: "Critical", 2: "Major", 3: "Minor"}

inputs = tokenizer("VoiceOver skips over text with <strong> tags", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
prediction = torch.argmax(outputs.logits, dim=1).item()

print("Predicted Priority:", prediction, f"({id2label[prediction]})")
```
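
Alternatively, the `transformers` pipeline API handles tokenization and softmax in one call. Note that unless the hosted config defines human-readable label names, the output will use generic `LABEL_<id>` names corresponding to the mapping above.

```python
from transformers import pipeline

# Loads the same checkpoint; returns the top predicted class and its score.
classifier = pipeline("text-classification", model="shivamjadhav/roberta-priority-multiclass")
print(classifier("VoiceOver skips over text with <strong> tags"))
# e.g. [{'label': 'LABEL_2', 'score': ...}]  -> label index 2 corresponds to "Major"
```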