|
--- |
|
title: Bug Priority Multiclass |
|
emoji: 💻
|
colorFrom: red |
|
colorTo: gray |
|
sdk: docker |
|
pinned: false |
|
short_description: Multiclass priority classification for accessibility bugs
|
tags:
- text-classification
- accessibility
- bug-triage
- transformers
- roberta
- pytorch-lightning
license: apache-2.0
datasets:
- custom
language:
- en
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
|
|
# RoBERTa Base Model for Accessibility Bug Priority Classification |
|
|
|
This model fine-tunes `roberta-base` using a labeled dataset of accessibility-related bug descriptions to automatically classify their **priority level**. It helps automate the triage of bugs affecting users of screen readers and other assistive technologies. |
|
|
|
|
|
## 🧠 Problem Statement
|
|
|
Modern applications often suffer from accessibility issues that impact users with disabilities, such as content not being read properly by screen readers like **VoiceOver**, **NVDA**, or **JAWS**. These bugs are often reported via issue trackers or user forums in the form of short text summaries. |
|
|
|
Due to the unstructured and domain-specific nature of these reports, manual triage is: |
|
- Time-consuming |
|
- Inconsistent |
|
- Frequently delayed, slowing resolution
|
|
|
There is a critical need to **prioritize accessibility bugs quickly and accurately** to ensure inclusive user experiences. |
|
|
|
|
|
## 🎯 Research Objective
|
|
|
This research project builds a machine learning model that **automatically assigns a priority level** to an accessibility bug report. The goals are to:
|
|
|
- Streamline accessibility QA workflows |
|
- Accelerate high-impact fixes |
|
- Empower developers and testers with ML-assisted tooling |
|
|
|
## 📊 Dataset Statistics
|
|
|
The dataset used for training consists of real-world accessibility bug reports, each labeled with one of four priority levels. The distribution of labels is imbalanced, and label-aware preprocessing steps were taken to improve model performance. |
|
| Label | Priority Level | Count |
|-------|----------------|-------|
| 1     | Critical       | 2035  |
| 2     | Major          | 1465  |
| 0     | Blocker        | 804   |
| 3     | Minor          | 756   |
|
|
|
**Total Samples**: 5,060 |
|
|
|
### 🧹 Preprocessing
|
|
|
- Text normalization and cleanup |
|
- Length filtering based on token count |
|
- Label frequency normalization for class-weighted loss |
|
|
|
To address class imbalance, class weights were computed as inverse label frequency and used in the cross-entropy loss during training. |
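As a concrete illustration, here is a minimal sketch of that weighting scheme using the label counts from the table above. The exact normalization used during training is not shown in this card, so the inverse-frequency form below is an assumption:

```python
import torch
import torch.nn as nn

# Per-label counts from the dataset table: 0 Blocker, 1 Critical, 2 Major, 3 Minor
counts = torch.tensor([804.0, 2035.0, 1465.0, 756.0])

# Inverse-frequency weights, scaled so they average to 1.0
weights = counts.sum() / (len(counts) * counts)

# Weighted cross-entropy: mistakes on rare classes cost more
criterion = nn.CrossEntropyLoss(weight=weights)
```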
|
|
|
## 🧪 Dataset Description
|
|
|
The dataset consists of short bug report texts labeled with one of four priority levels: |
|
|
|
| Label | Meaning  |
|-------|----------|
| 0     | Blocker  |
| 1     | Critical |
| 2     | Major    |
| 3     | Minor    |
|
|
|
### ✍️ Sample Entries
|
|
|
```csv
Text,Label
"mac voiceover screen reader",3
"Firefox crashes when interacting with some MathML content using Voiceover on Mac",0
"VoiceOver skips over text in paragraphs which contain <strong> or <em> tags",2
```
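For illustration, such a file could be loaded with pandas and the numeric labels mapped back to their names. The filename `bug_reports.csv` is a hypothetical placeholder; the dataset itself is not published with this card:

```python
import pandas as pd

# Hypothetical filename; the underlying dataset is custom and not released here
df = pd.read_csv("bug_reports.csv")

# Map numeric labels to the priority names from the table above
id2label = {0: "Blocker", 1: "Critical", 2: "Major", 3: "Minor"}
df["Priority"] = df["Label"].map(id2label)

print(df[["Text", "Priority"]].head())
```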
|
|
|
|
|
## 📈 Model Comparison
|
|
|
We fine-tuned and evaluated three transformer models under identical training conditions using PyTorch Lightning (multi-GPU, mixed precision, and weighted loss). The validation accuracy and F1 scores are as follows: |
|
|
|
| Model           | Base Architecture          | Validation Accuracy | Weighted F1 Score |
|-----------------|----------------------------|---------------------|-------------------|
| DeBERTa-v3 Base | microsoft/deberta-v3-base  | **69%**             | **0.69**          |
| ALBERT Base     | albert-base-v2             | 68%                 | 0.68              |
| RoBERTa Base    | roberta-base               | 66%                 | 0.67              |
|
|
|
### 🔍 Observations
|
|
|
- **DeBERTa** delivered the best performance, likely due to its *disentangled attention* mechanism and *relative position encoding*.
|
- **ALBERT** performed surprisingly well despite having fewer parameters, showcasing its efficiency. |
|
- **RoBERTa** provided stable, reliable results but slightly underperformed the other two.
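For reference, the shared training setup can be sketched as a PyTorch Lightning module and `Trainer` configuration. This is a minimal reconstruction, not the actual training code: the `PriorityClassifier` module, optimizer, and learning rate are illustrative assumptions; only the multi-GPU, mixed-precision, and weighted-loss settings come from the description above.

```python
import pytorch_lightning as pl
import torch
import torch.nn as nn
from transformers import AutoModelForSequenceClassification

class PriorityClassifier(pl.LightningModule):
    """Hypothetical module: transformer encoder + weighted cross-entropy."""

    def __init__(self, model_name="roberta-base", class_weights=None):
        super().__init__()
        self.model = AutoModelForSequenceClassification.from_pretrained(
            model_name, num_labels=4
        )
        self.criterion = nn.CrossEntropyLoss(weight=class_weights)

    def training_step(self, batch, batch_idx):
        logits = self.model(input_ids=batch["input_ids"],
                            attention_mask=batch["attention_mask"]).logits
        loss = self.criterion(logits, batch["labels"])
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        # Illustrative choice; the card does not report optimizer settings
        return torch.optim.AdamW(self.parameters(), lr=2e-5)

trainer = pl.Trainer(
    accelerator="gpu",
    devices=-1,             # all available GPUs
    strategy="ddp",         # multi-GPU data parallelism
    precision="16-mixed",   # mixed-precision training
)
# trainer.fit(PriorityClassifier(class_weights=weights), datamodule=...)
```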
|
|
|
|
|
The checkpoint released here is the fine-tuned `roberta-base` model, trained with PyTorch Lightning and optimized with mixed precision on multiple GPUs.
|
|
|
## Details |
|
|
|
- **Model**: roberta-base |
|
- **Framework**: PyTorch Lightning |
|
- **Labels**: 0 (Blocker), 1 (Critical), 2 (Major), 3 (Minor) |
|
- **Validation F1**: 0.67 (weighted; see the comparison table above)
|
|
|
## Usage |
|
|
|
```python
from transformers import RobertaTokenizer, RobertaForSequenceClassification
import torch

# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub
model = RobertaForSequenceClassification.from_pretrained("shivamjadhav/roberta-priority-multiclass")
tokenizer = RobertaTokenizer.from_pretrained("shivamjadhav/roberta-priority-multiclass")
model.eval()  # inference mode

# Tokenize a bug report and classify it without tracking gradients
inputs = tokenizer("VoiceOver skips over text with <strong> tags", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
prediction = torch.argmax(outputs.logits, dim=1).item()

print("Predicted Priority:", prediction)
```
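The raw class index can be mapped back to a priority name, and a softmax over the logits gives a rough confidence score. Continuing from the snippet above:

```python
# Map the class index to its priority name (see the label table above)
id2label = {0: "Blocker", 1: "Critical", 2: "Major", 3: "Minor"}
probs = torch.softmax(outputs.logits, dim=1).squeeze()

print(f"Predicted Priority: {id2label[prediction]} ({probs[prediction]:.1%})")
```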
|
|
|
|