---
title: Bug Priority Multiclass
emoji: 💻
colorFrom: red
colorTo: gray
sdk: docker
pinned: false
short_description: Multiclass bug priority classification model
tags:
  - text-classification
  - accessibility
  - bug-triage
  - transformers
  - roberta
  - pytorch-lightning
license: apache-2.0
datasets:
  - custom
language:
  - en
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# RoBERTa Base Model for Accessibility Bug Priority Classification

This model fine-tunes `roberta-base` on a labeled dataset of accessibility-related bug descriptions to automatically classify their priority level. It helps automate the triage of bugs affecting users of screen readers and other assistive technologies.
## 🧠 Problem Statement
Modern applications often suffer from accessibility issues that impact users with disabilities, such as content not being read properly by screen readers like VoiceOver, NVDA, or JAWS. These bugs are often reported via issue trackers or user forums in the form of short text summaries.
Due to the unstructured and domain-specific nature of these reports, manual triage is:
- Time-consuming
- Inconsistent
- Often delayed, which postpones resolution
There is a critical need to prioritize accessibility bugs quickly and accurately to ensure inclusive user experiences.
## 🎯 Research Objective
This research project builds a machine learning model that can automatically assign a priority level to an accessibility bug report. The goal is to:
- Streamline accessibility QA workflows
- Accelerate high-impact fixes
- Empower developers and testers with ML-assisted tooling
## 📊 Dataset Statistics
The dataset used for training consists of real-world accessibility bug reports, each labeled with one of four priority levels. The distribution of labels is imbalanced, and label-aware preprocessing steps were taken to improve model performance.
| Label | Priority Level | Count |
|-------|----------------|-------|
| 0 | Blocker | 804 |
| 1 | Critical | 2,035 |
| 2 | Major | 1,465 |
| 3 | Minor | 756 |

**Total samples:** 5,060
## 🧹 Preprocessing
- Text normalization and cleanup
- Length filtering based on token count
- Label frequency normalization for class-weighted loss
To address class imbalance, class weights were computed as inverse label frequency and used in the cross-entropy loss during training.
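A minimal sketch of this weighting scheme, assuming labels are available as a list of integers (the counts below mirror the dataset table above; the exact normalization constant used in training is an assumption):

```python
import torch
from collections import Counter

# Illustrative label list reproducing the counts from the dataset table
# (0: Blocker, 1: Critical, 2: Major, 3: Minor)
labels = [0] * 804 + [1] * 2035 + [2] * 1465 + [3] * 756

# Inverse-frequency weights: rarer classes contribute more to the loss
counts = Counter(labels)
num_classes = len(counts)
total = len(labels)
weights = torch.tensor(
    [total / (num_classes * counts[c]) for c in range(num_classes)],
    dtype=torch.float,
)

# Weighted cross-entropy, as used during fine-tuning
loss_fn = torch.nn.CrossEntropyLoss(weight=weights)
```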
## 🧪 Dataset Description
The dataset consists of short bug report texts labeled with one of four priority levels:
| Label | Meaning |
|-------|----------|
| 0 | Blocker |
| 1 | Critical |
| 2 | Major |
| 3 | Minor |
### ✍️ Sample Entries

```csv
Text,Label
"mac voiceover screen reader",3
"Firefox crashes when interacting with some MathML content using Voiceover on Mac",0
"VoiceOver skips over text in paragraphs which contain <strong> or <em> tags",2
```
## 📊 Model Comparison
We fine-tuned and evaluated three transformer models under identical training conditions using PyTorch Lightning (multi-GPU, mixed precision, and weighted loss); a skeleton of this setup appears after the table. Validation accuracy and weighted F1 scores are as follows:
| Model | Base Architecture | Validation Accuracy | Weighted F1 Score |
|-------|-------------------|---------------------|-------------------|
| DeBERTa-v3 Base | `microsoft/deberta-v3-base` | 69% | 0.69 |
| ALBERT Base | `albert-base-v2` | 68% | 0.68 |
| RoBERTa Base | `roberta-base` | 66% | 0.67 |
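The training script itself is not included in this card; the skeleton below shows one way to express the setup described above (weighted loss, mixed precision, multi-GPU) in PyTorch Lightning 2.x. The class name, hyperparameters, and batch keys are illustrative assumptions:

```python
import pytorch_lightning as pl
import torch
from transformers import AutoModelForSequenceClassification

class PriorityClassifier(pl.LightningModule):
    """Wraps a Hugging Face encoder with a weighted cross-entropy objective."""

    def __init__(self, model_name: str, class_weights: torch.Tensor, lr: float = 2e-5):
        super().__init__()
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=4)
        self.loss_fn = torch.nn.CrossEntropyLoss(weight=class_weights)
        self.lr = lr

    def training_step(self, batch, batch_idx):
        outputs = self.model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
        loss = self.loss_fn(outputs.logits, batch["labels"])
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.lr)

# Mixed precision on all available GPUs (Lightning 2.x flags)
trainer = pl.Trainer(max_epochs=5, precision="16-mixed", accelerator="gpu", devices=-1)
# trainer.fit(PriorityClassifier(...), train_dataloader) would then launch training
```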
## 🔍 Observations
- DeBERTa delivered the best performance, likely due to its disentangled attention and enhanced positional encoding.
- ALBERT performed surprisingly well despite having fewer parameters, showcasing its efficiency.
- RoBERTa provided stable and reliable results but slightly underperformed compared to the others.
## Model Details

The released checkpoint fine-tunes `roberta-base` on the 4-class custom dataset described above, trained with PyTorch Lightning using mixed precision on multiple GPUs.
- Model: `roberta-base`
- Framework: PyTorch Lightning
- Labels: 0 (Blocker), 1 (Critical), 2 (Major), 3 (Minor)
- Validation F1: 0.71 (weighted)
## Usage
```python
from transformers import RobertaTokenizer, RobertaForSequenceClassification
import torch

model = RobertaForSequenceClassification.from_pretrained("shivamjadhav/roberta-priority-multiclass")
tokenizer = RobertaTokenizer.from_pretrained("shivamjadhav/roberta-priority-multiclass")

# Map class indices to the priority names used in this card
id2label = {0: "Blocker", 1: "Critical", 2: "Major", 3: "Minor"}

inputs = tokenizer("VoiceOver skips over text with <strong> tags", return_tensors="pt")
with torch.no_grad():  # inference only; no gradients needed
    outputs = model(**inputs)
prediction = torch.argmax(outputs.logits, dim=1).item()
print("Predicted Priority:", id2label[prediction])
```