Update README.md

README.md

sdk: static
pinned: false
---

Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions
==============================================================================================================
Wan Ju Kang, Eunki Kim, Na Min An, Sangryul Kim, Haemin Choi, Ki Hoon Kwak, James Thorne

## 📄 [Paper](URL) 💻 [Code](URL)

Hello, we are a team of researchers at KAIST AI working on accessible visualization.
Specifically, we compiled a diagram description dataset for blind and low-vision individuals.
We worked in close cooperation with two schools for the blind, as well as over 30 sighted annotators, and we are grateful for their contributions.
Check out our preprint [coming soon], and feel free to contact us at [email protected].

---------------------------------------

## Abstract
> Often, the needs and visual abilities differ between the annotator group and the end user group. Generating detailed diagram descriptions for blind and low-vision (BLV) users is one such challenging domain. Sighted annotators could describe visuals with ease, but existing studies have shown that direct generations by them are costly, bias-prone, and somewhat lacking by BLV standards. In this study, we ask sighted individuals to assess—rather than produce—diagram descriptions generated by vision-language models (VLM) that have been guided with latent supervision via a multi-pass inference. The sighted assessments prove effective and useful to professional educators who are themselves BLV and teach visually impaired learners. We release SIGHTATION, a collection of diagram description datasets spanning 5k diagrams and 137k samples for completion, preference, retrieval, question answering, and reasoning training purposes and demonstrate their fine-tuning potential in various downstream tasks.

## Sightation Collection
- SightationCompletions
- SightationPreference
- SightationRetrieval
- SightationVQA
- SightationReasoning

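Once the subsets are up on the Hugging Face Hub, loading them should look roughly like the sketch below. This is only an illustration: the `Sightation/...` repository IDs and the `train` split name are assumptions, not the actual release identifiers.

```python
# Hypothetical loading sketch for the Sightation subsets (repo IDs are placeholders).
from datasets import load_dataset

SUBSETS = [
    "SightationCompletions",
    "SightationPreference",
    "SightationRetrieval",
    "SightationVQA",
    "SightationReasoning",
]

for name in SUBSETS:
    # Placeholder Hub path; replace with the published dataset repository.
    ds = load_dataset(f"Sightation/{name}", split="train")
    print(f"{name}: {len(ds)} samples, columns = {ds.column_names}")
```
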
<img src="https://cdn-uploads.huggingface.co/production/uploads/67a86f66c6f66e2fa5888b41/cNshK4QAdiNMqk7x6J6j7.png" width="70%" height="70%" title="visual_abstract" alt="visual_abstract"></img>

The key benefit of utilizing sighted user feedback lies in their assessments, which are based on solid visual grounding. The compiled assessments prove an effective training signal for steering VLMs towards more accessible descriptions.

<img src="https://cdn-uploads.huggingface.co/production/uploads/67a86f66c6f66e2fa5888b41/8oYvtq7dtv_Ck-U6OlcAE.png" width="50%" height="50%" title="dimensions_assignment" alt="dimensions_assignment"></img>

The description qualities, as assessed by their respective evaluator groups.

## Results
<img src="https://cdn-uploads.huggingface.co/production/uploads/67a86f66c6f66e2fa5888b41/094e9Hw7lauvT1tshg1Wj.png" width="60%" height="60%" title="spider_chart" alt="spider_chart"></img>

Tuning VLMs on Sightation enhanced various qualities of the diagram descriptions, as evaluated by BLV educators and shown here as normalized ratings averaged within each aspect. The gains from the dataset are most pronounced with the Qwen2-VL-2B model, shown above.

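For reference, a tuned checkpoint could be exercised with the standard `transformers` chat interface for Qwen2-VL, roughly as in the sketch below. The model ID here is the public `Qwen/Qwen2-VL-2B-Instruct` base (a Sightation-tuned checkpoint would be swapped in), and the prompt wording and image path are illustrative placeholders, not the prompts used in our work.

```python
# Minimal generation sketch with Qwen2-VL-2B (base checkpoint as a stand-in for a
# Sightation-tuned one). Prompt text and image path are illustrative placeholders.
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "Qwen/Qwen2-VL-2B-Instruct"  # replace with a Sightation-tuned checkpoint
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("diagram.png")  # any diagram image
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this diagram in detail for a blind or low-vision reader."},
        ],
    }
]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
# Strip the prompt tokens before decoding so only the generated description remains.
trimmed = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```
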
## BibTeX
If you find this work useful for your research, please cite (the full entry will follow with the preprint):
```bibtex
@inproceedings{
}
```