Jaime-Choi committed on
Commit
c8eae73
·
verified ·
1 Parent(s): 509f146

Update README.md



Files changed (1): README.md (+45 −1)
README.md CHANGED
@@ -7,7 +7,51 @@ sdk: static
 pinned: false
 ---
 
+Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions
+==============================================================================================================
+Wan Ju Kang, Eunki Kim, Na Min An, Sangryul Kim, Haemin Choi, Ki Hoon Kwak, James Thorne
+
+## 📄 [Paper](URL)     💻 [Code](URL)
+
 Hello, we are a team of researchers based in KAIST AI working on accessible visualization.
 Specifically, we compiled a diagram description dataset for blind and low-vision individuals.
 We worked in close cooperation with two schools for the blind, as well as over 30 sighted annotators, and we are grateful for their contributions.
-Check out our preprint [coming soon], and feel free to contact us at [email protected].
+Check out our preprint [coming soon], and feel free to contact us at [email protected].
+
+---------------------------------------
+
+## Abstract
+> Often, the needs and visual abilities differ between the annotator group and the end user group.
+> Generating detailed diagram descriptions for blind and low-vision (BLV) users is one such challenging domain.
+> Sighted annotators can describe visuals with ease, but existing studies have shown that their direct generations are costly, bias-prone, and somewhat lacking by BLV standards.
+> In this study, we ask sighted individuals to assess—rather than produce—diagram descriptions generated by vision-language models (VLMs) that have been guided with latent supervision via multi-pass inference.
+> The sighted assessments prove effective and useful to professional educators who are themselves BLV and teach visually impaired learners.
+> We release SIGHTATION, a collection of diagram description datasets spanning 5k diagrams and 137k samples for completion, preference, retrieval, question answering, and reasoning training purposes, and demonstrate their fine-tuning potential in various downstream tasks.
+
+## Sightation Collection
+- SightationCompletions
+- SightationPreference
+- SightationRetrieval
+- SightationVQA
+- SightationReasoning
+
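The subsets listed above could presumably be pulled with the `datasets` library; a minimal sketch follows. The hub organization name (`"Sightation"`) used to build the repo ids is an assumption for illustration, not something confirmed by this card — check the actual dataset pages for the real paths.

```python
# Hypothetical loading sketch for the Sightation subsets.
# NOTE: the hub organization name ("Sightation") is an assumption,
# not confirmed by this card; verify the real repo ids on the Hub.
SUBSETS = [
    "SightationCompletions",
    "SightationPreference",
    "SightationRetrieval",
    "SightationVQA",
    "SightationReasoning",
]

def repo_id(subset: str, org: str = "Sightation") -> str:
    """Build a Hub repo id like 'Sightation/SightationVQA' (org is hypothetical)."""
    return f"{org}/{subset}"

# The actual fetch would look like this (commented out to avoid a network call):
# from datasets import load_dataset
# vqa = load_dataset(repo_id("SightationVQA"), split="train")
```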
+<img src="https://cdn-uploads.huggingface.co/production/uploads/67a86f66c6f66e2fa5888b41/cNshK4QAdiNMqk7x6J6j7.png" width="70%" height="70%" title="visual_abstract" alt="visual_abstract">
+
+The key benefit of utilizing sighted user feedback lies in assessments that are based on solid visual grounding. The compiled assessments prove to be effective training material for steering VLMs towards more accessible descriptions.
+
+<img src="https://cdn-uploads.huggingface.co/production/uploads/67a86f66c6f66e2fa5888b41/8oYvtq7dtv_Ck-U6OlcAE.png" width="50%" height="50%" title="dimensions_assignment" alt="dimensions_assignment">
+
+The description qualities assessed by their respective evaluator groups.
+
+## Results
+<img src="https://cdn-uploads.huggingface.co/production/uploads/67a86f66c6f66e2fa5888b41/094e9Hw7lauvT1tshg1Wj.png" width="60%" height="60%" title="spider_chart" alt="spider_chart">
+
+Tuning VLMs on Sightation enhanced various qualities of the diagram descriptions, as evaluated by BLV educators and shown here as normalized ratings averaged within each aspect.
+The dataset's benefit is most strongly pronounced with the Qwen2-VL-2B model, shown above.
+
+## BibTeX
+If you find this work useful for your research, please cite:
+```bibtex
+@inproceedings{
+}
+```