Update README.md

README.md

sdk: static
pinned: false
---

Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions
==============================================================================================================
Wan Ju Kang, Eunki Kim, Na Min An, Sangryul Kim, Haemin Choi, Ki Hoon Kwak, James Thorne

## 📄 [Paper](URL) 💻 [Code](URL)

Hello, we are a team of researchers at KAIST AI working on accessible visualization.
Specifically, we compiled a diagram description dataset for blind and low-vision individuals.
We worked in close cooperation with two schools for the blind, as well as over 30 sighted annotators, and we are grateful for their contributions.
Check out our preprint [coming soon], and feel free to contact us at [email protected].

---------------------------------------

## Abstract
> Often, the needs and visual abilities differ between the annotator group and the end user group. Generating detailed diagram descriptions for blind and low-vision (BLV) users is one such challenging domain. Sighted annotators could describe visuals with ease, but existing studies have shown that direct generations by them are costly, bias-prone, and somewhat lacking by BLV standards. In this study, we ask sighted individuals to assess—rather than produce—diagram descriptions generated by vision-language models (VLM) that have been guided with latent supervision via a multi-pass inference. The sighted assessments prove effective and useful to professional educators who are themselves BLV and teach visually impaired learners. We release SIGHTATION, a collection of diagram description datasets spanning 5k diagrams and 137k samples for completion, preference, retrieval, question answering, and reasoning training purposes and demonstrate their fine-tuning potential in various downstream tasks.

## Sightation Collection
- SightationCompletions
- SightationPreference
- SightationRetrieval
- SightationVQA
- SightationReasoning

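Once the subsets are up on the Hugging Face Hub, loading them should look roughly like the sketch below. This is only an illustration: the `Sightation/...` repository IDs and the `train` split name are assumptions, not the actual release identifiers.

```python
# Hypothetical loading sketch for the Sightation subsets (repo IDs are placeholders).
from datasets import load_dataset

SUBSETS = [
    "SightationCompletions",
    "SightationPreference",
    "SightationRetrieval",
    "SightationVQA",
    "SightationReasoning",
]

for name in SUBSETS:
    # Placeholder Hub path; replace with the published dataset repository.
    ds = load_dataset(f"Sightation/{name}", split="train")
    print(f"{name}: {len(ds)} samples, columns = {ds.column_names}")
```
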
<img src="https://cdn-uploads.huggingface.co/production/uploads/67a86f66c6f66e2fa5888b41/cNshK4QAdiNMqk7x6J6j7.png" width="70%" height="70%" title="visual_abstract" alt="visual_abstract"></img>

The key benefit of utilizing sighted user feedback lies in their assessments, which are based on solid visual grounding. The compiled assessments prove an effective training signal for steering VLMs towards more accessible descriptions.

<img src="https://cdn-uploads.huggingface.co/production/uploads/67a86f66c6f66e2fa5888b41/8oYvtq7dtv_Ck-U6OlcAE.png" width="50%" height="50%" title="dimensions_assignment" alt="dimensions_assignment"></img>

The description qualities, as assessed by their respective evaluator groups.

## Results
<img src="https://cdn-uploads.huggingface.co/production/uploads/67a86f66c6f66e2fa5888b41/094e9Hw7lauvT1tshg1Wj.png" width="60%" height="60%" title="spider_chart" alt="spider_chart"></img>

Tuning VLMs on Sightation enhanced various qualities of the diagram descriptions, as evaluated by BLV educators and shown here as normalized ratings averaged within each aspect. The gains from the dataset are most pronounced with the Qwen2-VL-2B model, shown above.

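For reference, a tuned checkpoint could be exercised with the standard `transformers` chat interface for Qwen2-VL, roughly as in the sketch below. The model ID here is the public `Qwen/Qwen2-VL-2B-Instruct` base (a Sightation-tuned checkpoint would be swapped in), and the prompt wording and image path are illustrative placeholders, not the prompts used in our work.

```python
# Minimal generation sketch with Qwen2-VL-2B (base checkpoint as a stand-in for a
# Sightation-tuned one). Prompt text and image path are illustrative placeholders.
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "Qwen/Qwen2-VL-2B-Instruct"  # replace with a Sightation-tuned checkpoint
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("diagram.png")  # any diagram image
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this diagram in detail for a blind or low-vision reader."},
        ],
    }
]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
# Strip the prompt tokens before decoding so only the generated description remains.
trimmed = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```
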
## BibTeX
If you find this work useful for your research, please cite (the full entry will follow with the preprint):
```bibtex
@inproceedings{
}
```