Quinn777/AtomThink-LLaVA-Llama3-8B

Safetensors

llava_llama

Quinn777 committed on Dec 16, 2024

Commit 87e9936 (verified) · 1 Parent(s): 51fd4a0

Update README.md

Files changed (1)

README.md +48 -3

@@ -1,3 +1,48 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+---
+# Model Card for AtomThink-LLaVA-Llama3-8B
+This model is post-trained from LLaVA-Llama3-8B using the AtomThink framework and is intended for solving complex multimodal mathematical problems.
+# Accuracy comparison with state-of-the-art methods on MathVista and MathVerse
+| **Model**             | **Inference** | **MathVista General** | **MathVista Math** | **MathVista Total** | **MathVerse TL** | **MathVerse TD** | **MathVerse VI** | **MathVerse VD** | **MathVerse VO** | **MathVerse Total** |
+|-----------------------|---------------|-------------|----------|-----------|----------|----------|----------|----------|----------|-----------|
+| Random Choice         | -             | -           | -        | 17.9      | 12.4     | 12.4     | 12.4     | 12.4     | 12.4     | 12.4      |
+| Human                 | -             | -           | -        | -         | 70.9     | 71.2     | 61.4     | 68.3     | 66.7     | 66.7      |
+| OpenAI o1             | Slow Think*   | -           | -        | 73.9      | -        | -        | -        | -        | -        | -         |
+| GPT-4o                | CoT           | -           | -        | 63.8      | -        | -        | -        | -        | -        | -         |
+| GPT-4V                | CoT           | -           | -        | 49.9      | 56.6     | 63.1     | 51.4     | 50.8     | 50.3     | 54.4      |
+| LLaVA-NeXT-34B        | Direct        | -           | -        | 46.5      | 25.5     | 33.8     | 23.5     | 20.3     | 15.7     | 23.8      |
+| InternLM-XComposer2   | Direct        | -           | -        | 57.6      | 17.0     | 22.3     | 15.7     | 16.4     | 11.0     | 16.5      |
+| Qwen-VL-Plus          | Direct        | -           | -        | 43.3      | 11.1     | 15.7     | 9.0      | 13.0     | 10.0     | 11.8      |
+| LLaVA-1.5-13B         | Direct        | -           | -        | 27.6      | 15.2     | 19.4     | 16.8     | 15.2     | 11.3     | 15.6      |
+| G-LLaVA-7B            | Direct        | -           | -        | 53.4      | 20.7     | 20.9     | 17.2     | 14.6     | 9.4      | 16.6      |
+| MAVIS-7B              | Direct        | -           | -        | -         | 29.1     | 41.4     | 27.4     | 24.9     | 14.6     | 27.5      |
+| LLaVA-Llama3-8B       | Direct        | 34.1        | 25.6     | 29.5      | 16.0     | 19.3     | 16.4     | 13.1     | 15.0     | 15.9      |
+| LLaVA w/. Formatted   | CoT           | 30.2        | 22.9     | 26.3      | 14.3     | 18.4     | 15.7     | 10.0     | 7.7      | 13.2      |
+| AtomThink-LLaVA       | Direct        | 34.4        | 27.2     | 30.5      | 16.0     | 19.3     | 16.2     | 13.1     | 15.0     | 15.9      |
+| AtomThink-LLaVA       | Quick Think   | **36.9**    | **37.0** | **36.6**  | **22.2** | **26.6** | **24.1** | **20.9** | **17.9** | **22.4**  |
+| AtomThink-LLaVA       | Slow Think    | **36.5**    | **41.3** | **39.1**  | **36.1** | **42.4** | **30.0** | **36.8** | **28.6** | **34.7**  |
+# Citation
+If you use this model in your research, please cite:
+```text
+@article{xiang2024atomthink,
+  title={AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning},
+  author={Xiang, Kun and Liu, Zhili and Jiang, Zihao and Nie, Yunshuang and Huang, Runhui and Fan, Haoxiang and Li, Hanhui and Huang, Weiran and Zeng, Yihan and Han, Jianhua and others},
+  journal={arXiv preprint arXiv:2411.11930},
+  year={2024}
+}
+@article{liu2024visual,
+  title={Visual instruction tuning},
+  author={Liu, Haotian and Li, Chunyuan and Wu, Qingyang and Lee, Yong Jae},
+  journal={Advances in neural information processing systems},
+  volume={36},
+  year={2024}
+}
+```
+# License
+The checkpoint is released under the Apache 2.0 license. Please ensure proper attribution when using this checkpoint.
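
To make the accuracy table in the README easier to read, the following minimal sketch computes the absolute and relative gains of AtomThink-LLaVA over the LLaVA-Llama3-8B Direct baseline. It assumes that the first Total column reports MathVista and the second reports MathVerse. The names `BASELINE`, `RESULTS`, and `gains` are illustrative and not part of the released checkpoint or any AtomThink code; the numbers are copied from the table.

```python
# Total accuracies (%) copied from the README table.
# Assumption: first "Total" column = MathVista, second "Total" column = MathVerse.
BASELINE = {"MathVista": 29.5, "MathVerse": 15.9}  # LLaVA-Llama3-8B, Direct
RESULTS = {
    "Quick Think": {"MathVista": 36.6, "MathVerse": 22.4},  # AtomThink-LLaVA
    "Slow Think": {"MathVista": 39.1, "MathVerse": 34.7},   # AtomThink-LLaVA
}


def gains(baseline, result):
    """Return {benchmark: (absolute gain in points, relative gain in %)}."""
    out = {}
    for bench, base in baseline.items():
        absolute = result[bench] - base
        out[bench] = (round(absolute, 1), round(100 * absolute / base, 1))
    return out


if __name__ == "__main__":
    for mode, scores in RESULTS.items():
        for bench, (abs_gain, rel_gain) in gains(BASELINE, scores).items():
            print(f"{mode:11s} {bench:9s} +{abs_gain:.1f} points ({rel_gain:.1f}% relative)")
```

Under these assumptions, Slow Think improves the baseline by 9.6 points on MathVista and 18.8 points on MathVerse.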