---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
- facebook/dinov2-small
pipeline_tag: visual-question-answering
---
Pretrain stage only, 4630 epochs
# Introduction
We use the powerful [TinyLLaVA Factory](https://github.com/TinyLLaVA/TinyLLaVA_Factory) to create a super small image-text-to-text model.
The goal is to make it possible to run LLaVA models on edge devices (with a few gigabytes of memory).
For the LLM and vision tower, we chose [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) and [facebook/dinov2-small](https://huggingface.co/facebook/dinov2-small), respectively.
[POPE](https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html#pope)
| Category | # Samples | TP | FP | TN | FN | Accuracy | Precision | Recall | F1 Score | Yes Ratio |
|-----------------|---------------|--------|--------|--------|--------|--------------|---------------|------------|--------------|---------------|
| Adversarial | 3000 | 1312 | 1250 | 250 | 188 | 0.521 | 0.512 | 0.875 | 0.646 | 0.854 |
| Popular | 3000 | 1312 | 1236 | 264 | 188 | 0.525 | 0.515 | 0.875 | 0.648 | 0.849 |
| Random | 2910 | 1312 | 1185 | 225 | 188 | 0.528 | 0.525 | 0.875 | 0.656 | 0.858 |
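
The derived columns in the POPE table follow from the four confusion-matrix counts. The sketch below recomputes them as a sanity check; the names `Confusion` and `pope_metrics` are hypothetical helpers written for this illustration and are not part of TinyLLaVA Factory.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for yes/no questions ("yes" is the positive class)."""
    tp: int
    fp: int
    tn: int
    fn: int


def pope_metrics(c: Confusion) -> dict:
    """Recompute the derived table columns from the raw counts."""
    total = c.tp + c.fp + c.tn + c.fn
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    return {
        "samples": total,
        "accuracy": (c.tp + c.tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        # Fraction of questions the model answered with "yes".
        "yes_ratio": (c.tp + c.fp) / total,
    }


# Counts copied from the table above.
rows = {
    "Adversarial": Confusion(tp=1312, fp=1250, tn=250, fn=188),
    "Popular": Confusion(tp=1312, fp=1236, tn=264, fn=188),
    "Random": Confusion(tp=1312, fp=1185, tn=225, fn=188),
}

for name, counts in rows.items():
    metrics = pope_metrics(counts)
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Read this way, the numbers suggest that the model answers "yes" to roughly 85% of the questions, which would explain why recall is high while accuracy stays close to 0.5.
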
[TEXTVQA](https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html#textvqa)
Samples: 5000, Accuracy: 0% (:-|)
[SCIENCEQA](https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html#scienceqa)
Samples: 4241, Correct: -, Accuracy: -%, IMG-Accuracy: -%
[MMMU](https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html#mmmu)
| Category | # Samples | Accuracy |
|---------------------------------|-----------|----------|
| Overall | 900 | 0.280 |
| Overall-Art and Design | 120 | 0.208 |
| Art | 30 | 0.167 |
| Art Theory | 30 | 0.200 |
| Design | 30 | 0.367 |
| Music | 30 | 0.100 |
| Overall-Business | 150 | 0.213 |
| Accounting | 30 | 0.100 |
| Economics | 30 | 0.367 |
| Finance | 30 | 0.200 |
| Management | 30 | 0.233 |
| Marketing | 30 | 0.167 |
| Overall-Science | 150 | 0.300 |
| Biology | 30 | 0.300 |
| Chemistry | 30 | 0.133 |
| Geography | 30 | 0.300 |
| Math | 30 | 0.333 |
| Physics | 30 | 0.433 |
| Overall-Health and Medicine | 150 | 0.340 |
| Basic Medical Science | 30 | 0.300 |
| Clinical Medicine | 30 | 0.133 |
| Diagnostics and Laboratory Med. | 30 | 0.333 |
| Pharmacy | 30 | 0.400 |
| Public Health | 30 | 0.533 |
| Overall-Humanities and Soc. Sci.| 120 | 0.342 |
| History | 30 | 0.300 |
| Literature | 30 | 0.567 |
| Sociology | 30 | 0.233 |
| Psychology | 30 | 0.267 |
| Overall-Tech and Engineering | 210 | 0.276 |
| Agriculture | 30 | 0.300 |
| Architecture and Engineering | 30 | 0.200 |
| Computer Science | 30 | 0.367 |
| Electronics | 30 | 0.200 |
| Energy and Power | 30 | 0.367 |
| Materials | 30 | 0.233 |
| Mechanical Engineering | 30 | 0.267 |
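
The "Overall" row is consistent with a sample-weighted average of the six "Overall-<discipline>" rows. A minimal sketch of that arithmetic follows. It assumes the per-discipline correct counts can be recovered by rounding `samples * accuracy`, and `weighted_accuracy` is a hypothetical helper, not part of the evaluation scripts.

```python
# (samples, accuracy) for each "Overall-<discipline>" row of the table above.
disciplines = {
    "Art and Design": (120, 0.208),
    "Business": (150, 0.213),
    "Science": (150, 0.300),
    "Health and Medicine": (150, 0.340),
    "Humanities and Soc. Sci.": (120, 0.342),
    "Tech and Engineering": (210, 0.276),
}


def weighted_accuracy(rows):
    """Return (total samples, estimated correct answers, overall accuracy)."""
    rows = list(rows)
    total = sum(samples for samples, _ in rows)
    # Assumption: rounding samples * accuracy recovers the integer correct count.
    correct = sum(round(samples * accuracy) for samples, accuracy in rows)
    return total, correct, correct / total


total, correct, overall = weighted_accuracy(disciplines.values())
print(total, correct, round(overall, 3))
```

Under that assumption the six disciplines add up to 900 samples, matching the sample count in the "Overall" row.
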