---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
- facebook/dinov2-small
pipeline_tag: visual-question-answering
---

This checkpoint covers the pretrain stage only (4630 epochs).

# Introduction

We use [TinyLLaVA Factory](https://github.com/TinyLLaVA/TinyLLaVA_Factory) to build a very small image-text-to-text model.

The goal is to make it possible to run LLaVA models on edge devices with only a few gigabytes of memory.

For the LLM and the vision tower, we chose [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) and [facebook/dinov2-small](https://huggingface.co/facebook/dinov2-small), respectively.

[POPE](https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html#pope):

| Category    | # Samples | TP | FP | TN | FN | Accuracy | Precision | Recall | F1 Score | Yes Ratio |
|-----------------|---------------|--------|--------|--------|--------|--------------|---------------|------------|--------------|---------------|
| Adversarial     | 3000          | 1312   | 1250   | 250    | 188    | 0.521        | 0.512        | 0.875      | 0.646        | 0.854         |
| Popular         | 3000          | 1312   | 1236   | 264    | 188    | 0.525        | 0.515        | 0.875      | 0.648        | 0.849         |
| Random          | 2910          | 1312   | 1185   | 225    | 188    | 0.528        | 0.525        | 0.875      | 0.656        | 0.858         |

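The derived columns in the POPE table can be recomputed from the confusion counts. The snippet below is a minimal sketch using the standard definitions of accuracy, precision, recall and F1, and it assumes "Yes Ratio" means the share of samples answered "yes"; `pope_metrics` is a hypothetical helper name, not part of TinyLLaVA Factory.

```python
def pope_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Recompute POPE summary metrics from confusion counts.

    Hypothetical helper for illustration; uses the standard
    binary-classification definitions with "yes" as the positive class.
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        # Assumption: "Yes Ratio" is the fraction of "yes" predictions.
        "yes_ratio": (tp + fp) / total,
    }


# Counts taken from the Adversarial row of the table above.
adversarial = pope_metrics(tp=1312, fp=1250, tn=250, fn=188)
for name, value in adversarial.items():
    print(f"{name}: {value:.3f}")
```

A yes ratio near 0.85 together with accuracy near 0.52 suggests the model answers "yes" to most questions, which is why recall is high while precision stays close to chance.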

[TEXTVQA](https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html#textvqa)

Samples: 5000, Accuracy: 0% (:-|)

[SCIENCEQA](https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html#scienceqa)

Samples: 4241, Correct: -, Accuracy: -%, IMG-Accuracy: -%

[MMMU](https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html#mmmu)

| Category                        | # Samples | Accuracy |
|---------------------------------|-----------|----------|
| Overall                         | 900       | 0.280    |
| Overall-Art and Design          | 120       | 0.208    |
| Art                             | 30        | 0.167    |
| Art Theory                      | 30        | 0.200    |
| Design                          | 30        | 0.367    |
| Music                           | 30        | 0.100    |
| Overall-Business                | 150       | 0.213    |
| Accounting                      | 30        | 0.100    |
| Economics                       | 30        | 0.367    |
| Finance                         | 30        | 0.200    |
| Management                      | 30        | 0.233    |
| Marketing                       | 30        | 0.167    |
| Overall-Science                 | 150       | 0.300    |
| Biology                         | 30        | 0.300    |
| Chemistry                       | 30        | 0.133    |
| Geography                       | 30        | 0.300    |
| Math                            | 30        | 0.333    |
| Physics                         | 30        | 0.433    |
| Overall-Health and Medicine     | 150       | 0.340    |
| Basic Medical Science           | 30        | 0.300    |
| Clinical Medicine               | 30        | 0.133    |
| Diagnostics and Laboratory Med. | 30        | 0.333    |
| Pharmacy                        | 30        | 0.400    |
| Public Health                   | 30        | 0.533    |
| Overall-Humanities and Soc. Sci.| 120       | 0.342    |
| History                         | 30        | 0.300    |
| Literature                      | 30        | 0.567    |
| Sociology                       | 30        | 0.233    |
| Psychology                      | 30        | 0.267    |
| Overall-Tech and Engineering    | 210       | 0.276    |
| Agriculture                     | 30        | 0.300    |
| Architecture and Engineering    | 30        | 0.200    |
| Computer Science                | 30        | 0.367    |
| Electronics                     | 30        | 0.200    |
| Energy and Power                | 30        | 0.367    |
| Materials                       | 30        | 0.233    |
| Mechanical Engineering          | 30        | 0.267    |