zhibinlan committed
Commit d0139cd · verified · 1 Parent(s): b616de3

Update README.md

Files changed (1)
  1. README.md +13 -0
README.md CHANGED
@@ -20,6 +20,19 @@ The LLaVE models are 2B parameter multimodal embedding models based on the Aquil
 
  The model can embed texts, images, multi-image inputs, and videos.

+ ## MMEB Leaderboard
+ We achieved the top ranking on the MMEB leaderboard using only a small amount of training data.
+
+ ![MMEB Leaderboard](./figures/leaderboard.png)
+
+ ## Model Performance
+ LLaVE-7B achieved SOTA performance on MMEB using only 662K training pairs.
+ ![MMEB](./figures/results.png)
+
+ Although LLaVE is trained only on image-text data, it generalizes to text-video retrieval tasks in a zero-shot manner and achieves strong performance, demonstrating its remarkable potential to transfer to other embedding tasks.
+ <img src="./figures/zero-shot-vr.png" alt="video-retrieve" width="400" height="auto">
+
  ### Quick Start

  First, clone our GitHub repository
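
As background for the embedding capability described above: retrieval benchmarks such as MMEB score an embedding model by encoding a query and a pool of candidates, then ranking the candidates by cosine similarity. Below is a minimal, self-contained sketch of that scoring step; the random tensors stand in for real LLaVE outputs, and the 4096 embedding dimension is an illustrative assumption, not the model's actual output size.

```python
import torch
import torch.nn.functional as F

# Stand-ins for model outputs: in practice, the query and candidate
# embeddings come from encoding text, images, multi-image inputs, or
# videos with LLaVE. The 4096-dim size is an illustrative assumption.
torch.manual_seed(0)
query_emb = torch.randn(1, 4096)        # one text query
candidate_embs = torch.randn(10, 4096)  # ten candidate images/videos

# Rank candidates by cosine similarity: L2-normalize both sides,
# then take dot products.
query_emb = F.normalize(query_emb, dim=-1)
candidate_embs = F.normalize(candidate_embs, dim=-1)
scores = query_emb @ candidate_embs.T   # shape (1, 10)

# The highest-scoring candidate is the retrieval result.
best = scores.argmax(dim=-1).item()
print(f"top candidate: {best}, score: {scores[0, best].item():.3f}")
```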