Commit e528b05 · committed by zijian.kang · 1 parent: 979314b
update readme
README.md
CHANGED
@@ -40,12 +40,11 @@ Sail-VL benefits from high-quality data and carefully curated training recipes.
 
 ## Evaluation
 
-SAIL-VL
-series of models of comparable size, but is also competitive compared with recently released SoTAs.
+SAIL-VL is competitive compared with Qwen2-VL, DeepSeekVL-2 and recently released InternVL2.5-MPO, please see the following table for details.
 
 ### Detail Evaluations:
 
-| Benchmark | SAIL-VL-8B | Qwen2-VL-8B | InternVL2.5-MPO-8B | DeepSeekVL-2-Small |
+| Benchmark | **SAIL-VL-8B** | Qwen2-VL-8B | InternVL2.5-MPO-8B | DeepSeekVL-2-Small |
 | --- | --- | --- | --- | --- |
 | **Overall Performance** | *74.5* | *73.0* | *74.3* | *72.7* |
 | **General VQA** | *68.3* | *68.5* | *71.2* | *66.8* |
@@ -224,16 +223,6 @@ Our model is built upon numerous outstanding open-source projects, and we are gr
 
 ## Citation
 ```
-@misc{
-sailvl,
-title = {SAIL-VL: Scalable Vision Language Model Training with High Quality Data Curation},
-url = {https://huggingface.co/BytedanceDouyinContent/SAIL-VL-8B/},
-author = {Bytedance Douyin Content Team},
-month = {December},
-year = {2024}
-}
-```
-```
 @article{dong2025scalable,
 title={Scalable vision language model training via high quality data curation},
 author={Dong, Hongyuan and Kang, Zijian and Yin, Weijie and Liang, Xiao and Feng, Chao and Ran, Jiao},
@@ -241,11 +230,10 @@ Our model is built upon numerous outstanding open-source projects, and we are gr
 year={2025}
 }
 ```
-
 ## Contributions
 This work is conducted by Bytedance Douyin Content Team, authored by:
 ```
-{Hongyuan Dong, Zijian Kang, Weijie Yin}, Xiao Liang, Feng
+{Hongyuan Dong, Zijian Kang, Weijie Yin}, Xiao Liang, Chao Feng, Jiao Ran
 
 {*} Equal Contributions.
 ```