BytedanceDouyinContent/SAIL-VL-8B


zijian.kang committed on Feb 19

Commit e528b05 · 1 Parent(s): 979314b

update readme

Files changed (1)

README.md +3 -15


@@ -40,12 +40,11 @@ Sail-VL benefits from high-quality data and carefully curated training recipes.
 ## Evaluation
-SAIL-VL not only outperforms the Qwen2-VL and
- series of models of comparable size, but is also competitive compared with recently released SoTAs.
 ### Detail Evaluations:
-| Benchmark | SAIL-VL-8B | Qwen2-VL-8B | InternVL2.5-MPO-8B | DeepSeekVL-2-Small |
 | --- | --- | --- | --- | --- |
 | **Overall Performance** |  *74.5* | *73.0* | *74.3* | *72.7* |
 | **General VQA** | *68.3* | *68.5* | *71.2* | *66.8* |
@@ -224,16 +223,6 @@ Our model is built upon numerous outstanding open-source projects, and we are gr
 ## Citation
 ```
-@misc{
-    sailvl,
-    title = {SAIL-VL: Scalable Vision Language Model Training with High Quality Data Curation},
-    url = {https://huggingface.co/BytedanceDouyinContent/SAIL-VL-8B/},
-    author = {Bytedance Douyin Content Team},
-    month = {December},
-    year = {2024}
-}
-```
-```
 @article{dong2025scalable,
   title={Scalable vision language model training via high quality data curation},
   author={Dong, Hongyuan and Kang, Zijian and Yin, Weijie and Liang, Xiao and Feng, Chao and Ran, Jiao},
@@ -241,11 +230,10 @@ Our model is built upon numerous outstanding open-source projects, and we are gr
   year={2025}
 }
 ```
 ## Contributions
 This work is conducted by Bytedance Douyin Content Team, authored by:
 ```
-{Hongyuan Dong, Zijian Kang, Weijie Yin}, Xiao Liang, Feng Chen, Jiao Ran
 {*} Equal Contributions.
 ```

 ## Evaluation
+SAIL-VL is competitive compared with Qwen2-VL, DeepSeekVL-2 and recently released InternVL2.5-MPO, please see the following table for details.
 ### Detail Evaluations:
+| Benchmark | **SAIL-VL-8B** | Qwen2-VL-8B | InternVL2.5-MPO-8B | DeepSeekVL-2-Small |
 | --- | --- | --- | --- | --- |
 | **Overall Performance** |  *74.5* | *73.0* | *74.3* | *72.7* |
 | **General VQA** | *68.3* | *68.5* | *71.2* | *66.8* |
 ## Citation
 ```
 @article{dong2025scalable,
   title={Scalable vision language model training via high quality data curation},
   author={Dong, Hongyuan and Kang, Zijian and Yin, Weijie and Liang, Xiao and Feng, Chao and Ran, Jiao},
   year={2025}
 }
 ```
 ## Contributions
 This work is conducted by Bytedance Douyin Content Team, authored by:
 ```
+{Hongyuan Dong, Zijian Kang, Weijie Yin}, Xiao Liang, Chao Feng, Jiao Ran
 {*} Equal Contributions.
 ```