shawn0wang committed on
Commit 0aede5a · verified · 1 Parent(s): e6a375c

Update README.md

Files changed (1): README.md (+7, −13)
README.md CHANGED
@@ -10,9 +10,11 @@ pipeline_tag: image-text-to-text
 ## 🌐 [Homepage](#) | 📖 [Technical Report](https://github.com/SkyworkAI/Skywork-R1V/blob/main/report/Skywork_R1V.pdf) | 💻 [GitHub](https://github.com/SkyworkAI/Skywork-R1V)
 ---
 
-## 1. Introduction
-
-We introduce Skywork-R1V, a multimodal reasoning model that extends the R1-series text models to visual modalities through a near-lossless transfer method. Using a lightweight visual projector, Skywork-R1V enables seamless multimodal adaptation without requiring retraining of either the base language model or the vision encoder. To enhance visual-text alignment, we developed a hybrid optimization strategy combining Iterative Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO), significantly improving cross-modal integration. Additionally, we created an adaptive-length Chain-of-Thought distillation approach for generating reasoning data, which dynamically optimizes reasoning chain lengths to improve inference efficiency and prevent overthinking. The model achieves competitive performance on key multimodal reasoning benchmarks, scoring 69 on MMMU and 67.5 on MathVista, comparable to leading closed-source models such as Gemini 2.0 and Kimi-k1.5. It also maintains strong textual reasoning capabilities, achieving 72.0 on AIME and 94.0 on MATH500.
+## 1. Model Introduction
+| Model Name | Vision Encoder | Language Model | HF Link |
+| ---------------------- | -------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------- | ------------ |
+| Skywork-R1V-38B | [InternViT-6B-448px-V2_5](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V2_5) | [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | [🤗 Link](#) |
+| Skywork-R1V-38B-qwq | [InternViT-6B-448px-V2_5](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V2_5) | [Qwen/QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) | - |
 
 
 ## 2. Feature
@@ -232,18 +234,10 @@ We introduce Skywork-R1V, a multimodal reasoning model that extends the R1-serie
 <img src="eval.jpeg" width="80%" alt="skywork_r1v_eval" />
 </div>
 
-
-## 4. Skywork-R1V Family
-
-| Model Name | Vision Encoder | Language Model | HF Link |
-| ---------------------- | -------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------- | ------------ |
-| Skywork-R1V-38B | [InternViT-6B-448px-V2_5](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V2_5) | [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | [🤗 Link](#) |
-| Skywork-R1V-38B-qwq | [InternViT-6B-448px-V2_5](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V2_5) | [Qwen/QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) | - |
-
 ---
 
 
-## 5. Usage
+## 4. Usage
 
 ```python
 import math
@@ -375,7 +369,7 @@ print(f'User: {question}\nAssistant: {response}')
 
 ---
 
-## 6. Citation
+## 5. Citation
 If you use Skywork-R1V in your research, please cite:
 
 ```
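The README's Usage section is truncated in this diff after `import math`. In InternVL-style pipelines (InternViT is the vision encoder listed in the model table above), `math` is typically used by an image-preprocessing step that tiles the input into a grid of fixed-size patches whose aspect ratio approximates the original image. The helper below is a hypothetical sketch of that aspect-ratio selection step under those assumptions, not Skywork-R1V's actual code; the function name and the defaults (`max_num=12`, `image_size=448`) are illustrative.

```python
import math

def find_closest_aspect_ratio(width, height, min_num=1, max_num=12, image_size=448):
    """Pick the (cols, rows) tile grid whose aspect ratio best matches the image.

    Hypothetical sketch of InternVL-style dynamic tiling: enumerate every grid
    with between min_num and max_num tiles, then choose the one whose cols/rows
    ratio is closest to the image's width/height ratio.
    """
    aspect_ratio = width / height
    # All candidate grids with an allowed tile count, ordered by tile count.
    candidates = sorted(
        {(i, j)
         for n in range(min_num, max_num + 1)
         for i in range(1, n + 1)
         for j in range(1, n + 1)
         if min_num <= i * j <= max_num},
        key=lambda grid: grid[0] * grid[1],
    )
    best, best_diff = (1, 1), math.inf
    area = width * height
    for cols, rows in candidates:
        diff = abs(aspect_ratio - cols / rows)
        if diff < best_diff:
            best, best_diff = (cols, rows), diff
        elif diff == best_diff and area > 0.5 * image_size * image_size * cols * rows:
            # On ties, prefer a larger grid only if the image has enough pixels
            # to fill it at roughly half resolution or better.
            best = (cols, rows)
    return best
```

For example, a 448×448 image maps to a single tile, while a 2:1 panorama maps to a 2×1 grid; the chosen grid then determines how many `image_size`-pixel patches the vision encoder receives.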