Update README.md
README.md
@@ -38,6 +38,11 @@ In the past five months since Qwen2-VL’s release, numerous developers have bui
 
 We extend dynamic resolution to the temporal dimension by adopting dynamic FPS sampling, enabling the model to comprehend videos at various sampling rates. Accordingly, we update mRoPE in the time dimension with IDs and absolute time alignment, enabling the model to learn temporal sequence and speed, and ultimately acquire the ability to pinpoint specific moments.
 
+<p align="center">
+<img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2.5-VL/qwen2.5vl_arc.jpeg" width="80%"/>
+</p>
+
+
 * **Streamlined and Efficient Vision Encoder**
 
 We enhance both training and inference speeds by strategically implementing window attention into the ViT. The ViT architecture is further optimized with SwiGLU and RMSNorm, aligning it with the structure of the Qwen2.5 LLM.
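The README does not spell out how temporal IDs are tied to absolute time. The sketch below illustrates one plausible reading under stated assumptions: the function names, the `seconds_per_patch` and `ids_per_second` parameters, and the default of 2 IDs per second are all hypothetical and are not taken from the Qwen2.5-VL code. It shows why time-aligned IDs help with dynamic FPS sampling. The same moment in a clip receives the same temporal ID whether frames were sampled densely or sparsely, while plain index-based IDs do not.

```python
# Hypothetical sketch: temporal position IDs aligned to absolute time.
# Assumption (not from the README): each temporal patch spans
# `seconds_per_patch` seconds and IDs advance by `ids_per_second`,
# so the gap between IDs reflects elapsed time rather than frame count.


def temporal_position_ids(num_temporal_patches: int,
                          seconds_per_patch: float,
                          ids_per_second: float = 2.0) -> list[int]:
    """Return one temporal ID per temporal patch, scaled by its start time."""
    if num_temporal_patches < 0:
        raise ValueError("num_temporal_patches must be non-negative")
    if seconds_per_patch <= 0:
        raise ValueError("seconds_per_patch must be positive")
    # round() guards against float error such as 2.9999999 truncating to 2.
    return [round(i * seconds_per_patch * ids_per_second)
            for i in range(num_temporal_patches)]


def index_based_ids(num_temporal_patches: int) -> list[int]:
    """Baseline for comparison: IDs that only count patches."""
    return list(range(num_temporal_patches))


if __name__ == "__main__":
    # The same 8-second clip sampled at two different rates.
    dense = temporal_position_ids(8, seconds_per_patch=1.0)
    sparse = temporal_position_ids(4, seconds_per_patch=2.0)
    print(dense)   # [0, 2, 4, 6, 8, 10, 12, 14]
    print(sparse)  # [0, 4, 8, 12]
    # The patch starting at t = 2 s gets ID 4 under both sampling rates,
    # whereas index_based_ids would give it 2 (dense) or 1 (sparse).
    print(index_based_ids(4))  # [0, 1, 2, 3]
```

Scaling by elapsed time rather than by patch index is what lets a model compare speeds and locate moments across clips sampled at different rates; the exact scale factor is a free design choice in this sketch.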
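The vision-encoder paragraph names SwiGLU and RMSNorm without defining them. Below is a minimal pure-Python sketch of the textbook formulations of both (window attention is not covered). The helper names, the bias-free projections, and the epsilon default are illustrative assumptions, not Qwen2.5-VL's actual configuration or code.

```python
import math


def rms_norm(x: list[float], weight: list[float], eps: float = 1e-6) -> list[float]:
    """RMSNorm: divide x by its root-mean-square, then apply a per-channel gain.

    Unlike LayerNorm, no mean is subtracted and there is no bias term.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def silu(v: float) -> float:
    """SiLU (Swish) activation v * sigmoid(v), written to avoid exp overflow."""
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)


def matvec(m: list[list[float]], x: list[float]) -> list[float]:
    """Multiply a row-major matrix m (out_dim rows of in_dim) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def swiglu_ffn(x: list[float],
               w_gate: list[list[float]],
               w_up: list[list[float]],
               w_down: list[list[float]]) -> list[float]:
    """SwiGLU feed-forward: W_down (silu(W_gate x) * (W_up x)), bias-free."""
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]  # element-wise gating
    return matvec(w_down, hidden)


if __name__ == "__main__":
    x = [3.0, -4.0]
    normed = rms_norm(x, weight=[1.0, 1.0], eps=0.0)
    print(normed)  # about [0.8485, -1.1314]; RMS of the output is 1
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # With identity projections the FFN reduces to silu(x) * x per channel.
    print(swiglu_ffn(normed, identity, identity, identity))
```

In this formulation the gate branch decides, per hidden channel, how much of the up projection passes through, and RMSNorm rescales activations without centering them.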