Update README.md
Browse files
README.md
CHANGED
@@ -19,6 +19,16 @@ tags:
|
|
19 |
|
20 |
The **Qwen2-VL-OCR-2B-Instruct** model is a fine-tuned version of **Qwen/Qwen2-VL-2B-Instruct**, tailored for tasks that involve **Optical Character Recognition (OCR)**, **image-to-text conversion**, and **math problem solving with LaTeX formatting**. This model integrates a conversational approach with visual and textual understanding to handle multi-modal tasks effectively.
|
21 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
22 |
| **File Name** | **Size** | **Description** | **Upload Status** |
|
23 |
|---------------------------|------------|------------------------------------------------|-------------------|
|
24 |
| `.gitattributes` | 1.52 kB | Configures LFS tracking for specific model files. | Initial commit |
|
|
|
19 |
|
20 |
The **Qwen2-VL-OCR-2B-Instruct** model is a fine-tuned version of **Qwen/Qwen2-VL-2B-Instruct**, tailored for tasks that involve **Optical Character Recognition (OCR)**, **image-to-text conversion**, and **math problem solving with LaTeX formatting**. This model integrates a conversational approach with visual and textual understanding to handle multi-modal tasks effectively.
|
21 |
|
22 |
+
#### Key Enhancements:
|
23 |
+
|
24 |
+
* **SoTA understanding of images of various resolution & ratio**: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
|
25 |
+
|
26 |
+
* **Understanding videos of 20min+**: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc.
|
27 |
+
|
28 |
+
* **Agent that can operate your mobiles, robots, etc.**: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions.
|
29 |
+
|
30 |
+
* **Multilingual Support**: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.
|
31 |
+
|
32 |
| **File Name** | **Size** | **Description** | **Upload Status** |
|
33 |
|---------------------------|------------|------------------------------------------------|-------------------|
|
34 |
| `.gitattributes` | 1.52 kB | Configures LFS tracking for specific model files. | Initial commit |
|