Update README.md
README.md
CHANGED
@@ -16,6 +16,8 @@ tags:
 ---
 
 
+# **Blazer.1-2B-Vision**
+
 Blazer.1-2B-Vision `4-bit precision` is based on the Qwen2-VL model, fine-tuned for raw document annotation extraction, optical character recognition (OCR), and solving math problems with LaTeX formatting. This model integrates a conversational approach with advanced visual and textual understanding to effectively handle multi-modal tasks. Key enhancements include state-of-the-art (SoTA) performance in understanding images of various resolutions and aspect ratios, as demonstrated by its success on visual understanding benchmarks such as MathVista, DocVQA, RealWorldQA, and MTVQA. Additionally, it excels in video comprehension, capable of processing videos over 20 minutes in length for high-quality video-based question answering, dialogue, and content creation. Blazer.1-2B-Vision also functions as an intelligent agent capable of operating devices like mobile phones and robots, thanks to its complex reasoning and decision-making abilities, enabling automatic operations based on visual environments and text instructions. To serve global users, the model offers multilingual support, understanding texts in a wide range of languages, including English, Chinese, most European languages, Japanese, Korean, Arabic, and Vietnamese.
 
 # **Use it With Transformer**