---
license: apache-2.0
language:
- multilingual
pipeline_tag: image-text-to-text
tags:
- got
- vision-language
- ocr2.0
- custom_code
base_model:
- stepfun-ai/GOT-OCR2_0
base_model_relation: quantized
---
This is the OpenVINO-accelerated version of GOT-OCR2.0. To use this model, download all files from the original repository, stepfun-ai/GOT-OCR2_0, and copy them into the weight folder. The file structure should look like this:
.
├── app.py
├── convert_model.py
└── weight
    ├── config.json
    ├── generation_config.json
    ├── got_vision_b.py
    ├── modeling_GOT.py
    ├── openvino_language_model.bin
    ├── openvino_language_model.xml
    ├── openvino_text_embeddings_model.bin
    ├── openvino_text_embeddings_model.xml
    ├── openvino_vision_embeddings_merger_model.bin
    ├── openvino_vision_embeddings_merger_model.xml
    ├── openvino_vision_embeddings_model.bin
    ├── openvino_vision_embeddings_model.xml
    ├── qwen.tiktoken
    ├── render_tools.py
    ├── special_tokens_map.json
    ├── tokenization_qwen.json
    └── tokenizer_config.json
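Before running the app, it can help to confirm that the weight folder matches the layout above. The following is a minimal sketch, not part of this repository: the helper name `missing_weight_files` is hypothetical, and the file list is copied from the tree above.

```python
from pathlib import Path

# File names copied from the directory tree above.
EXPECTED_WEIGHT_FILES = [
    "config.json",
    "generation_config.json",
    "got_vision_b.py",
    "modeling_GOT.py",
    "openvino_language_model.bin",
    "openvino_language_model.xml",
    "openvino_text_embeddings_model.bin",
    "openvino_text_embeddings_model.xml",
    "openvino_vision_embeddings_merger_model.bin",
    "openvino_vision_embeddings_merger_model.xml",
    "openvino_vision_embeddings_model.bin",
    "openvino_vision_embeddings_model.xml",
    "qwen.tiktoken",
    "render_tools.py",
    "special_tokens_map.json",
    "tokenization_qwen.json",
    "tokenizer_config.json",
]


def missing_weight_files(weight_dir):
    """Return the expected file names that are absent from weight_dir."""
    root = Path(weight_dir)
    return [name for name in EXPECTED_WEIGHT_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from the folder that contains app.py.
    missing = missing_weight_files("weight")
    if missing:
        print("Missing files in weight/:")
        for name in missing:
            print("  " + name)
    else:
        print("weight/ contains all expected files.")
```

An empty result means every file listed in the tree is present; it does not check that the files are complete or uncorrupted.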
Required libraries:
pip install "openvino" "torch" "transformers" "torchvision" "Pillow" "nncf" "requests" "numpy"
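To check that the packages above are importable in the current environment, a small sketch like the one below may be useful. It is an illustration only: the helper name `missing_packages` is hypothetical, and the mapping assumes each package is imported under its pip name, except Pillow, which is imported as `PIL`.

```python
import importlib.util

# pip package name -> import name. Only Pillow differs (it is imported as PIL).
REQUIRED = {
    "openvino": "openvino",
    "torch": "torch",
    "transformers": "transformers",
    "torchvision": "torchvision",
    "Pillow": "PIL",
    "nncf": "nncf",
    "requests": "requests",
    "numpy": "numpy",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose top-level module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed: " + ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

The check only looks up each module without importing it, so it does not catch version conflicts.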
Then simply run the following command:
python app.py --image-file /path/to/image
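To process several images, the command above can be wrapped in a short loop. This is a sketch under assumptions: it relies only on the `--image-file` flag shown above, assumes `app.py` handles one image per invocation, and uses a hypothetical `images` folder and a guessed set of file extensions.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: these extensions are accepted by app.py; adjust as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_commands(image_dir, script="app.py"):
    """Build one `python app.py --image-file <path>` command per image."""
    images = sorted(
        p for p in Path(image_dir).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    return [[sys.executable, script, "--image-file", str(p)] for p in images]


if __name__ == "__main__":
    image_dir = Path("images")  # hypothetical folder of input images
    if image_dir.is_dir():
        for cmd in build_commands(image_dir):
            # check=True stops the loop at the first failing image.
            subprocess.run(cmd, check=True)
```

Each invocation starts a new process and therefore reloads the model, so this is convenient rather than fast.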
For more instructions, refer to the GitHub page.