metadata
license: apache-2.0
language:
  - multilingual
pipeline_tag: image-text-to-text
tags:
  - got
  - vision-language
  - ocr2.0
  - custom_code
base_model:
  - stepfun-ai/GOT-OCR2_0
base_model_relation: quantized

This is the OpenVINO-accelerated version of GOT-OCR2.0. To use this model, download all files from the original repository, stepfun-ai/GOT-OCR2_0, and copy them into the weight folder. The file structure should look like this:

.
│  app.py
│  convert_model.py
├─weight
│      config.json
│      generation_config.json
│      got_vision_b.py
│      modeling_GOT.py
│      openvino_language_model.bin
│      openvino_language_model.xml
│      openvino_text_embeddings_model.bin
│      openvino_text_embeddings_model.xml
│      openvino_vision_embeddings_merger_model.bin
│      openvino_vision_embeddings_merger_model.xml
│      openvino_vision_embeddings_model.bin
│      openvino_vision_embeddings_model.xml
│      qwen.tiktoken
│      render_tools.py
│      special_tokens_map.json
│      tokenization_qwen.json
│      tokenizer_config.json
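
The layout above can be checked before running the app. The following is a minimal sketch, assuming it is run from the directory that contains the weight folder. The helper name `missing_weight_files` is hypothetical and not part of this repository, and the check only covers the files listed in the tree above.

```python
from pathlib import Path

# File names taken from the tree above; anything else in the folder is ignored.
EXPECTED_FILES = [
    "config.json",
    "generation_config.json",
    "got_vision_b.py",
    "modeling_GOT.py",
    "openvino_language_model.bin",
    "openvino_language_model.xml",
    "openvino_text_embeddings_model.bin",
    "openvino_text_embeddings_model.xml",
    "openvino_vision_embeddings_merger_model.bin",
    "openvino_vision_embeddings_merger_model.xml",
    "openvino_vision_embeddings_model.bin",
    "openvino_vision_embeddings_model.xml",
    "qwen.tiktoken",
    "render_tools.py",
    "special_tokens_map.json",
    "tokenization_qwen.json",
    "tokenizer_config.json",
]


def missing_weight_files(weight_dir):
    """Return the expected file names that are not present in weight_dir."""
    root = Path(weight_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Hypothetical usage: "weight" is assumed to sit next to app.py.
    missing = missing_weight_files("weight")
    if missing:
        print("Missing files in weight/:")
        for name in missing:
            print("  " + name)
    else:
        print("All expected files are present.")
```

An empty list means every file shown in the tree was found. A non-empty list names the files that still need to be copied into the weight folder.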

Required libraries:

pip install "openvino" "torch" "transformers" "torchvision" "Pillow" "nncf" "requests" "numpy"
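
To confirm that the installation succeeded, the packages can be probed without importing them fully. This is a minimal sketch using only the standard library. The helper name `missing_modules` is hypothetical, and the list uses import names rather than pip names (Pillow is imported as `PIL`).

```python
import importlib.util

# Import names for the packages in the pip command above.
# Note: the "Pillow" distribution is imported as "PIL".
REQUIRED_MODULES = [
    "openvino",
    "torch",
    "transformers",
    "torchvision",
    "PIL",
    "nncf",
    "requests",
    "numpy",
]


def missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not installed: " + ", ".join(missing))
    else:
        print("All required libraries were found.")
```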

Then run the following command:

python app.py --image-file /path/to/image

For more instructions, refer to the GitHub page.