Update README.md
README.md
@@ -175,20 +175,26 @@ Our code is based on LLaVA-NeXT, before running, please install the LLaVA-NeXT t
 ```shell
 pip install git+https://github.com/LLaVA-VL/LLaVA-NeXT.git
 ```
+**Error Handling**
+
+You might encounter an error when loading the checkpoint from the local disk:
+```shell
+RuntimeError: Error(s) in loading state_dict for CLIPVisionModel:
+size mismatch for vision_model.embeddings.position_embedding.weight: copying a param with shape torch.Size([729, 1152]) from checkpoint, the shape in current model is torch.Size([730, 1152]).
+You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
+```
+If you meet this error, you can fix it by following the guidelines in [this issue](https://github.com/inst-it/inst-it/issues/3).
+
 **Load Model**
 ```python
 from llava.model.builder import load_pretrained_model
-from llava.constants import (
-    DEFAULT_IM_END_TOKEN,
-    DEFAULT_IM_START_TOKEN,
-    DEFAULT_IMAGE_TOKEN,
-    IGNORE_INDEX,
-    IMAGE_TOKEN_INDEX,
-)
+from llava.constants import DEFAULT_IMAGE_TOKEN
+
 from llava.mm_utils import (
     KeywordsStoppingCriteria,
     get_model_name_from_path,
-    tokenizer_image_token
+    tokenizer_image_token,
+    process_images
 )
 from llava.conversation import SeparatorStyle, conv_templates
 from llava.eval.model_vqa import preprocess_qwen
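For orientation, here is a minimal sketch of how the imports added in this diff are typically wired together in LLaVA-NeXT to load a checkpoint and run single-image inference. The checkpoint path, image file, question, and conversation template name are placeholders rather than values from this repository, and the sketch follows the generic LLaVA-NeXT flow, not the Inst-IT-specific preprocessing (`preprocess_qwen`) used in the README.

```python
# Minimal LLaVA-NeXT-style inference sketch (assumptions: placeholder checkpoint
# path, placeholder image/question, and the "vicuna_v1" conversation template;
# pick the template that matches your checkpoint).
import torch
from PIL import Image

from llava.model.builder import load_pretrained_model
from llava.constants import DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX
from llava.mm_utils import get_model_name_from_path, process_images, tokenizer_image_token
from llava.conversation import conv_templates

model_path = "path/to/checkpoint"  # placeholder
tokenizer, model, image_processor, _ = load_pretrained_model(
    model_path, None, get_model_name_from_path(model_path)
)

# Build a prompt that contains the image placeholder token.
conv = conv_templates["vicuna_v1"].copy()
conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + "\nDescribe this image.")
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

# Preprocess the image and tokenize the prompt
# (the <image> token is replaced by IMAGE_TOKEN_INDEX).
image = Image.open("example.jpg").convert("RGB")
image_tensor = process_images([image], image_processor, model.config)
image_tensor = [t.to(dtype=torch.float16, device=model.device) for t in image_tensor]
input_ids = (
    tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt")
    .unsqueeze(0)
    .to(model.device)
)

with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        images=image_tensor,
        image_sizes=[image.size],
        do_sample=False,
        max_new_tokens=256,
    )
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip())
```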