wjpoom committed · Commit f642936 · verified · 1 Parent(s): 40e0faa

Update README.md

Files changed (1): README.md (+35 -1)
README.md CHANGED
@@ -183,7 +183,41 @@ RuntimeError: Error(s) in loading state_dict for CLIPVisionModel:
  size mismatch for vision_model.embeddings.position_embedding.weight: copying a param with shape torch.Size([729, 1152]) from checkpoint, the shape in current model is torch.Size([730, 1152]).
  You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
  ```
- If you meet this error, you can fix this error following the guidelines in [this issue](https://github.com/inst-it/inst-it/issues/3).
+ If you encounter this error, you can fix it by following the guidelines below:
+
+ <details>
+ <summary>Error handling guideline</summary>
+
+ This error comes from a logic issue in how the vision tower is resolved from a local path (see the sketch just after this section for what goes wrong). To fix it, you can prepare the environment in either of the following ways.
+
+ **Option 1: Install from our fork of LLaVA-NeXT:**
+
+ ```shell
+ pip install git+https://github.com/inst-it/LLaVA-NeXT.git
+ ```
+
+ **Option 2: Install LLaVA-NeXT from source and manually modify its code:**
+ * step 1: clone the source code
+ ```shell
+ git clone https://github.com/LLaVA-VL/LLaVA-NeXT.git
+ ```
+ * step 2: before installing LLaVA-NeXT, you need to modify `line 17` of [llava/model/multimodal_encoder/builder.py](https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/llava/model/multimodal_encoder/builder.py#L17) as follows.
+ ```python
+ # Before modification:
+ if is_absolute_path_exists or vision_tower.startswith("openai") or vision_tower.startswith("laion") or "ShareGPT4V" in vision_tower:
+
+ # After modification:
+ if "clip" in vision_tower or vision_tower.startswith("openai") or vision_tower.startswith("laion") or "ShareGPT4V" in vision_tower:
+ ```
+ * step 3: install LLaVA-NeXT from source:
+ ```shell
+ cd LLaVA-NeXT
+ pip install --upgrade pip  # Enable PEP 660 support.
+ pip install -e ".[train]"
+ ```
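+
+ Whichever option you choose, a quick sanity check can confirm that the patched condition is what actually got installed. This check is our suggestion, not part of either install flow; it assumes `llava` is importable and that the builder defines `build_vision_tower`, as in upstream LLaVA-NeXT:
+ ```python
+ import inspect
+ from llava.model.multimodal_encoder import builder
+
+ # Print True if the patched check is present in the installed builder.
+ src = inspect.getsource(builder.build_vision_tower)
+ print('"clip" in vision_tower' in src)
+ ```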
+
+ We recommend the first option because it is simpler.
+ </details>
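+
+ For context, here is a simplified sketch of why the patch works. It is an illustration only, not the actual LLaVA-NeXT code: `pick_tower_branch` is a hypothetical helper, and the real builder returns tower objects rather than strings. The shapes in the error above (729 vs. 730 position embeddings, hidden size 1152) suggest a SigLIP-style checkpoint being loaded as a `CLIPVisionModel`, which the original path-existence check made possible for any local checkpoint:
+
+ ```python
+ import os
+
+ # Hypothetical, simplified version of the branch in
+ # llava/model/multimodal_encoder/builder.py (illustration only).
+ def pick_tower_branch(vision_tower: str, patched: bool = True) -> str:
+     if patched:
+         # After the patch: a local path is treated as CLIP only if its
+         # name says so; a local SigLIP checkpoint falls through below.
+         looks_like_clip = "clip" in vision_tower
+     else:
+         # Before the patch: ANY existing local path was routed to the
+         # CLIP branch, so a local SigLIP checkpoint was loaded as a
+         # CLIPVisionModel, producing the size-mismatch error above.
+         looks_like_clip = os.path.exists(vision_tower)
+     if looks_like_clip or vision_tower.startswith("openai") or vision_tower.startswith("laion") or "ShareGPT4V" in vision_tower:
+         return "CLIP tower"
+     if "siglip" in vision_tower:
+         return "SigLIP tower"
+     return "unknown tower"
+ ```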
 
  **Load Model**
  ```python