Update README.md
README.md CHANGED
@@ -118,7 +118,7 @@ base_model:
---


-# LLaVA-
+# LLaVA-Video-72B-Qwen2

## Table of Contents

@@ -131,7 +131,7 @@ base_model:

## Model Summary

-The LLaVA-
+The LLaVA-Video models are 7/72B parameter models trained on [LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data) and [LLaVA-OneVision Dataset](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), based on Qwen2 language model with a context window of 32K tokens.

This model support at most 64 frames.

@@ -187,7 +187,7 @@ def load_video(self, video_path, max_frames_num,fps=1,force_sample=False):
    spare_frames = vr.get_batch(frame_idx).asnumpy()
    # import pdb;pdb.set_trace()
    return spare_frames,frame_time,video_time
-pretrained = "lmms-lab/LLaVA-
+pretrained = "lmms-lab/LLaVA-Video-72B-Qwen2"
 model_name = "llava_qwen"
 device = "cuda"
 device_map = "auto"
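For reference, the corrected `pretrained` identifier is what the loading code around the last hunk consumes. Below is a minimal sketch of how that snippet is expected to fit together, assuming the `llava` package from the LLaVA-NeXT repository is installed; the `load_pretrained_model` import path and keyword arguments are taken from that codebase and may vary between releases, so treat it as illustrative rather than the README's exact example.

```python
# Minimal sketch (assumption: the `llava` package from the LLaVA-NeXT repo is installed).
# The load_pretrained_model import path and kwargs may differ between releases.
from llava.model.builder import load_pretrained_model

pretrained = "lmms-lab/LLaVA-Video-72B-Qwen2"  # identifier fixed by this commit
model_name = "llava_qwen"
device_map = "auto"

# Returns the tokenizer, the multimodal model, the image/video processor,
# and the context length (32K tokens for the Qwen2-based LLaVA-Video models).
tokenizer, model, image_processor, max_length = load_pretrained_model(
    pretrained, None, model_name, torch_dtype="bfloat16", device_map=device_map
)
model.eval()

# load_video (shown in the hunk context) samples frames with decord;
# the model accepts at most 64 frames, so keep max_frames_num <= 64.
```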