ZhangYuanhan committed · verified
Commit 9d995d7 · 1 Parent(s): b011a54

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -118,7 +118,7 @@ base_model:
 ---
 
 
-# LLaVA-NeXT-Video-72B-Qwen2
+# LLaVA-Video-72B-Qwen2
 
 ## Table of Contents
 
@@ -131,7 +131,7 @@ base_model:
 
 ## Model Summary
 
-The LLaVA-NeXT-Video models are 7/72B parameter models trained on [LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data) and [LLaVA-OneVision Dataset](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), based on Qwen2 language model with a context window of 32K tokens.
+The LLaVA-Video models are 7/72B parameter models trained on [LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data) and [LLaVA-OneVision Dataset](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), based on Qwen2 language model with a context window of 32K tokens.
 
 This model support at most 64 frames.
 
@@ -187,7 +187,7 @@ def load_video(self, video_path, max_frames_num,fps=1,force_sample=False):
     spare_frames = vr.get_batch(frame_idx).asnumpy()
     # import pdb;pdb.set_trace()
     return spare_frames,frame_time,video_time
-pretrained = "lmms-lab/LLaVA-NeXT-Video-72B-Qwen2"
+pretrained = "lmms-lab/LLaVA-Video-72B-Qwen2"
 model_name = "llava_qwen"
 device = "cuda"
 device_map = "auto"