prithivMLmods committed verified commit 4cad0af (parent: 8412459): Update README.md

Files changed (1): README.md (+47, −0)
# **Quickstart with Transformers**

Here is a code snippet showing how to use the chat model with `transformers` and `qwen_vl_utils`:

```python
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Default: load the model on the available device(s)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/QvQ-Step-Tiny", torch_dtype="auto", device_map="auto"
)

# Load the processor that pairs with the model
processor = AutoProcessor.from_pretrained("prithivMLmods/QvQ-Step-Tiny")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

# Inference: generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```
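Note that `model.generate` returns each prompt's tokens followed by the newly generated tokens, which is why the snippet slices `out_ids[len(in_ids):]` before decoding. A minimal illustration of that trimming step with plain Python lists (toy token ids, no model required):

```python
# Toy stand-ins for tokenized inputs and generate() outputs.
# Each output sequence begins with its prompt's token ids,
# so slicing off len(prompt) leaves only the generated tokens.
input_ids = [[101, 2023, 2003], [101, 2054]]          # two tokenized prompts
generated_ids = [[101, 2023, 2003, 7592, 102],        # prompt + new tokens
                 [101, 2054, 2003, 102]]

generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(input_ids, generated_ids)
]
print(generated_ids_trimmed)  # [[7592, 102], [2003, 102]]
```

Without this step, `batch_decode` would echo the full prompt back in every response.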
# **Key Enhancements of QvQ-Step-Tiny**

1. **State-of-the-Art Visual Understanding**