Daemontatox committed on
Commit 863ddb5 · verified · 1 Parent(s): c5e5a24

Update README.md

  1. README.md +0 -82
README.md CHANGED
@@ -8,10 +8,7 @@ pipeline_tag: image-text-to-text
  library_name: transformers
  ---
 
- ## R1-Onevision
 
- [\[📂 GitHub\]](https://github.com/Fancy-MLLM/R1-Onevision)[\[📝 Report\]](https://yangyi-vai.notion.site/r1-onevision?pvs=4)
- [\[🤗 HF Dataset\]](https://huggingface.co/datasets/Fancy-MLLM/R1-onevision) [\[🤗 Reasoning Benchmark\]](https://huggingface.co/datasets/Fancy-MLLM/R1-OneVision-Bench) [\[🤗 HF Demo\]](https://huggingface.co/spaces/Fancy-MLLM/R1-OneVision)
 
  ## Model Overview
 
@@ -36,82 +33,3 @@ bf16: true
  flash_attn: fa2
  ```
 
- Training loss curve:
- <img src="https://cdn-uploads.huggingface.co/production/uploads/65af78bb3e82498d4c65ed2a/8BNyo-v68aFvab2kXxtt1.png"/>
-
- ## Usage
-
- You can load the model using the Hugging Face `transformers` library:
-
- ```python
- from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
- import torch
- from qwen_vl_utils import process_vision_info
-
- MODEL_ID = "Fancy-MLLM/R1-Onevision-7B"
- processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
- model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
-     MODEL_ID,
-     trust_remote_code=True,
-     torch_dtype=torch.bfloat16
- ).to("cuda").eval()
-
- messages = [
-     {
-         "role": "user",
-         "content": [
-             {"type": "image", "image": "<your image path>"},
-             {"type": "text", "text": "Hint: Please answer the question and provide the final answer at the end. Question: Which number do you have to write in the last daisy?"},
-         ],
-     }
- ]
-
- # Preparation for inference
- text = processor.apply_chat_template(
-     messages, tokenize=False, add_generation_prompt=True
- )
- image_inputs, video_inputs = process_vision_info(messages)
- inputs = processor(
-     text=[text],
-     images=image_inputs,
-     videos=video_inputs,
-     padding=True,
-     return_tensors="pt",
- )
- inputs = inputs.to(model.device)
-
- generated_ids = model.generate(**inputs, max_new_tokens=4096)
- generated_ids_trimmed = [
-     out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
- ]
- output_text = processor.batch_decode(
-     generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
- )
- print(output_text)
- ```
-
- ## Ongoing Work
- 1. **Rule-Based Reinforcement Learning (RL)**
-
- We are actively exploring the integration of rule-based systems into reinforcement learning to enhance the agent's decision-making process. This approach combines domain-specific rules with the learning process, aiming to improve the efficiency and safety of learning in complex environments.
-
- 2. **Training with General Data and Multimodal Reasoning CoT**
-
- Our ongoing work includes expanding the training datasets by incorporating more general data alongside multimodal reasoning Chain-of-Thought (CoT) data. This will enable the model to benefit from a broader range of information, enhancing its ability to handle diverse reasoning tasks across various domains.
-
- 3. **Incorporating Chinese Multimodal Reasoning CoT Data**
-
- We are also focused on integrating Chinese multimodal reasoning CoT data into the training process. By adding this language-specific dataset, we aim to improve the model’s capability to perform reasoning tasks in Chinese, expanding its multilingual and multimodal reasoning proficiency.
-
- 4. **Release of the 3B Model**
-
-
- We are working on the release of a smaller, more efficient 3B model, which is designed to provide a balance between performance and resource efficiency. This model aims to deliver strong multimodal reasoning capabilities while being more accessible and optimized for environments with limited computational resources, offering a more compact alternative to the current 7B model.
-
- # Institution
- - Zhejiang University
-
- ## Model Contact
115
116
117
 