BoyuNLP committed on
Commit 14c14d5 · verified · 1 Parent(s): b5f1330

Update README.md

Files changed (1)
  1. README.md +55 -0
README.md CHANGED
@@ -10,6 +10,61 @@ base_model:
10
  - Qwen/Qwen2-VL-7B
11
  ---
12
 
13
+ # UGround-V1-7B (Qwen2-VL-Based)
14
+
15
+ UGround is a strong GUI visual grounding model trained with a simple recipe. Check our homepage and paper for more details.
16
+ ![radar](https://osu-nlp-group.github.io/UGround/static/images/radar.png)
17
+ - **Homepage:** https://osu-nlp-group.github.io/UGround/
18
+ - **Repository:** https://github.com/OSU-NLP-Group/UGround
19
+ - **Paper:** https://arxiv.org/abs/2410.05243
20
+ - **Demo:** https://huggingface.co/spaces/orby-osu/UGround
21
+ - **Point of Contact:** [Boyu Gou](mailto:[email protected])
22
+
23
+
24
+ - [x] Model Weights
25
+ - [ ] Code
26
+ - [ ] Inference Code of UGround
27
+ - [x] Offline Experiments
28
+ - [x] ScreenSpot (along with referring expressions generated by GPT-4/4o)
29
+ - [x] Multimodal-Mind2Web
30
+ - [x] OmniAct
31
+ - [ ] Online Experiments
32
+ - [ ] Mind2Web-Live
33
+ - [ ] AndroidWorld
34
+ - [ ] Data
35
+ - [ ] Data Examples
36
+ - [ ] Data Construction Scripts
37
+ - [ ] Guidance on Open-source Data
38
+ - [x] Online Demo (HF Spaces)
39
+
40
+
41
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6500870f1e14749e84f8f887/u5bXFxxAWCXthyXWyZkM4.png)
42
+
43
+ ## Citation Information
44
+
45
+ If you find this work useful, please consider citing our papers:
46
+
47
+ ```
48
+ @article{gou2024uground,
49
+ title={Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents},
50
+ author={Boyu Gou and Ruohan Wang and Boyuan Zheng and Yanan Xie and Cheng Chang and Yiheng Shu and Huan Sun and Yu Su},
51
+ journal={arXiv preprint arXiv:2410.05243},
52
+ year={2024},
53
+ url={https://arxiv.org/abs/2410.05243},
54
+ }
55
+
56
+ @article{zheng2023seeact,
57
+ title={GPT-4V(ision) is a Generalist Web Agent, if Grounded},
58
+ author={Boyuan Zheng and Boyu Gou and Jihyung Kil and Huan Sun and Yu Su},
59
+ journal={arXiv preprint arXiv:2401.01614},
60
+ year={2024},
61
+ }
62
+ ```
63
+
64
+
65
+
66
+
67
+
68
  # Qwen2-VL-7B-Instruct
69
 
70
  ## Introduction