Update README.md
README.md
CHANGED
@@ -10,6 +10,61 @@ base_model:
|
|
10 |
- Qwen/Qwen2-VL-7B
|
11 |
---
|
12 |
|
13 |
# Qwen2-VL-7B-Instruct
|
14 |
|
15 |
## Introduction
|
|
|
10 |
- Qwen/Qwen2-VL-7B
|
11 |
---
|
12 |
|
13 |
+
# UGround-V1-7B (Qwen2-VL-Based)
|
14 |
+
|
15 |
+
UGround is a strong GUI visual grounding model trained with a simple recipe. See our homepage and paper for more details.
|
16 |
+

|
17 |
+
- **Homepage:** https://osu-nlp-group.github.io/UGround/
|
18 |
+
- **Repository:** https://github.com/OSU-NLP-Group/UGround
|
19 |
+
- **Paper:** https://arxiv.org/abs/2410.05243
|
20 |
+
- **Demo:** https://huggingface.co/spaces/orby-osu/UGround
|
21 |
+
- **Point of Contact:** [Boyu Gou](mailto:[email protected])
|
22 |
+
|
23 |
+
|
24 |
+
- [x] Model Weights
|
25 |
+
- [ ] Code
|
26 |
+
- [ ] Inference Code of UGround
|
27 |
+
- [x] Offline Experiments
|
28 |
+
- [x] Screenspot (along with referring expressions generated by GPT-4/4o)
|
29 |
+
- [x] Multimodal-Mind2Web
|
30 |
+
- [x] OmniAct
|
31 |
+
- [ ] Online Experiments
|
32 |
+
- [ ] Mind2Web-Live
|
33 |
+
- [ ] AndroidWorld
|
34 |
+
- [ ] Data
|
35 |
+
- [ ] Data Examples
|
36 |
+
- [ ] Data Construction Scripts
|
37 |
+
- [ ] Guidance of Open-source Data
|
38 |
+
- [x] Online Demo (HF Spaces)
|
39 |
+
|
40 |
+
|
41 |
+

|
42 |
+
|
43 |
+
## Citation Information
|
44 |
+
|
45 |
+
If you find this work useful, please consider citing our papers:
|
46 |
+
|
47 |
+
```
|
48 |
+
@article{gou2024uground,
|
49 |
+
title={Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents},
|
50 |
+
author={Boyu Gou and Ruohan Wang and Boyuan Zheng and Yanan Xie and Cheng Chang and Yiheng Shu and Huan Sun and Yu Su},
|
51 |
+
journal={arXiv preprint arXiv:2410.05243},
|
52 |
+
year={2024},
|
53 |
+
url={https://arxiv.org/abs/2410.05243},
|
54 |
+
}
|
55 |
+
|
56 |
+
@article{zheng2023seeact,
|
57 |
+
title={GPT-4V(ision) is a Generalist Web Agent, if Grounded},
|
58 |
+
author={Boyuan Zheng and Boyu Gou and Jihyung Kil and Huan Sun and Yu Su},
|
59 |
+
journal={arXiv preprint arXiv:2401.01614},
|
60 |
+
year={2024},
|
61 |
+
}
|
62 |
+
```
|
63 |
+
|
64 |
+
|
65 |
+
|
66 |
+
|
67 |
+
|
68 |
# Qwen2-VL-7B-Instruct
|
69 |
|
70 |
## Introduction
|