Add library name, pipeline tag, and correct links

#2
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +14 -12
README.md CHANGED
@@ -1,28 +1,30 @@
  ---
- license: mit
- datasets:
- - OS-Copilot/OS-Atlas-data
  base_model:
  - ByteDance-Seed/UI-TARS-2B-SFT
  ---

  # GUI-Actor-Verifier-2B


- This model was introduced in the paper [**GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents**](https://aka.ms/GUI-Actor).
  It is developed based on [UI-TARS-2B-SFT](https://huggingface.co/ByteDance-Seed/UI-TARS-2B-SFT) and is designed to predict the correctness of an action position given a language instruction. This model is well-suited for **GUI-Actor**, as its attention map effectively provides diverse candidates for verification with only a single inference.


- For more details on model design and evaluation, please check: [🏠 Project Page](https://aka.ms/GUI-Actor) | [💻 Github Repo](https://github.com/microsoft/GUI-Actor) | [📑 Paper](https://www.arxiv.org/pdf/2506.03143).


  | Model List | Hugging Face Link |
  |--------------------------------------------|--------------------------------------------|
- | **GUI-Actor-7B-Qwen2-VL** | [🤗 Hugging Face](https://huggingface.co/microsoft/GUI-Actor-7B-Qwen2-VL) |
- | **GUI-Actor-2B-Qwen2-VL** | [🤗 Hugging Face](https://huggingface.co/microsoft/GUI-Actor-2B-Qwen2-VL) |
- | **GUI-Actor-7B-Qwen2.5-VL (coming soon)** | [🤗 Hugging Face](https://huggingface.co/microsoft/GUI-Actor-7B-Qwen2.5-VL) |
- | **GUI-Actor-3B-Qwen2.5-VL (coming soon)** | [🤗 Hugging Face](https://huggingface.co/microsoft/GUI-Actor-3B-Qwen2.5-VL) |
- | **GUI-Actor-Verifier-2B** | [🤗 Hugging Face](https://huggingface.co/microsoft/GUI-Actor-Verifier-2B) |



@@ -58,8 +60,8 @@ Table 2. Main results on the ScreenSpot-Pro and ScreenSpot-v2 with **Qwen2.5-VL*
  | **_3B models:_**
  | Qwen2.5-VL-3B | Qwen2.5-VL | 25.9 | 80.9 |
  | Jedi-3B | Qwen2.5-VL | 36.1 | 88.6 |
- | GUI-Actor-3B | Qwen2.5-VL | 42.2 | 91.0 |
- | GUI-Actor-3B + Verifier | Qwen2.5-VL | **45.9** | **92.4** |

  ## 🚀 Usage
  The verifier takes a language instruction and an image with a red circle marking the target position as input. One example is shown below. It outputs either ‘True’ or ‘False’, and you can also use the probability of each label to score the sample.
 
  ---
  base_model:
  - ByteDance-Seed/UI-TARS-2B-SFT
+ datasets:
+ - OS-Copilot/OS-Atlas-data
+ license: mit
+ pipeline_tag: image-text-to-text
+ library_name: transformers
  ---

  # GUI-Actor-Verifier-2B


+ This model was introduced in the paper [**GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents**](https://huggingface.co/papers/2506.03143).
  It is developed based on [UI-TARS-2B-SFT](https://huggingface.co/ByteDance-Seed/UI-TARS-2B-SFT) and is designed to predict the correctness of an action position given a language instruction. This model is well-suited for **GUI-Actor**, as its attention map effectively provides diverse candidates for verification with only a single inference.


+ For more details on model design and evaluation, please check: [Project Page](https://microsoft.github.io/GUI-Actor/) | [Github Repo](https://github.com/microsoft/GUI-Actor) | [Paper](https://huggingface.co/papers/2506.03143).


  | Model List | Hugging Face Link |
  |--------------------------------------------|--------------------------------------------|
+ | **GUI-Actor-7B-Qwen2-VL** | [Hugging Face](https://huggingface.co/microsoft/GUI-Actor-7B-Qwen2-VL) |
+ | **GUI-Actor-2B-Qwen2-VL** | [Hugging Face](https://huggingface.co/microsoft/GUI-Actor-2B-Qwen2-VL) |
+ | **GUI-Actor-7B-Qwen2.5-VL (coming soon)** | [Hugging Face](https://huggingface.co/microsoft/GUI-Actor-7B-Qwen2.5-VL) |
+ | **GUI-Actor-3B-Qwen2.5-VL (coming soon)** | [Hugging Face](https://huggingface.co/microsoft/GUI-Actor-3B-Qwen2.5-VL) |
+ | **GUI-Actor-Verifier-2B** | [Hugging Face](https://huggingface.co/microsoft/GUI-Actor-Verifier-2B) |



 
  | **_3B models:_**
  | Qwen2.5-VL-3B | Qwen2.5-VL | 25.9 | 80.9 |
  | Jedi-3B | Qwen2.5-VL | 36.1 | 88.6 |
+ | GUI-Actor-3B | Qwen2.5-VL | **42.2** | **91.0** |
+ | GUI-Actor-3B + Verifier | Qwen2.5-VL | 45.9 | 92.4 |

  ## 🚀 Usage
  The verifier takes a language instruction and an image with a red circle marking the target position as input. One example is shown below. It outputs either ‘True’ or ‘False’, and you can also use the probability of each label to score the sample.
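
The scoring step mentioned in the usage text could look roughly like the sketch below. It is not taken from the README or the GUI-Actor repository. It assumes you have already obtained, by some means not shown here, one logit for the 'True' label and one for the 'False' label per candidate position, and it renormalizes over just those two labels, which is one possible reading of "the probability of each label". The names `verifier_score`, `pick_best`, and the example logits are hypothetical.

```python
import math


def verifier_score(logit_true: float, logit_false: float) -> float:
    """Probability of 'True', renormalized over the two labels only.

    Assumption: the two logits come from the verifier's output for the
    'True' and 'False' labels; how they are extracted is not shown here.
    """
    # Subtract the larger logit first so exp() cannot overflow.
    m = max(logit_true, logit_false)
    e_true = math.exp(logit_true - m)
    e_false = math.exp(logit_false - m)
    return e_true / (e_true + e_false)


def pick_best(candidates):
    """Return the position whose 'True' score is highest.

    `candidates` is a list of (position, logit_true, logit_false) tuples.
    """
    return max(candidates, key=lambda c: verifier_score(c[1], c[2]))[0]


# Made-up logits for three candidate click positions (x, y).
candidates = [
    ((120, 48), 1.2, 0.3),
    ((300, 210), 2.5, -1.0),
    ((40, 400), -0.5, 1.5),
]
best = pick_best(candidates)
```

With scores in hand, candidates can be ranked, or a threshold can be applied to accept or reject a single proposed position.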