Spaces:
Running
Running
File size: 9,851 Bytes
dd39c08 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 |
<div align="center">

π οΈ [Setup](#%EF%B8%8F-setup) -
π [Usage](#-usage) -
π» [Demo](#-demo) -
π [Ecosystem](#-ecosystem) -
π [AgentLab](https://github.com/ServiceNow/AgentLab) -
π [Contributors](#-contributors) -
π [Paper](https://arxiv.org/abs/2412.05467) -
π [Citation](#-citing-this-work)
[](https://pypi.org/project/browsergym/)
[]([https://opensource.org/licenses/MIT](http://www.apache.org/licenses/LICENSE-2.0))
[](https://pypistats.org/packages/browsergym-core)
[](https://star-history.com/#ServiceNow/BrowserGym)
[](https://github.com/ServiceNow/BrowserGym/actions/workflows/code_format.yml)
[](https://github.com/ServiceNow/BrowserGym/actions/workflows/unit_tests.yml)
```python
pip install browsergym
```
</div>
> [!WARNING]
> BrowserGym is meant to provide an open, easy-to-use and extensible framework to accelerate the field of web agent research.
> It is not meant to be a consumer product. Use with caution!
> [!TIP]
> π Check out [AgentLab](https://github.com/ServiceNow/AgentLab)β¨ !
> A seamless framework to implement, test, and evaluate your web agents on all BrowserGym benchmarks.
https://github.com/ServiceNow/BrowserGym/assets/26232819/e0bfc788-cc8e-44f1-b8c3-0d1114108b85
_Example of a GPT4-V agent executing openended tasks (top row, chat interactive), as well as WebArena and WorkArena tasks (bottom row)._
BrowserGym includes the following benchmarks by default:
- [MiniWoB](https://miniwob.farama.org/)
- [WebArena](https://webarena.dev/)
- [VisualWebArena](https://jykoh.com/vwa)
- [WorkArena](https://github.com/ServiceNow/WorkArena)
- [AssistantBench](https://github.com/oriyor/assistantbench)
- [WebLINX](https://github.com/McGill-NLP/weblinx) (static benchmark)
Designing new web benchmarks with BrowserGym is easy, and simply requires to inherit the [`AbstractBrowserTask`](https://github.com/ServiceNow/BrowserGym/blob/main/browsergym/core/src/browsergym/core/task.py#L7C7-L7C26) class.
## π οΈ Setup
To use browsergym, install one of the following packages:
```sh
pip install browsergym # (recommended) everything below
pip install browsergym-experiments # experiment utilities (agent, loop, benchmarks) + everything below
pip install browsergym-core # core functionalities only (no benchmark, just the openended task)
pip install browsergym-miniwob # core + miniwob
pip install browsergym-webarena # core + webarena
pip install browsergym-visualwebarena # core + visualwebarena
pip install browsergym-workarena # core + workarena
pip install browsergym-assistantbench # core + assistantbench
pip install weblinx-browsergym # core + weblinx
```
Then setup playwright by running
```sh
playwright install chromium
```
Finally, each benchmark comes with its own specific setup that requires to follow additional steps.
- for MiniWoB++, see [miniwob/README.md](browsergym/miniwob/README.md)
- for WebArena, see [webarena/README.md](browsergym/webarena/README.md)
- for VisualWebArena, see [visualwebarena/README.md](browsergym/visualwebarena/README.md)
- for WorkArena, see [WorkArena](https://github.com/ServiceNow/WorkArena)
- for AssistantBench, see [assistantbench/README.md](browsergym/assistantbench/README.md)
### ποΈ Development setup
To install browsergym locally for development, use the following commands:
```sh
git clone [email protected]:ServiceNow/BrowserGym.git
cd BrowserGym
make install
```
Contributions are welcome! π
## π Usage
Boilerplate code to run an agent on an interactive, open-ended task:
```python
import gymnasium as gym
import browsergym.core # register the openended task as a gym environment
# start an openended environment
env = gym.make(
"browsergym/openended",
task_kwargs={"start_url": "https://www.google.com/"}, # starting URL
wait_for_user_message=True, # wait for a user message after each agent message sent to the chat
)
# run the environment <> agent loop until termination
obs, info = env.reset()
while True:
action = ... # implement your agent here
obs, reward, terminated, truncated, info = env.step(action)
if terminated or truncated:
break
# release the environment
env.close()
```
MiniWoB
```python
import gymnasium as gym
import browsergym.miniwob # register miniwob tasks as gym environments
# start a miniwob task
env = gym.make("browsergym/miniwob.choose-list")
...
# list all the available miniwob tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/miniwob")]
print("\n".join(env_ids))
```
WorkArena
```python
import gymnasium as gym
import browsergym.workarena # register workarena tasks as gym environments
# start a workarena task
env = gym.make("browsergym/workarena.servicenow.order-ipad-pro")
...
# list all the available workarena tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/workarena")]
print("\n".join(env_ids))
```
WebArena
```python
import gymnasium as gym
import browsergym.webarena # register webarena tasks as gym environments
# start a webarena task
env = gym.make("browsergym/webarena.310")
...
# list all the available webarena tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/webarena")]
print("\n".join(env_ids))
```
VisualWebArena
```python
import gymnasium as gym
import browsergym.webarena # register webarena tasks as gym environments
# start a visualwebarena task
env = gym.make("browsergym/visualwebarena.721")
...
# list all the available visualwebarena tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/visualwebarena")]
print("\n".join(env_ids))
```
AssistantBench
```python
import gymnasium as gym
import browsergym.workarena # register assistantbench tasks as gym environments
# start an assistantbench task
env = gym.make("browsergym/assistantbench.validation.3")
...
# list all the available assistantbench tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/workarena")]
print("\n".join(env_ids))
```
## π» Demo
If you want to experiment with a demo agent in BrowserGym, follow these steps
```sh
# conda setup
conda env create -f demo_agent/environment.yml
conda activate demo_agent
# or pip setup
pip install -r demo_agent/requirements.txt
# then download the browser for playwright
playwright install chromium
```
Our demo agent uses `openai` as a backend, be sure to set your `OPENAI_API_KEY`.
Launch the demo agent as follows
```sh
# openended (interactive chat mode)
python demo_agent/run_demo.py --task_name openended --start_url https://www.google.com
# miniwob
python demo_agent/run_demo.py --task_name miniwob.click-test
# workarena
python demo_agent/run_demo.py --task_name workarena.servicenow.order-standard-laptop
# webarena
python demo_agent/run_demo.py --task_name webarena.4
# visualwebarena
python demo_agent/run_demo.py --task_name visualwebarena.398
```
You can customize your experience by changing the `model_name` to your preferred LLM (it uses `gpt-4o-mini` by default), adding screenshots for your VLMs with `use_screenshot`, and much more!
```python
python demo_agent/run_demo.py --help
```
## π Ecosystem
- [AgentLab](https://github.com/ServiceNow/AgentLab): Seamlessly run agents on benchmarks, collect and analyse traces.
- [WorkArena(++)](https://github.com/ServiceNow/WorkArena): A benchmark for web agents on the ServiceNow platform.
- [WebArena](https://github.com/web-arena-x/webarena): A benchmark of realistic web tasks on self-hosted domains.
- [VisualWebArena](https://github.com/web-arena-x/visualwebarena): A benchmark of realistic visual web tasks on self-hosted domains.
- [MiniWoB(++)](https://miniwob.farama.org/): A collection of over 100 web tasks on synthetic web pages.
- [WebLINX](https://github.com/McGill-NLP/weblinx): A dataset of real-world web interaction traces.
- [AssistantBench](https://github.com/oriyor/assistantbench): A benchmark of realistic and time-consuming tasks on the open web.
- [DoomArena](https://github.com/ServiceNow/DoomArena): A framework for AI agent security testing which supports injecting attacks into web pages from Browsergym environments.
## π Contributors
[](https://github.com/ServiceNow/BrowserGym/graphs/contributors)
## π Citing This Work
Please use the following BibTeX to cite our work:
```tex
@inproceedings{workarena2024,
title = {{W}ork{A}rena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?},
author = {Drouin, Alexandre and Gasse, Maxime and Caccia, Massimo and Laradji, Issam H. and Del Verme, Manuel and Marty, Tom and Vazquez, David and Chapados, Nicolas and Lacoste, Alexandre},
booktitle = {Proceedings of the 41st International Conference on Machine Learning},
pages = {11642--11662},
year = {2024},
editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
volume = {235},
series = {Proceedings of Machine Learning Research},
month = {21--27 Jul},
publisher = {PMLR},
url = {https://proceedings.mlr.press/v235/drouin24a.html},
}
```
|