XanderJC committed on
Commit 719511c · 1 parent: 6b71ec0
Makefile CHANGED
@@ -1,6 +1,7 @@
 .PHONY: proxy
 
 proxy:
+	pip install uv
 	uv venv --python 3.11 --python-preference managed
 	uv sync
 	uv pip install -e .
README.md CHANGED
@@ -6,9 +6,8 @@ A mini, open-weights version of our Proxy assistant.
 
 ---
 
-## Getting Started
 
-### Installation
+## Installation
 
 Clone the repository:
 
@@ -25,6 +24,7 @@ make proxy
 Or do it manually:
 
 ```bash
+pip install uv
 uv venv --python 3.11 --python-preference managed
 uv sync
 uv pip install -e .
@@ -32,7 +32,7 @@ playwright install
 ```
 
 
-### Usage
+## Usage
 
 ```bash
 proxy --help
@@ -70,6 +70,57 @@ or by setting the environment variable:
 export PROXY_LITE_API_BASE=http://localhost:8008/v1
 ```
 
+### Scaffolding Proxy-Lite in Python
+
+We use the `RunnerConfig` to control the setup of the task.
+The library is designed to be modular and extendable, you can easily swap the environment, solver, or agent.
+
+Example:
+```python
+import asyncio
+from proxy_lite import Runner, RunnerConfig
+
+config = RunnerConfig.from_dict(
+    {
+        "environment": {
+            "name": "webbrowser",
+            "homepage": "https://www.google.com",
+            "headless": True, # Don't show the browser
+        },
+        "solver": {
+            "name": "simple",
+            "agent": {
+                "name": "proxy_lite",
+                "client": {
+                    "name": "convergence",
+                    "model_id": "convergence-ai/proxy-lite",
+                    "api_base": "https://convergence-ai-demo-api.hf.space/v1",
+                },
+            },
+        },
+        "max_steps": 50,
+        "action_timeout": 1800,
+        "environment_timeout": 1800,
+        "task_timeout": 18000,
+        "logger_level": "DEBUG",
+    },
+)
+
+proxy = Runner(config=config)
+result = asyncio.run(
+    proxy.run("Book a table for 2 at an Italian restaurant in Kings Cross tonight at 7pm.")
+)
+```
+
+### Webbrowser Environment
+
+The `webbrowser` environment is a simple environment that uses the `playwright` library to navigate the web.
+
+We launch a Chromium browser and navigate to the `homepage` provided in the `RunnerConfig`.
+
+Actions in an environment are defined through available tool calls, which in the browser case are set as default in the `BrowserTool` class. This allows the model to click, type, etc. at relevant `mark_id` elements on the page. These elements are extracted using JavaScript injected into the page in order to make interaction easier for the models.
+
+If you want to not use this set-of-marks approach, you can set the `no_pois_in_image` flag to `True`, and the `include_poi_text` flag to `False` in the `EnvironmentConfig`. This way the model will only see the original image, and not the annotated image with these points-of-interest (POIs). In this case, you would want to update the `BrowserTool` to interact with pixel coordinates instead of the `mark_id`s.
 
 
 
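The README's last paragraph describes how to turn off the set-of-marks annotations. The following is a minimal sketch of that configuration in the dictionary format used by the README example. It assumes that both `no_pois_in_image` and `include_poi_text` are read from the `environment` block. Only `no_pois_in_image` appears in this commit's `WebBrowserEnvironmentConfig`; the location of `include_poi_text` comes from the README wording and is an assumption. `build_config` is a hypothetical helper, not part of proxy_lite.

```python
from copy import deepcopy

# Settings taken from the README example; only the "environment" block changes below.
BASE_CONFIG = {
    "environment": {
        "name": "webbrowser",
        "homepage": "https://www.google.com",
        "headless": True,  # Don't show the browser
    },
    "solver": {
        "name": "simple",
        "agent": {
            "name": "proxy_lite",
            "client": {
                "name": "convergence",
                "model_id": "convergence-ai/proxy-lite",
                "api_base": "https://convergence-ai-demo-api.hf.space/v1",
            },
        },
    },
    "max_steps": 50,
    "action_timeout": 1800,
    "environment_timeout": 1800,
    "task_timeout": 18000,
    "logger_level": "DEBUG",
}


def build_config(use_set_of_marks: bool = True) -> dict:
    """Hypothetical helper returning a dict for RunnerConfig.from_dict.

    Assumption: both flags below are read from the "environment" block.
    """
    config = deepcopy(BASE_CONFIG)  # leave the shared base dict untouched
    if not use_set_of_marks:
        # Send the raw screenshot instead of the POI-annotated one.
        config["environment"]["no_pois_in_image"] = True
        # Drop the textual list of points-of-interest as well (assumed key location).
        config["environment"]["include_poi_text"] = False
    return config


pixel_config = build_config(use_set_of_marks=False)
# With proxy_lite installed: RunnerConfig.from_dict(pixel_config)
```

As the README notes, this config alone does not complete the switch. The default `BrowserTool` still targets `mark_id`s, so the tool would also need to act on pixel coordinates.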
src/proxy_lite/environments/webbrowser.py CHANGED
@@ -28,6 +28,7 @@ class WebBrowserEnvironmentConfig(BaseEnvironmentConfig):
     browserbase_timeout: int = 7200
     headless: bool = True
     keep_original_image: bool = False
+    no_pois_in_image: bool = False
 
 
 @Environments.register_environment("webbrowser")
@@ -78,8 +79,10 @@ class WebBrowserEnvironment(BaseEnvironment):
         original_img, annotated_img = await self.browser.screenshot(
             delay=self.config.screenshot_delay,
         )
-
-        base64_image = base64.b64encode(annotated_img).decode("utf-8")
+        if self.config.no_pois_in_image:
+            base64_image = base64.b64encode(original_img).decode("utf-8")
+        else:
+            base64_image = base64.b64encode(annotated_img).decode("utf-8")
 
         html_content = await self.browser.current_page.content() if self.config.include_html else None
 
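The new branch in `WebBrowserEnvironment` only chooses which of the two screenshots gets base64-encoded for the model. Below, that selection is pulled out into a standalone function so it can run without a browser. `encode_screenshot` is a hypothetical name. The byte strings are placeholders for the PNG data that `self.browser.screenshot` would return.

```python
import base64


def encode_screenshot(original_img: bytes, annotated_img: bytes, no_pois_in_image: bool) -> str:
    """Return the screenshot the model will see, as a base64 UTF-8 string.

    Mirrors the branch added in this commit: with no_pois_in_image=True the
    raw screenshot is sent; otherwise the set-of-marks (annotated) one is.
    """
    chosen = original_img if no_pois_in_image else annotated_img
    return base64.b64encode(chosen).decode("utf-8")


# Placeholder bytes standing in for real PNG screenshots.
original = b"original-png-bytes"
annotated = b"annotated-png-bytes"

raw_view = encode_screenshot(original, annotated, no_pois_in_image=True)
marked_view = encode_screenshot(original, annotated, no_pois_in_image=False)
```

The new field defaults to `False`, so existing configurations keep sending the annotated image. Only configs that opt in send the raw screenshot.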