XanderJC committed on
Commit
514e5bf
·
1 Parent(s): 11250db
Files changed (2)
  1. .gitignore +3 -1
  2. README.md +97 -0
.gitignore CHANGED
@@ -172,4 +172,6 @@ cython_debug/
172
 
173
  logs/
174
  local_trajectories/
175
- screenshots/
172
 
173
  logs/
174
  local_trajectories/
175
+ screenshots/
176
+ gifs/
177
+ .DS_Store
README.md CHANGED
@@ -154,6 +154,80 @@ result = asyncio.run(
154
  )
155
  ```
156
 
157
  ### Webbrowser Environment
158
 
159
  The `webbrowser` environment is a simple environment that uses the `playwright` library to navigate the web.
@@ -167,3 +241,26 @@ If you want to not use this set-of-marks approach, you can set the `no_pois_in_i
167
  **Note:** We use `playwright_stealth` to lower the chance of detection by anti-bot services, but this isn't foolproof and Proxy Lite may still get blocked with captchas or other anti-bot measures, especially when using the `headless` flag. We recommend using network proxies to avoid this issue.
168
 
169
 
154
  )
155
  ```
156
 
157
+ The `Runner` runs the solver and the environment in a loop, as in a traditional reinforcement learning setup.
158
+
159
+ <div align="center">
160
+ <img src="assets/loop.png" alt="Runner Loop" width="400" height="auto" style="margin-bottom: 20px;" />
161
+ </div>
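
As a rough illustration of the loop described above, here is a minimal sketch. `ToyEnvironment`, `ToySolver`, and `run_loop` are hypothetical stand-ins invented for this example; they are not the `proxy_lite` classes, and the real `Runner` is asynchronous and considerably richer.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEnvironment:
    """Hypothetical environment: the state counts up towards a target."""
    target: int = 3
    state: int = 0

    def reset(self) -> int:
        self.state = 0
        return self.state  # initial observation

    def step(self, action: int) -> tuple[int, bool]:
        self.state += action
        return self.state, self.state >= self.target  # (observation, done)


@dataclass
class ToySolver:
    """Hypothetical solver: records each observation and always returns 1."""
    history: list = field(default_factory=list)

    def act(self, observation: int) -> int:
        self.history.append(observation)
        return 1


def run_loop(solver: ToySolver, env: ToyEnvironment, max_steps: int = 10) -> int:
    """Alternate observation -> action until the task is done or steps run out."""
    observation = env.reset()
    for step in range(1, max_steps + 1):
        action = solver.act(observation)
        observation, done = env.step(action)
        if done:
            return step
    return max_steps


solver = ToySolver()
steps_taken = run_loop(solver, ToyEnvironment(target=3))
```

The point of the sketch is only the shape of the interaction: the environment produces an observation, the solver turns it into an action, and the cycle repeats until the environment reports completion.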
162
+
163
+
164
+ When it comes to prompting Proxy Lite, the model expects a message history of the form:
165
+
166
+ ```python
167
+ message_history = [
168
+ {
169
+ "role": "system",
170
+ "content": "You are Proxy Lite...", # Full system prompt in src/proxy_lite/agents/proxy_lite_agent.py
171
+ }, # System prompt
172
+ {
173
+ "role": "user",
174
+ "content": "Book a table for 2 at an Italian restaurant in Kings Cross tonight at 7pm.",
175
+ }, # Set the task
176
+ {
177
+ "role": "user",
178
+ "content": [
179
+ {"type": "image_url", "image_url": {base64_encoded_screenshot} }, # {base64_encoded_screenshot} is a placeholder for the encoded screenshot
180
+ {"type": "text", "text": "URL: https://www.google.com/ \n- [0] <a>About</a> \n- [1] <a>Store</a>...."}
181
+ ] # This is the observation from the environment
182
+ },
183
+ ]
184
+ ```
185
+ The message history then grows, alternating between assistant (action) and user (observation) messages. On each new call, however, all earlier observations are discarded and only the current one is kept.
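
The pruning described above could look roughly like the sketch below. `keep_latest_observation` is a hypothetical helper written for this example, not a function from `proxy_lite`, and it assumes that an observation is any `user` message whose `content` is a list of parts, while the plain-string `user` message carrying the task is kept.

```python
def keep_latest_observation(message_history: list[dict]) -> list[dict]:
    """Return a copy of the history with every observation but the newest removed.

    Assumption: observations are "user" messages whose content is a list of
    parts (image and text); string-content messages are left untouched.
    """
    observation_indices = [
        i
        for i, message in enumerate(message_history)
        if message["role"] == "user" and isinstance(message["content"], list)
    ]
    stale = set(observation_indices[:-1])  # every observation except the last
    return [m for i, m in enumerate(message_history) if i not in stale]


history = [
    {"role": "system", "content": "You are Proxy Lite..."},
    {"role": "user", "content": "Book a table for 2."},
    {"role": "user", "content": [{"type": "text", "text": "URL: page-1"}]},
    {"role": "assistant", "content": "click(0)"},
    {"role": "user", "content": [{"type": "text", "text": "URL: page-2"}]},
]
pruned = keep_latest_observation(history)
```

Whether the real implementation drops old observations entirely or only strips their screenshots is not stated here, so treat the dropping behaviour as an assumption.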
186
+
187
+ The chat template will format this automatically, but also expects the appropriate `Tools` to be passed in so that the model is aware of the available actions. You can do this with `transformers`:
188
+
189
+ ```python
190
+ from qwen_vl_utils import process_vision_info
191
+ from transformers import AutoProcessor
192
+
193
+ from proxy_lite.tools import ReturnValueTool, BrowserTool
194
+ from proxy_lite.serializer import OpenAICompatableSerializer
195
+
196
+ processor = AutoProcessor.from_pretrained("convergence-ai/proxy-lite")
197
+ tools = OpenAICompatableSerializer().serialize_tools([ReturnValueTool(), BrowserTool(session=None)])
198
+
199
+ templated_messages = processor.apply_chat_template(
200
+ message_history, tokenize=False, add_generation_prompt=True, tools=tools
201
+ )
202
+
203
+ image_inputs, video_inputs = process_vision_info(message_history)
204
+
205
+ batch = processor(
206
+ text=[templated_messages],
207
+ images=image_inputs,
208
+ videos=video_inputs,
209
+ padding=True,
210
+ return_tensors="pt",
211
+ )
212
+ ```
213
+
214
+ Alternatively, you can send the messages directly to the endpoint, which handles the formatting:
215
+
216
+ ```python
217
+ from openai import OpenAI
218
+
219
+ client = OpenAI(base_url="http://convergence-ai-demo-api.hf.space/v1")  # the v1 client takes `base_url`, not `api_base`
220
+
221
+ response = client.chat.completions.create(
222
+ model="convergence-ai/proxy-lite",
223
+ messages=message_history,
224
+ tools=tools,
225
+ tool_choice="auto",
226
+ )
227
+ ```
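
Once a response comes back, the chosen actions arrive as tool calls. The sketch below shows one way to read them, assuming the standard chat-completions response layout (`choices[0].message.tool_calls`, with JSON-encoded `function.arguments`). `extract_tool_calls` is a hypothetical helper, and the `click` tool name and `mark_id` argument in the stand-in response are made up for illustration.

```python
import json
from types import SimpleNamespace


def extract_tool_calls(response) -> list[tuple[str, dict]]:
    """Return (tool name, parsed arguments) pairs from the first choice."""
    message = response.choices[0].message
    calls = message.tool_calls or []  # None when the model replied with text only
    return [(c.function.name, json.loads(c.function.arguments)) for c in calls]


# Hand-built stand-in with the same attribute layout as a chat completion,
# so the example runs without a network call.
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(
                content=None,
                tool_calls=[
                    SimpleNamespace(
                        id="call_0",
                        type="function",
                        function=SimpleNamespace(
                            name="click", arguments='{"mark_id": 0}'
                        ),
                    )
                ],
            )
        )
    ]
)
parsed_calls = extract_tool_calls(fake_response)
```

In real use, `response` from `client.chat.completions.create(...)` would be passed in place of `fake_response`.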
228
+
229
+
230
+
231
  ### Webbrowser Environment
232
 
233
  The `webbrowser` environment is a simple environment that uses the `playwright` library to navigate the web.
 
241
  **Note:** We use `playwright_stealth` to lower the chance of detection by anti-bot services, but this isn't foolproof and Proxy Lite may still get blocked with captchas or other anti-bot measures, especially when using the `headless` flag. We recommend using network proxies to avoid this issue.
242
 
243
 
244
+ ## Limitations
245
+
246
+ This model is not currently designed to act as a full assistant that interacts with the user; it is instead designed to act as a tool that goes out and *autonomously* completes the task it is set.
247
+ As such, it will struggle with tasks that require credentials or user interaction, such as actually purchasing items, unless you give all the required details in the prompt.
248
+
249
+
250
+ ## Future Work
251
+
252
+ - [ ] Pixel-level control over mouse movements.
253
+ - [ ] Full computer sandbox.
254
+ - [ ] Multi-agent support.
255
+
256
+ ## Citation
257
+
258
+
259
+ ```bibtex
260
+ @article{proxy-lite,
261
+ title={Proxy Lite - A Mini, Open-weights, Autonomous Assistant},
262
+ author={Convergence AI},
263
+ url={https://github.com/convergence-ai/proxy-lite},
264
+ year={2025}
265
+ }
266
+ ```