ligang-orby committed on
Commit a17c3c8 · 1 Parent(s): e3f1cee

Update README with eval details

results/OrbyAgent-ActIO-72b/README.md CHANGED
@@ -2,4 +2,6 @@
 
 This agent is developed by [Orby AI](https://www.orby.ai/).
 
+The agent does not use any benchmark-specific information in the prompts. For WebArena benchmark, we use the original evaluator and task definitions for fair comparison.
+
 It uses the ActIO model of 72B parameters as a backend, with both screenshot and HTML as inputs. More details can be found in our [research blog]().
results/OrbyAgent-Claude-3.5-Sonnet/README.md CHANGED
@@ -2,4 +2,6 @@
 
 This agent is developed by [Orby AI](https://www.orby.ai/).
 
+The agent does not use any benchmark-specific information in the prompts. For WebArena benchmark, we use the original evaluator and task definitions for fair comparison.
+
 It uses Claude-3.5-sonnet-20241022 as a backend, with both screenshot and HTML as inputs. More details can be found in our [research blog]().