Commit a17c3c8
1 Parent(s): e3f1cee
Update README with eval details
results/OrbyAgent-ActIO-72b/README.md
CHANGED
@@ -2,4 +2,6 @@
 
 This agent is developed by [Orby AI](https://www.orby.ai/).
 
+The agent does not use any benchmark-specific information in the prompts. For WebArena benchmark, we use the original evaluator and task definitions for fair comparison.
+
 It uses the ActIO model of 72B parameters as a backend, with both screenshot and HTML as inputs. More details can be found in our [research blog]().
results/OrbyAgent-Claude-3.5-Sonnet/README.md
CHANGED
@@ -2,4 +2,6 @@
 
 This agent is developed by [Orby AI](https://www.orby.ai/).
 
+The agent does not use any benchmark-specific information in the prompts. For WebArena benchmark, we use the original evaluator and task definitions for fair comparison.
+
 It uses Claude-3.5-sonnet-20241022 as a backend, with both screenshot and HTML as inputs. More details can be found in our [research blog]().