Update README.md
README.md
CHANGED
@@ -120,6 +120,7 @@ foundation for next-generation language model agents to reason and tackle real-w
 | ***General Assistant***| MultiChallenge | 44.7 | 44.7 | 40.0 | 45.0 | 40.7 | 43.0 | 45.8 | 51.8 | 56.5 |
 
 \* conducted on the text-only HLE subset.
+
 Our models are evaluated with temperature=1.0, top_p=0.95.
 
 ### SWE-bench methodology
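
For readers unfamiliar with the evaluation settings quoted in the diff (temperature=1.0, top_p=0.95), the following is a minimal, standard-library sketch of temperature-scaled nucleus (top-p) sampling over a list of logits. The function names `top_p_filter` and `sample` are hypothetical and are not taken from the repository; the project's actual evaluation harness is not shown in this diff and may differ.

```python
import math
import random


def top_p_filter(logits, temperature=1.0, top_p=0.95):
    """Return {token_index: probability} restricted to the top-p nucleus.

    Hypothetical helper for illustration; not the repository's code.
    """
    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of highest-probability tokens whose
    # cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalise over the kept tokens so the probabilities sum to 1.
    return {i: probs[i] / mass for i in kept}


def sample(logits, temperature=1.0, top_p=0.95, rng=random):
    """Draw one token index from the temperature-scaled nucleus."""
    nucleus = top_p_filter(logits, temperature, top_p)
    indices = list(nucleus)
    weights = [nucleus[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]
```

With temperature=1.0 the logits are left unscaled, so only the top_p cutoff changes the distribution: the low-probability tail beyond 95% cumulative mass is dropped before sampling.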